R tm

Source: Internet. Editor: 程序博客网. Time: 2024/06/07 23:13

===

> tdm <- TermDocumentMatrix(doc.corpus)

Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  : 
  'i, j, v' different lengths
In addition: Warning messages:
1: In parallel::mclapply(x, termFreq, control) :
  all scheduled cores encountered errors in user code
2: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL'
3: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  :
  NAs introduced by coercion

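The cause turned out to be the wrong encoding when reading in the data. Declaring the encoding explicitly in readLines fixed it: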

msg <- readLines(file, encoding = "latin1")
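To check that the text really is clean before building the term-document matrix, a minimal sketch along these lines may help. It uses base R only. The temporary file and its bytes are made up for illustration, and the iconv() conversion step is an assumption on my part, not something the original fix included.

```r
# Write a small Latin-1 encoded file to stand in for the real input.
# (Hypothetical sample data: the bytes spell "café" followed by a newline.)
path <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x63, 0x61, 0x66, 0xe9, 0x0a)), path)

# Read the lines and declare their encoding, as in the fix above.
msg <- readLines(path, encoding = "latin1")

# Assumption: convert to UTF-8 so later text processing sees valid strings.
msg_utf8 <- iconv(msg, from = "latin1", to = "UTF-8")

# iconv() returns NA for elements it cannot convert, so check for that,
# and confirm that every element is valid UTF-8.
stopifnot(!anyNA(msg_utf8), all(validUTF8(msg_utf8)))
```

If the checks pass, msg_utf8 can be passed on to the corpus, for example via VCorpus(VectorSource(msg_utf8)) in tm. If they fail, the declared source encoding is probably still not the right one for the file.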

=== Stemming in tm is inaccurate