Package usage: http://cran.r-project.org/web/packages/tm.plugin.webmining/
Package ‘tm.plugin.webmining’
Google News usage:
GoogleNewsSource(query, params = list(hl = "en", q = query, ie = "utf-8", num = 100, output = "rss"), ...)
Practical example:
corpus <- Corpus(GoogleNewsSource("Microsoft"))
Test scenario:
> library(tm)
> library(tm.plugin.webmining)
> googlenews <- WebCorpus(GoogleNewsSource("Stiri"))
> googlenews
<<WebCorpus (documents: 100, metadata (corpus/indexed): 3/0)>>
> googlenews <- corpus.update(googlenews)
> inspect(googlenews)
> VCorpus(VectorSource(googlenews))
> dtm <- DocumentTermMatrix(googlenews)
> findFreqTerms(dtm, 5)
> inspect(removeSparseTerms(dtm, 0.4))
> writeCorpus(googlenews, path = "C:/R", filenames = NULL)
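
Offline sketch of the analysis steps:
The last steps of the scenario (term matrix, frequent terms, sparse-term removal, writing to disk) can be tried without a network connection. The sketch below uses only the tm package. The four sample texts, the thresholds 2 and 0.6, and the output folder under tempdir() are assumptions made for illustration; they stand in for the fetched Google News items and are not part of the original scenario.

```r
library(tm)

# Hypothetical stand-in documents; a real run would use the fetched news items.
texts <- c(
  "microsoft releases new windows update",
  "microsoft shares rise after windows update",
  "google news adds new search features",
  "new phone released by google"
)
corpus <- VCorpus(VectorSource(texts))

# Rows are documents, columns are terms.
dtm <- DocumentTermMatrix(corpus)

# Terms that occur at least twice across the whole corpus.
frequent <- findFreqTerms(dtm, lowfreq = 2)
print(frequent)

# Drop terms that are missing from 60% or more of the documents.
dense <- removeSparseTerms(dtm, 0.6)
inspect(dense)

# Write one plain-text file per document into a temporary folder.
# file.path() builds the path with forward slashes, so no backslash escaping is needed.
out_dir <- file.path(tempdir(), "corpus_out")
dir.create(out_dir, showWarnings = FALSE)
writeCorpus(corpus, path = out_dir)
print(list.files(out_dir))
```

With a real WebCorpus, replace `corpus` with `googlenews`. On Windows, write a fixed output folder as "C:/R" or "C:\\R", because a single backslash inside an R string starts an escape sequence.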