The Reuters Corpus contains 10,788 news documents totaling 1.3 million words. The documents have been classified into 90 topics and grouped into two sets, called "training" and "test"; thus, each fileid indicates which set its document belongs to. Unlike the Brown Corpus, categories in the Reuters Corpus overlap with each other, simply because a news story often covers multiple topics.

The Brown Corpus contains text from 500 sources, and the sources have been categorized by genre. Next, we need to obtain counts for each genre of interest.
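
As a rough illustration of counting per genre, the sketch below tallies a few target words in a toy, in-memory collection. The genre names, the sentences, and the helper `counts_by_genre` are all made up for illustration; this is not the real corpus data or any library's API.

```python
from collections import Counter

# Hypothetical toy data: genre label -> tokenized text (not real corpus content).
TOY_GENRES = {
    "news": "the council may vote today and the mayor will speak".split(),
    "romance": "she could not stay and he would not leave".split(),
}

# Words whose frequency we want to compare across genres (an arbitrary choice).
TARGET_WORDS = ["may", "will", "could", "would"]


def counts_by_genre(genres, targets):
    """Return {genre: {target: count}} for the given target words."""
    table = {}
    for genre, words in genres.items():
        # Counter returns 0 for words that never occur, so no KeyError here.
        freq = Counter(w.lower() for w in words)
        table[genre] = {t: freq[t] for t in targets}
    return table


genre_counts = counts_by_genre(TOY_GENRES, TARGET_WORDS)
for genre, row in genre_counts.items():
    print(genre, row)
```

Keeping one counter per genre keeps the counts conditional on the genre, so the rows can be compared side by side.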

As just mentioned, a text corpus is a large body of text. We can ask for the topics covered by one or more documents, or for the documents included in one or more categories. For convenience, the corpus methods accept a single fileid or a list of fileids. Similarly, we can specify the words or sentences we want in terms of files or categories. The first handful of words in each of these texts are the titles, which by convention are stored as upper case.
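
The two-way lookup described above can be sketched with a small in-memory index. Everything here is an assumption for illustration: the fileids, the topic labels, and the functions `categories` and `fileids` are toy stand-ins, not the actual corpus reader. The point is only to show overlapping categories and arguments that may be a single string or a list.

```python
# Hypothetical toy index: fileid -> topic labels (made up, not real corpus data).
TOY_INDEX = {
    "training/1": ["grain", "wheat"],
    "training/2": ["trade"],
    "test/3": ["grain", "trade"],
}


def _as_list(value):
    """Wrap a single string in a list; copy any other iterable into a list."""
    return [value] if isinstance(value, str) else list(value)


def categories(fileids=None):
    """Sorted topics covered by the given fileid(s), or by the whole index."""
    ids = list(TOY_INDEX) if fileids is None else _as_list(fileids)
    return sorted({topic for fid in ids for topic in TOY_INDEX[fid]})


def fileids(cats=None):
    """Sorted fileids tagged with any of the given categories, or all fileids."""
    if cats is None:
        return sorted(TOY_INDEX)
    wanted = set(_as_list(cats))
    return sorted(fid for fid, topics in TOY_INDEX.items() if wanted & set(topics))


print(categories("test/3"))                      # topics of one document
print(categories(["training/1", "training/2"]))  # topics of several documents
print(fileids("grain"))                          # documents in one category
print(fileids(["wheat", "trade"]))               # documents in several categories
```

Because a document may carry several topics, the same fileid can appear under more than one category, which is why the lookups collect results into sets before sorting.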

