Distinguish authors via PCA
This exercise uses the corpora that ship with NLTK to distinguish four authors via PCA (principal component analysis). I am a programming beginner, so this post may contain mistakes; if you spot any, please comment below. Any feedback is welcome. NLTK (Natural Language Toolkit) is a Python platform for working with human language data. It provides access to over 50 corpora and lexical resources, such as the Brown Corpus, Project Gutenberg, and NPS Chat. In this post we will use texts from Project Gutenberg and analyze four