


tanishq9/Text-Classification-20-Newsgroups


Text-Classification for 20-Newsgroups

• The dataset is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.

• Built a vocabulary from the dataset, which was used as the feature set.

• Implemented a Multinomial Naive Bayes classifier from scratch to classify each document into its appropriate newsgroup.
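
The two steps above (building a vocabulary, then training Multinomial Naive Bayes on word counts) can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: the helper names `tokenize` and `build_vocabulary`, the `max_size` cutoff, the Laplace smoothing value `alpha=1.0`, and the four toy documents are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(docs, max_size=2000):
    # Keep the most frequent tokens across the training documents as features.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    return [tok for tok, _ in counts.most_common(max_size)]


class MultinomialNB:
    def __init__(self, vocabulary, alpha=1.0):
        self.vocab = set(vocabulary)
        self.alpha = alpha  # Laplace smoothing strength (assumed value)

    def fit(self, docs, labels):
        self.n_docs = len(labels)
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # per-class token counts
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(
                t for t in tokenize(doc) if t in self.vocab
            )
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.class_counts
        }
        return self

    def predict(self, doc):
        # Tokens outside the vocabulary are ignored.
        tokens = [t for t in tokenize(doc) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_c / self.n_docs)
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Toy data invented for illustration; the real dataset has about 20,000 documents.
train_docs = [
    "the rocket launch reached orbit",
    "nasa plans a new orbit mission",
    "the team won the hockey game",
    "hockey players scored in the final game",
]
train_labels = ["sci.space", "sci.space", "rec.sport.hockey", "rec.sport.hockey"]

vocab = build_vocabulary(train_docs)
model = MultinomialNB(vocab).fit(train_docs, train_labels)
prediction = model.predict("the hockey game went to overtime")
print(prediction)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never appeared in a class from forcing that class's probability to zero.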

Results:

• Naive Bayes from scratch: 0.8474

• scikit-learn Naive Bayes: 0.8476
