A simple least squares model that predicts whether a given word is Spanish or French based on a selection of simple features.

malmahasnah/languageclassifier

Language Classifier

DSC 140A: Probabilistic Modeling and Machine Learning at UC San Diego

A simple least squares classifier that predicts whether a given word is Spanish or French based on a few bigram features.
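As a rough illustration of what a least squares classifier does, the sketch below fits weights by ordinary least squares and classifies by the sign of the prediction. It is not the repository's code: the function names, the use of NumPy, the label encoding (Spanish as +1, French as -1), and the toy data are all assumptions made for this example.

```python
import numpy as np

def fit_least_squares(X, y):
    """Return the weights w that minimize ||Xa @ w - y||^2,
    where Xa is X with a leading column of ones (the bias term)."""
    Xa = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return w

def predict(X, w):
    """Classify each row by the sign of the linear prediction."""
    Xa = np.column_stack([np.ones(len(X)), X])
    return np.where(Xa @ w >= 0, 1, -1)

# Toy feature matrix (assumed data, two features per word).
X = np.array([[0.0, 1.0], [0.0, 2.0], [1.0, 0.0], [2.0, 0.0]])
# Assumed encoding: +1 = Spanish, -1 = French.
y = np.array([1, 1, -1, -1])

w = fit_least_squares(X, y)
preds = predict(X, w)
accuracy = float(np.mean(preds == y))
```

Regressing onto +1/-1 labels and thresholding at zero is one common way to turn least squares regression into a binary classifier; the actual model may encode labels or threshold differently.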

I wrote a function that generates every two-letter sequence in the alphabet and uses each one as a feature; I also manually added some common French and Spanish sequences and prefixes. The model achieved an accuracy of at least 75% on the training data. It also performed well on unseen data, achieving an accuracy of 84.12% on the leaderboard.
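The feature construction described above might look something like the following sketch. It assumes the alphabet is the 26 lowercase ASCII letters and that each feature is a 0/1 presence indicator; the names `ALL_BIGRAMS`, `EXTRA_PREFIXES`, and `featurize`, and the two example prefixes, are hypothetical and not taken from the repository.

```python
from itertools import product
from string import ascii_lowercase

# Every two-letter sequence over a-z: 26 * 26 = 676 bigrams.
ALL_BIGRAMS = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

# Hypothetical hand-picked prefixes, for illustration only.
EXTRA_PREFIXES = ["des", "que"]

def featurize(word):
    """Map a word to a list of 0/1 features: one per bigram
    (present anywhere in the word) plus one per extra prefix."""
    word = word.lower()
    bigram_feats = [1.0 if bg in word else 0.0 for bg in ALL_BIGRAMS]
    prefix_feats = [1.0 if word.startswith(p) else 0.0 for p in EXTRA_PREFIXES]
    return bigram_feats + prefix_feats

vec = featurize("queso")
```

For "queso", the active features are the bigrams "qu", "ue", "es", and "so", plus the "que" prefix. Accented characters are ignored under the a-z assumption, which may discard useful signal for French and Spanish.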

Acknowledgements: Professor Justin Eldridge, UC San Diego.
