Corpus word set for Solthiruthi #206

Open
arcturusannamalai opened this issue May 17, 2020 · 3 comments
@arcturusannamalai
Collaborator

Use open datasets from

  1. https://www.kaggle.com/disisbig/tamil-wikipedia-articles
  2. https://www.kaggle.com/disisbig/tamil-news-dataset
@VpkPrasanna

Hi @arcturusannamalai, can you please elaborate on this issue?
Do we need to add these datasets to our library?

@arcturusannamalai
Collaborator Author

@VpkPrasanna - yes, you can use these datasets to build a valid word list for the spelling checker; the current word lists include https://github.com/Ezhil-Language-Foundation/open-tamil/blob/main/solthiruthi/data/tamilvu_dictionary_words.txt etc.
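
As a rough illustration of what "form a valid word list" could look like, here is a minimal sketch that pulls Tamil-script tokens out of raw corpus text and keeps the ones seen often enough. It uses only the Python standard library, not the open-tamil API. The function names, the `min_count` threshold, and the Unicode range used to decide what counts as a Tamil word are assumptions made for illustration, not part of Solthiruthi.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Tamil letters, vowel signs and pulli
# (U+0B82..U+0BD7). Tamil digits and symbols (U+0BE6 onward) are excluded.
TAMIL_WORD = re.compile(r"[\u0B82-\u0BD7]+")


def extract_tamil_words(text):
    """Return every Tamil-script token found in text, in order."""
    return TAMIL_WORD.findall(text)


def build_word_list(lines, min_count=2):
    """Count Tamil tokens across lines and keep the frequent ones.

    min_count is a hypothetical noise filter: tokens seen fewer times
    are dropped, on the guess that rare tokens are more likely typos.
    """
    counts = Counter()
    for line in lines:
        counts.update(extract_tamil_words(line))
    return sorted(word for word, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    sample = ["தமிழ் மொழி 2020 Tamil", "தமிழ் நாடு"]
    for word in build_word_list(sample, min_count=1):
        print(word)
```

In practice the `lines` argument would be fed from the downloaded Kaggle files, and the result written out in whatever layout the existing word lists use; that layout should be checked against the current files rather than assumed. A frequency threshold alone will not remove every misspelling, so the output is a candidate list and still needs review.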

@VpkPrasanna

So I have to add the words from the new datasets to the same file, right?
