This repository has been archived by the owner on Oct 30, 2018. It is now read-only.
Automatic detection of the char encoding using the juniversalchardet #39
First, thank you for your work!
I have added automatic detection of the character encoding used by the web page, based on the juniversalchardet library. Feel free to merge it into the master branch or discard it :-)
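For readers unfamiliar with juniversalchardet, the detection step can be sketched roughly as follows. This is a minimal illustration, not the code in this pull request: the object name `CharsetSniffer`, its two methods, and the UTF-8 fallback are assumptions made for the example. Only `UniversalDetector` and its `handleData`, `dataEnd`, `getDetectedCharset` and `reset` methods come from the library.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

import org.mozilla.universalchardet.UniversalDetector

// Hypothetical helper, for illustration only (not part of this PR).
object CharsetSniffer {

  // Feeds the raw bytes to juniversalchardet and returns the detected
  // charset name, or None when the detector reports nothing.
  def detect(bytes: Array[Byte]): Option[String] = {
    val detector = new UniversalDetector(null) // no listener needed
    detector.handleData(bytes, 0, bytes.length)
    detector.dataEnd()
    val name = Option(detector.getDetectedCharset) // null becomes None
    detector.reset()
    name
  }

  // Decodes the bytes with the detected charset. Falls back to `fallback`
  // (UTF-8 here, an assumption) when nothing is detected or the JVM does
  // not support the detected charset.
  def decode(bytes: Array[Byte],
             fallback: Charset = StandardCharsets.UTF_8): String = {
    val charset = detect(bytes)
      .flatMap(name => Try(Charset.forName(name)).toOption)
      .getOrElse(fallback)
    new String(bytes, charset)
  }
}
```

A caller would pass the raw response body, for example `CharsetSniffer.decode(rawHtmlBytes)`, before handing the resulting string to the HTML parser.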
I have also added some code to StopWords.scala that makes it easier to integrate a language autodetection library (such as jlangdetect or lingpipe). At the moment I am using my own private language identifier, but it would be easy to plug in another library. Maybe in the future :-)
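One way such an integration point could look is sketched below. This is a hypothetical outline, not the actual change to StopWords.scala: the trait `LanguageIdentifier`, the class `StopWordsSelector`, the `"en"` default and the `stopwords-<lang>.txt` naming scheme are all assumptions made for the example.

```scala
// Hypothetical plug-in point: any language detection library can be
// wrapped behind this trait.
trait LanguageIdentifier {
  // Returns a language code such as "en" or "es", or None if unsure.
  def identify(text: String): Option[String]
}

// Default implementation that never detects anything, so callers fall
// back to the configured default language.
object NoLanguageIdentifier extends LanguageIdentifier {
  def identify(text: String): Option[String] = None
}

// Chooses a stop-word resource name from the identified language.
// The file naming scheme is an assumption for this sketch.
class StopWordsSelector(identifier: LanguageIdentifier,
                        defaultLanguage: String = "en") {
  def stopWordsFile(text: String): String = {
    val lang = identifier.identify(text).getOrElse(defaultLanguage)
    s"stopwords-$lang.txt"
  }
}
```

Wrapping jlangdetect, lingpipe or a private identifier would then only require a small adapter that implements `LanguageIdentifier`.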
Thank you again, and good luck!