Automatic detection of the char encoding using the juniversalchardet #39

chechu · 2012-02-29T21:58:32Z

First, thank you for your work!

I have added automatic detection of the character encoding used by the web page, based on the juniversalchardet library. Feel free to include it in the master branch or discard it :-)
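To illustrate the idea for readers, here is a minimal sketch of decoding a page with a detected charset and falling back to a default when detection fails. It is not the code from this pull request: `EncodingSketch`, `Detector`, and `decode` are hypothetical names, and the detector is passed in as a plain function so the example runs without extra dependencies. With juniversalchardet, that function would presumably wrap `UniversalDetector` (fed through `handleData`, finished with `dataEnd`, read via `getDetectedCharset`, which returns null when nothing was detected).

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try

object EncodingSketch {
  // Hypothetical type: maps raw bytes to a charset name, or None when undecided.
  type Detector = Array[Byte] => Option[String]

  // Decode with the detected charset; use the fallback when the detector
  // gives no answer or names a charset this JVM does not support.
  def decode(bytes: Array[Byte],
             detect: Detector,
             fallback: Charset = StandardCharsets.UTF_8): String = {
    val charset = detect(bytes)
      .flatMap(name => Try(Charset.forName(name)).toOption)
      .getOrElse(fallback)
    new String(bytes, charset)
  }
}
```

A call such as `EncodingSketch.decode(pageBytes, myDetector)` keeps the detection library behind one small function, so it can be swapped or disabled without touching the decoding step (`pageBytes` and `myDetector` are placeholders).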

Moreover, I have added some code that makes it easy to integrate a language-detection library (such as jlangdetect or lingpipe) in StopWords.scala. At the moment I am using my own private language identifier, but it would be easy to plug in some other library. Maybe in the future :-)
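As a rough illustration of such a hook, the sketch below selects a stop-word list from a detector's answer before counting. Everything in it is an assumption made for the example: `StopWordsSketch`, `LanguageDetector`, `AlwaysEnglish`, `countStopWords`, and the tiny word lists are hypothetical and do not reflect the actual contents of StopWords.scala.

```scala
object StopWordsSketch {
  // Hypothetical hook: returns a language code for the text, or None if unsure.
  trait LanguageDetector {
    def detect(text: String): Option[String]
  }

  // Default detector that always answers English.
  object AlwaysEnglish extends LanguageDetector {
    def detect(text: String): Option[String] = Some("en")
  }

  // Tiny illustrative stop-word lists, not real ones.
  val stopWords: Map[String, Set[String]] = Map(
    "en" -> Set("the", "and", "of", "to", "in"),
    "es" -> Set("el", "la", "y", "de", "en")
  )

  // Detect the language first, then count the stop words of that language.
  def countStopWords(fragment: String,
                     detector: LanguageDetector = AlwaysEnglish): Int = {
    val lang = detector.detect(fragment).getOrElse("en")
    val words = stopWords.getOrElse(lang, Set.empty[String])
    // Split on anything that is not a letter so accented words stay whole.
    fragment.toLowerCase.split("[^\\p{L}]+").count(w => words.contains(w))
  }
}
```

A wrapper around jlangdetect, lingpipe, or a private identifier would only need to implement `detect`; the counting code stays unchanged.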

Thank you again, and good luck.

…library. Moreover, the system is now prepared to run language detection before counting the stop words in each fragment.

Jesus Lanchas added 2 commits February 29, 2012 14:18

Automatic detection of the char encoding using the juniversalchardet …

b73646a

…library. Moreover, the system now is prepared to use a language detection previously to count the stop words in each fragment.

Complement to the previous commit.

c6ccc8a
