This repository has been archived by the owner on Oct 30, 2018. It is now read-only.

Automatic detection of the char encoding using the juniversalchardet #39

Open: wants to merge 2 commits into base: master

Conversation

@chechu chechu commented Feb 29, 2012

First, thank you for your work!

I have added automatic detection of the web page's character encoding, using the juniversalchardet library. Feel free to merge it into the master branch or discard it :-)
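As a rough illustration of the detection step (not the code from these commits), the sketch below feeds the raw page bytes to juniversalchardet's `UniversalDetector` and falls back to a default charset when nothing is recognised. The object name `EncodingSketch`, the helpers `detectCharset` and `decode`, and the UTF-8 fallback are assumptions made for this example, and the block needs the juniversalchardet jar on the classpath.

```scala
import java.nio.charset.{Charset, StandardCharsets}
import scala.util.Try
import org.mozilla.universalchardet.UniversalDetector

// Hypothetical helper object; the names here are illustrative only.
object EncodingSketch {

  // Guess the charset of `bytes`, returning `fallback` if detection fails.
  def detectCharset(bytes: Array[Byte],
                    fallback: Charset = StandardCharsets.UTF_8): Charset = {
    val detector = new UniversalDetector(null) // null: no listener callback
    detector.handleData(bytes, 0, bytes.length)
    detector.dataEnd()                         // signal end of input
    val name = Option(detector.getDetectedCharset) // null when undetected
    detector.reset()
    // The detected name may not be supported by this JVM, so guard the lookup.
    name.flatMap(n => Try(Charset.forName(n)).toOption).getOrElse(fallback)
  }

  // Decode the page body with the detected (or fallback) charset.
  def decode(bytes: Array[Byte]): String =
    new String(bytes, detectCharset(bytes))
}
```

Detection on short or mostly ASCII input is a guess at best, so a caller would probably still prefer an explicit charset from the HTTP headers or a `<meta>` tag when one is available.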

I have also added some code to StopWords.scala that makes it easy to plug in a language detection library (such as jlangdetect or LingPipe). At the moment I am using my own private language identifier, but it would be easy to swap in another library. Maybe in the future :-)
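One way such a hook could look is sketched below. It is a guess at the general shape, not the code in StopWords.scala: the `LanguageDetector` trait, the toy `OverlapDetector`, the tiny stop-word lists and the `countStopWords` signature are all hypothetical. A wrapper around jlangdetect, LingPipe or a private identifier would implement the same trait.

```scala
// Hypothetical sketch of a pluggable language-detection hook.
object StopWordsSketch {

  // Any detector can be adapted to this interface.
  trait LanguageDetector {
    def detect(text: String): Option[String] // language code such as "en"
  }

  // Toy stop-word lists, for illustration only.
  val stopWords: Map[String, Set[String]] = Map(
    "en" -> Set("the", "and", "of", "to", "is"),
    "es" -> Set("el", "la", "de", "y", "que")
  )

  // Lower-case the text and split it on anything that is not a letter.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}]+").filter(_.nonEmpty).toSeq

  // Naive stand-in detector: picks the language whose list matches most tokens.
  object OverlapDetector extends LanguageDetector {
    def detect(text: String): Option[String] = {
      val tokens = tokenize(text)
      val (best, hits) =
        stopWords.map { case (lang, words) => lang -> tokens.count(words) }
          .maxBy(_._2)
      if (hits > 0) Some(best) else None
    }
  }

  // Detect the language first, then count stop words for that language.
  def countStopWords(text: String,
                     detector: LanguageDetector,
                     defaultLang: String = "en"): Int = {
    val lang = detector.detect(text).filter(stopWords.contains).getOrElse(defaultLang)
    tokenize(text).count(stopWords(lang))
  }
}
```

Keeping the detector behind a small trait means the stop-word counting never depends on a particular library, which is the property the comment above describes.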

Thank you again, and good luck!

Jesus Lanchas added 2 commits February 29, 2012 14:18
…library.

Moreover, the system is now prepared to run language detection before counting the stop words in each fragment.