Merge pull request #107 from oscar-project/fix-#106
Add information about other langID models
Uinelj authored Jul 25, 2023
2 parents 916eba3 + 2778415 commit 888f6ff
Showing 1 changed file with 11 additions and 2 deletions.
README.md: 13 changes (11 additions and 2 deletions)
@@ -33,9 +33,18 @@ apt install -y libboost-all-dev libeigen3-dev
 
 and use `cargo install ungoliant --features kenlm` or `cargo b --features kenlm` if you're building from source.
 
-### Getting the language identification file (for fastText):
+### Getting a language identification file (for fastText):
+
+By default, `ungoliant` expects the `lid.176.bin` model by meta.
+Use `curl https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -o lid.176.bin` to get it.
+
+However, you can use the model you want: just point to its path using `ungoliant download --lid-path <path to lid>`.
+
+Other options include:
+
+- NLLB model (https://huggingface.co/facebook/fasttext-language-identification)
+- OpenLID model (https://github.com/laurieburchell/open-lid-dataset)
 
-Use `curl https://dl.fbaipublicfiles.com/fasttext/supervised-models/lid.176.bin -o lid.176.bin`.
 
 ## Usage
 
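Because `--lid-path` accepts any model file, it can help to confirm that a downloaded file (the default `lid.176.bin` or one of the alternatives listed in this README change) really is a fastText binary before starting a long run. The sketch below is a hypothetical helper, not part of `ungoliant`; `check_header` and `check_fasttext_model` are illustrative names. It reads the 8-byte header fastText writes at the start of `.bin` models, a magic number (`793712314`) followed by a format version (`12` in current fastText releases), and mirrors the check fastText itself performs when loading a model. It assumes little-endian byte order, which is what fastText produces on x86 machines.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

// Header constants as defined in fastText's fasttext.cc.
const FASTTEXT_MAGIC: i32 = 793_712_314;
const FASTTEXT_VERSION: i32 = 12;

/// Validate the first 8 bytes of a fastText `.bin` file: a little-endian i32
/// magic number followed by a little-endian i32 format version (assumption:
/// the model was written on a little-endian machine).
fn check_header(header: &[u8; 8]) -> Result<(), String> {
    let magic = i32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    let version = i32::from_le_bytes([header[4], header[5], header[6], header[7]]);
    if magic != FASTTEXT_MAGIC {
        return Err(format!("not a fastText binary model (magic number {})", magic));
    }
    // fastText refuses models written with a newer format version than it knows.
    if version > FASTTEXT_VERSION {
        return Err(format!("unsupported fastText format version {}", version));
    }
    Ok(())
}

/// Hypothetical pre-flight check for the file passed to `--lid-path`.
/// A missing file or one shorter than 8 bytes surfaces as the usual
/// `io::Error`; a file with the wrong header becomes `InvalidData`.
fn check_fasttext_model(path: &Path) -> io::Result<()> {
    let mut header = [0u8; 8];
    File::open(path)?.read_exact(&mut header)?;
    check_header(&header).map_err(|msg| io::Error::new(io::ErrorKind::InvalidData, msg))
}
```

Calling `check_fasttext_model(Path::new("lid.176.bin"))` before invoking `ungoliant` would catch, for example, an HTML error page that `curl` saved in place of the model, without loading the whole file.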
