# giLM

giLM is a GPU language model based on gLM (the design and implementation of gLM are described in a paper published at ACL 2016). giLM was designed for MODLMs, which are mixtures of neural language models and n-gram language models [http://arxiv.org/abs/1606.00499].
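As a rough sketch of the idea (following the MODLM formulation in the paper linked above; the notation here is mine, not taken from the giLM code), a MODLM scores each word with a context-dependent mixture of several component distributions, one of which can be an n-gram model served by giLM:

```latex
% \lambda_k(c) are context-dependent interpolation weights with
% \sum_k \lambda_k(c) = 1, and each P_k(w \mid c) is a component LM
% (e.g. an n-gram model queried via giLM, or a neural LM).
P(w \mid c) = \sum_{k=1}^{K} \lambda_k(c)\, P_k(w \mid c)
```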

The B-tree and trie code was originally implemented by Bogoychev, but parts of it are unused in giLM because giLM does not need backoff parameters. The GPU query differs from gLM's and is divided into two parts: a parallel search followed by a parallel traversal. If you have any questions, please contact me at ljyduke@gamil.com without hesitation.

## Build

```bash
git clone https://github.com/DukeEnglish/giLM.git
cd giLM
mkdir release_build
cd release_build
cmake ..
make -j4
```


### Additional cmake build flags
- `-DBUILDTYPE=debug` builds with `-O0` and `-g`.
- `-DCOMPUTE_VER` sets the compute version of the hardware. The default is 52. **IT WILL NOT PRODUCE CORRECT SCORES IF IT IS COMPILED WITH A WRONG COMPUTE VERSION!!! CHECK YOUR GPU'S COMPUTE VERSION [HERE](https://en.wikipedia.org/wiki/CUDA)**. If `make test` doesn't fail any of the GPU tests, your compute version is correct (see the example after this list).
- `-DBAD_HOST` should help when building on older Ubuntu systems such as 12.04 and 14.04. Don't use it unless you have trouble building.
- `-DPYTHON_INCLUDE_DIR` defines the path to the Python headers, such as `/usr/include/python2.7/pyconfig.h` or `/usr/include/python3.6m/pyconfig.h`, and enables building the Python components.
- `-DPYTHON_VER` defaults to 2.7. If you want to build the Python components with a different version, set it to the desired version. It has no effect unless `-DPYTHON_INCLUDE_DIR` is set.
- `-DYAMLCPP_DIR` should be set if your yaml-cpp is in a non-standard location (the standard location is `/usr/include`).
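
For example, a build for a GPU with compute capability 6.1 and the Python components enabled might look like this (the flag values below are illustrative, not recommendations; substitute your own compute version and Python path):

```bash
# Run from inside release_build/. COMPUTE_VER must match your GPU.
cmake .. -DCOMPUTE_VER=61 -DPYTHON_INCLUDE_DIR=/usr/include/python2.7/pyconfig.h
make -j4
make test   # the GPU tests should all pass if COMPUTE_VER is correct
```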


## Binarize arpa files
```bash
cd path_to_gilm/release_build/bin
./binarize_v2 path_to_arpa_file output_path [btree_node_size]
```

`btree_node_size` should be an odd number. Personally I found that 31 works best, but you should experiment; the best value can vary with the size of the arpa file and with the GPU.
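For example, assuming a hypothetical `lm.arpa` in the current directory (both paths below are placeholders):

```bash
# Binarize lm.arpa into the directory ./binary_lm, using a B-tree node size of 31
./binarize_v2 lm.arpa ./binary_lm 31
```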

## Batch query

To benchmark giLM in the batch setting, do:

```bash
cd path_to_gilm/release_build/bin
./batch_query_v2 path_to_binary_lm_dir path_to_test_file [gpuDeviceID=0] [addBeginEndMarkers_bool=1]   # bracketed values are the defaults
```

- `path_to_binary_lm_dir`: the directory containing the binarized language model
- `path_to_test_file`: the batch query file, which contains all the sentences you want to query (for a single sentence you should use the interactive query)
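For example, using the binarized model from the previous step and a hypothetical `test_sentences.txt` listing the sentences to query:

```bash
# Query on GPU 0 with begin/end-of-sentence markers added (the documented defaults)
./batch_query_v2 ./binary_lm test_sentences.txt 0 1
```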

## Results

The experimental results show that giLM satisfies the requirements of MODLMs, and that it meets them better than the existing GPU language model. This project is expected to contribute an efficient, queryable GPU language model for MODLMs, and to serve as a foundation for improving applications where MODLMs can be applied, such as neural machine translation.

## About

A project from my master's studies.
