japanese-names-trans

Translate Japanese Names to / from English

This software uses OpenNMT-py, the PyTorch port of OpenNMT, an open-source (MIT-licensed) neural machine translation system.

It is also available as a free online service (up to 500 translations per month) through Japanese-name.app or the NamSor API.

Requirements

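Install OpenNMT-py from pip. The commands below use onmt_preprocess and the -data option of onmt_train, which belong to the 1.x releases (OpenNMT-py 2.0 removed the separate preprocessing step), so pin the version accordingly:

pip install "OpenNMT-py<2"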

Japanese Names Parallel Corpus

The names in data/parallel-japanese-corpus are stored one per line, each line holding either a first name (^fn) or a last name (^ln), with characters separated by spaces and a closing $ marker, as follows:

names-en-train

^ln f u n a k o s h i $
^fn s a b u r o $
[...]

names-jp-train

^ln 船 越 $
^fn 三 朗 $
[...]
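The line format above can be produced and reversed with a few lines of Python. This is a minimal sketch, assuming that ^fn and ^ln are the only markers and that every token is a single character; the function names encode_name and decode_name are hypothetical and not part of this repository.

```python
def encode_name(name: str, kind: str) -> str:
    """Turn a raw name into one corpus line, e.g. '^ln f u n a k o s h i $'."""
    if kind not in ("fn", "ln"):
        raise ValueError("kind must be 'fn' (first name) or 'ln' (last name)")
    # Lower-casing only affects Latin letters; kanji and kana pass through unchanged.
    chars = [c for c in name.strip().lower() if not c.isspace()]
    return " ".join([f"^{kind}", *chars, "$"])


def decode_name(line: str):
    """Split a corpus line back into (kind, name), e.g. ('ln', '船越')."""
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] not in ("^fn", "^ln") or tokens[-1] != "$":
        raise ValueError(f"not a corpus line: {line!r}")
    return tokens[0][1:], "".join(tokens[1:-1])
```

For example, encode_name("Funakoshi", "ln") gives the first line of names-en-train shown above, and decode_name("^ln 船 越 $") returns ("ln", "船越").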

Train ONMT translation model

ONMT preprocess data

Prepare Japanese to English data :

onmt_preprocess -train_src data/parallel-japanese-corpus/names-jp-train.txt -train_tgt data/parallel-japanese-corpus/names-en-train.txt -valid_src data/parallel-japanese-corpus/names-jp-val.txt -valid_tgt data/parallel-japanese-corpus/names-en-val.txt -save_data data/onmt-model/jp_en_data

This writes three files to data/onmt-model: jp_en_data.train.0.pt, jp_en_data.valid.0.pt and jp_en_data.vocab.pt.

Prepare English to Japanese data :

onmt_preprocess -train_src data/parallel-japanese-corpus/names-en-train.txt -train_tgt data/parallel-japanese-corpus/names-jp-train.txt -valid_src data/parallel-japanese-corpus/names-en-val.txt -valid_tgt data/parallel-japanese-corpus/names-jp-val.txt -save_data data/onmt-model/en_jp_data

This writes three files to data/onmt-model: en_jp_data.train.0.pt, en_jp_data.valid.0.pt and en_jp_data.vocab.pt.
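The two preprocessing commands differ only in which language is the source and which is the target. The sketch below builds both argument lists from one helper so the paths stay in sync. The helper name preprocess_command is hypothetical; the flags and file names are the ones used above.

```python
import shlex

CORPUS = "data/parallel-japanese-corpus"
MODEL_DIR = "data/onmt-model"


def preprocess_command(src: str, tgt: str) -> list:
    """Build the onmt_preprocess argument list for one direction.

    src and tgt are the language codes used in the corpus file names ('jp' or 'en').
    """
    return [
        "onmt_preprocess",
        "-train_src", f"{CORPUS}/names-{src}-train.txt",
        "-train_tgt", f"{CORPUS}/names-{tgt}-train.txt",
        "-valid_src", f"{CORPUS}/names-{src}-val.txt",
        "-valid_tgt", f"{CORPUS}/names-{tgt}-val.txt",
        "-save_data", f"{MODEL_DIR}/{src}_{tgt}_data",
    ]


if __name__ == "__main__":
    for src, tgt in (("jp", "en"), ("en", "jp")):
        # Print the shell-quoted command; to execute it instead,
        # pass the list to subprocess.run(..., check=True).
        print(shlex.join(preprocess_command(src, tgt)))
```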

ONMT train model

Train Japanese to English machine translation model :

onmt_train -data data/onmt-model/jp_en_data -save_model data/onmt-model/jp_en_model -world_size 1 -gpu_ranks 0

This writes model checkpoints to data/onmt-model; the one used below is jp_en_model_step_100000.pt.

Train English to Japanese machine translation model :

onmt_train -data data/onmt-model/en_jp_data -save_model data/onmt-model/en_jp_model -world_size 1 -gpu_ranks 0

This writes model checkpoints to data/onmt-model; the one used below is en_jp_model_step_100000.pt.

ONMT test model

Test the Japanese to English machine translation model, writing the top 3 candidates for each name (-n_best 3):

onmt_translate -model data/onmt-model/jp_en_model_step_100000.pt -src data/parallel-japanese-corpus/names-jp-test.txt -output data/test/names-en-test-out.txt -replace_unk -n_best 3

Test the English to Japanese machine translation model, writing the top 3 candidates for each name (-n_best 3):

onmt_translate -model data/onmt-model/en_jp_model_step_100000.pt -src data/parallel-japanese-corpus/names-en-test.txt -output data/test/names-jp-test-out.txt -replace_unk -n_best 3

Overall accuracy

We use the test outputs to compute the accuracy at each rank. Match N is the share of test names for which the correct translation appears among the first N candidates: Match 1 means the first candidate is right, Match 2 means the first or the second is right, and so on.

Translation direction   Match 1   Match 2   Match 3   Match 4   Match 5
English to Japanese     57%       70%       76%       79%       82%
Japanese to English     87%       92%       94%       96%       97%

Running the ONMT server

Install flask from pip:

pip install flask

To run the ONMT server, copy jp_en_model_step_100000.pt and en_jp_model_step_100000.pt into the directory /available_models/, then run:

onmt_server 

To check that the server is running, send a GET request to the health endpoint:

curl -i -X GET \
    http://localhost:5000/translator/health

which should return

{
  "status": "ok"
}

Use model ID 100 to translate to English and model ID 101 to translate to Japanese. Models are configured in /available_models/conf.json.
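For orientation, the sketch below generates a conf.json in the shape the OpenNMT-py REST server reads: a models_root plus a list of models, each with an id, a model file and translation options under opt. All values are assumptions for illustration. In particular, the mapping of IDs to model files follows the sentence above, and the timeout, GPU and beam settings are placeholders; check them against the conf.json shipped with this repository.

```python
import json


def build_server_conf(models_root="./available_models"):
    """Return a dict shaped like an OpenNMT-py server conf.json (illustrative values)."""

    def entry(model_id, filename):
        return {
            "id": model_id,
            "model": filename,
            "timeout": 600,          # placeholder: seconds before an idle model is unloaded
            "on_timeout": "to_cpu",
            "load": True,
            # n_best must not exceed beam_size; 5 is an assumed value.
            "opt": {"gpu": 0, "beam_size": 5, "n_best": 5, "replace_unk": True},
        }

    return {
        "models_root": models_root,  # assumption: adjust to where the models were copied
        "models": [
            entry(100, "jp_en_model_step_100000.pt"),  # assumed: ID 100 translates to English
            entry(101, "en_jp_model_step_100000.pt"),  # assumed: ID 101 translates to Japanese
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_server_conf(), indent=2))
```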

To translate a name, send a POST request to the translate endpoint:

curl -i -X POST -H "Content-Type: application/json" \
    -d '[{"src": "^ln f u n a k o s h i $", "id": 100}]' \
    http://localhost:5000/translator/translate

which should return

[
  [
    {
      "n_best": 5,
      "pred_score": -0.23048973083496094,
      "src": "^ln f u n a k o s h i $",
      "tgt": "^ln 船 越 $"
    }
  ],
  [
    {
      "n_best": 5,
      "pred_score": -1.6027336120605469,
      "src": "^ln f u n a k o s h i $",
      "tgt": "^ln 舩 越 $"
    }
  ],
  [
    {
      "n_best": 5,
      "pred_score": -5.745663642883301,
      "src": "^ln f u n a k o s h i $",
      "tgt": "^ln 舟 越 $"
    }
  ],
  [
    {
      "n_best": 5,
      "pred_score": -8.610189437866211,
      "src": "^ln f u n a k o s h i $",
      "tgt": "^ln 二 越 $"
    }
  ],
  [
    {
      "n_best": 5,
      "pred_score": -8.685261726379395,
      "src": "^ln f u n a k o s h i $",
      "tgt": "^ln 布 越 $"
    }
  ]
]
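The same request can be sent from Python, and the nested response flattened into a ranked list of names. This is a minimal sketch using only the standard library. It assumes the server address and response shape shown above; the function names build_payload, extract_candidates and translate are hypothetical.

```python
import json
import urllib.request

SERVER_URL = "http://localhost:5000/translator/translate"


def build_payload(tokenized_name: str, model_id: int) -> bytes:
    """Encode one name as the JSON body expected by the translate endpoint."""
    return json.dumps([{"src": tokenized_name, "id": model_id}]).encode("utf-8")


def extract_candidates(response):
    """Flatten the nested response into (name, score) pairs, best score first."""
    pairs = []
    for group in response:
        for item in group:
            tokens = item["tgt"].split()
            # Drop the leading ^ln / ^fn marker and the trailing $ if present.
            if tokens and tokens[0].startswith("^"):
                tokens = tokens[1:]
            if tokens and tokens[-1] == "$":
                tokens = tokens[:-1]
            pairs.append(("".join(tokens), item["pred_score"]))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def translate(tokenized_name: str, model_id: int, url: str = SERVER_URL):
    """POST one name to a running server and return its ranked candidates."""
    request = urllib.request.Request(
        url,
        data=build_payload(tokenized_name, model_id),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return extract_candidates(json.load(reply))
```

With the server running, translate("^ln f u n a k o s h i $", 100) sends the same request as the curl command above; applied to the response shown, extract_candidates puts 船越 first.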