MaximeNe/Sentence-aligner-for-text-corpus

Sentence aligner for FR - FRNB (French - Non-Binary French) corpus

Tool used:

The hunalign sentence aligner
github: https://github.com/danielvarga/hunalign
paper: D. Varga, L. Németh, P. Halácsy, A. Kornai, V. Trón, V. Nagy (2005). Parallel corpora for medium density languages. In Proceedings of RANLP 2005, pages 590-596.
hunalign is included unchanged, in the directory hunalign.

Auto generation of the required dictionary:

The file "Appendix NB.xlsx" from the All-inGMT project is used to generate a dictionary from French to Non-Binary French. The dictionary is then completed with the words that appear in both the FR and FRNB texts to align.
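The dictionary generation described above can be sketched as follows. This is a minimal illustration, not the code in run.py: the function name build_dictionary_lines, the sample word pairs, and the word-splitting rule are all assumptions, and reading the pairs out of "Appendix NB.xlsx" is left out because the sheet layout is not described here. The "target @ source" line format is the one described in the hunalign documentation; check it against the hunalign version you use.

```python
import re


def build_dictionary_lines(pairs, fr_text, nb_text):
    """Return hunalign-style dictionary lines ("target @ source").

    pairs   -- iterable of (french_word, nonbinary_french_word) tuples,
               e.g. rows read from a spreadsheet (hypothetical input)
    fr_text -- full source text (FR)
    nb_text -- full target text (FRNB)
    """
    lines = []
    seen = set()

    # 1. Entries coming from the word list: target (FRNB) first, source (FR) second.
    for fr_word, nb_word in pairs:
        entry = (nb_word.strip(), fr_word.strip())
        if all(entry) and entry not in seen:
            seen.add(entry)
            lines.append(f"{entry[0]} @ {entry[1]}")

    # 2. Identity entries for words found in both texts.
    #    Assumption: words are runs of letters/digits, compared in lower case.
    fr_words = set(re.findall(r"\w+", fr_text.lower()))
    nb_words = set(re.findall(r"\w+", nb_text.lower()))
    for word in sorted(fr_words & nb_words):
        entry = (word, word)
        if entry not in seen:
            seen.add(entry)
            lines.append(f"{word} @ {word}")

    return lines


# Hypothetical sample data, not taken from "Appendix NB.xlsx".
sample_pairs = [("il", "iel"), ("tous", "toustes")]
dictionary_lines = build_dictionary_lines(
    sample_pairs, "Il est content.", "Iel est content."
)
print("\n".join(dictionary_lines))
```

The identity entries give the aligner anchors for the many words that are the same in both versions of a sentence.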

Author of added code:

Maxime NEMO
Maxime.Nemo@grenoble-inp.org

How to use:

Choice 1 (preferred):

Use the prebuilt app from the GitHub repository: click "Actions", then select the latest build. In the "Artifacts" section, click "app".
To use the app, unzip the downloaded file and go to dist/run.

  • create a file named "fr.txt" containing the source sentences (one per line)
  • create a file named "nb.txt" containing the target sentences (one per line)
  • run the program named run
  • The output is a .xls file named output.xls

Choice 2 (for developers):

install hunalign

see the hunalign GitHub repository for installation instructions

install dependencies

  pip install xlwt xlrd==1.2.0 

use it

  • create a file named "fr.txt" containing the source sentences (one per line)
  • create a file named "nb.txt" containing the target sentences (one per line)
  • run the Python script
     python3 run.py 
  • The output is a .xls file named output.xls
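If the sentences are already available in Python, the two input files can be prepared with a small helper like the one below. This is only a sketch: write_sentences is a hypothetical helper, not part of this repository, and the sample sentences are invented. It collapses line breaks inside a sentence so that each sentence ends up on exactly one line, and it skips empty entries. The demo writes into a temporary directory; in practice, write "fr.txt" and "nb.txt" next to run.py.

```python
import tempfile
from pathlib import Path


def write_sentences(path, sentences):
    """Write one sentence per line (UTF-8) and return the number of lines written."""
    # Collapse internal whitespace and newlines so a sentence never spans two lines.
    cleaned = [" ".join(sentence.split()) for sentence in sentences]
    # Drop entries that are empty after cleaning.
    cleaned = [sentence for sentence in cleaned if sentence]
    Path(path).write_text("".join(s + "\n" for s in cleaned), encoding="utf-8")
    return len(cleaned)


# Demo with invented sentences, written to a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    fr_path = Path(workdir) / "fr.txt"
    nb_path = Path(workdir) / "nb.txt"
    n_fr = write_sentences(fr_path, ["Il est\ncontent.", "   ", "Tous sont venus."])
    n_nb = write_sentences(nb_path, ["Iel est content.", "Toustes sont venu·es."])
    fr_lines = fr_path.read_text(encoding="utf-8").splitlines()
    nb_lines = nb_path.read_text(encoding="utf-8").splitlines()

print(n_fr, n_nb)
```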

Licence

Licensed under the GNU LGPLv3 or later.
