Alignment: Usage

Aligning texts

Alignment experiment folders only need `src.txt` and `trg.txt` files to run. A config file is generated automatically for the experiment, but you can still create one manually to customize the alignment.
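
As a minimal sketch, the hypothetical helper `create_alignment_experiment` below creates such a folder. It assumes that `SIL_NLP_DATA_PATH` is available as an environment variable pointing at the data root, and that `src.txt` and `trg.txt` are line-parallel (line *i* of one corresponds to line *i* of the other). Both are assumptions, and the helper is not part of silnlp.

```python
import os
from pathlib import Path
from typing import List, Optional


def create_alignment_experiment(
    name: str,
    src_lines: List[str],
    trg_lines: List[str],
    data_root: Optional[Path] = None,
) -> Path:
    """Create <data root>/Alignment/experiments/<name> with src.txt and trg.txt.

    Hypothetical helper for illustration; not part of silnlp.
    """
    if len(src_lines) != len(trg_lines):
        # Assumption: the two files are line-parallel, so their lengths must match.
        raise ValueError("source and target must have the same number of lines")
    if data_root is None:
        # Assumption: SIL_NLP_DATA_PATH is set as an environment variable.
        data_root = Path(os.environ["SIL_NLP_DATA_PATH"])
    exp_dir = Path(data_root) / "Alignment" / "experiments" / name
    exp_dir.mkdir(parents=True, exist_ok=True)
    # Write one segment per line, ending each file with a newline.
    (exp_dir / "src.txt").write_text("\n".join(src_lines) + "\n", encoding="utf-8")
    (exp_dir / "trg.txt").write_text("\n".join(trg_lines) + "\n", encoding="utf-8")
    return exp_dir
```

The folder name passed as `name` is then what you would give to the commands below as the experiment.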

align

Aligns the parallel corpora for the designated experiments.

usage: python -m silnlp.alignment.align [-h] [--aligners [aligner [aligner ...]]]
[--skip-align] [--skip-extract-lexicon]
experiments

Arguments:

Argument	Purpose	Description
`experiments`	Experiment pattern	The pattern of the experiment subfolders where the configuration files will be generated. The subfolders must be located in the `SIL_NLP_DATA_PATH > Alignment > experiments` folder.
`--aligners [aligner [aligner ...]]`	List of aligners	List of aligners to use to align each corpus.
`--skip-align`	Skip aligning corpora	Skip aligning corpora.
`--skip-extract-lexicon`	Skip extracting lexicons	Skip extracting lexicons.
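
The hypothetical helper `align_command` below is a minimal sketch that builds the argument list for this command from the options in the table. It assumes the tool parses its arguments with Python's `argparse`, as the usage text suggests. With `argparse`, a positional placed after a variable-length option such as `--aligners` can be consumed as one of that option's values, so the sketch puts the experiment pattern first.

```python
import sys
from typing import List, Optional, Sequence


def align_command(
    experiments: str,
    aligners: Optional[Sequence[str]] = None,
    skip_align: bool = False,
    skip_extract_lexicon: bool = False,
) -> List[str]:
    """Build the argument list for `python -m silnlp.alignment.align` (hypothetical helper)."""
    # The experiment pattern goes first so that --aligners cannot swallow it.
    cmd = [sys.executable, "-m", "silnlp.alignment.align", experiments]
    if aligners:
        # --aligners accepts one or more aligner names.
        cmd += ["--aligners", *aligners]
    if skip_align:
        cmd.append("--skip-align")
    if skip_extract_lexicon:
        cmd.append("--skip-extract-lexicon")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`.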

bulk_align

Aligns a source Bible to a defined set of target Bibles.

usage: python -m silnlp.alignment.bulk_align [-h] src_path trg_dir
output_dir [--aligner ALIGNER] [--multiprocess]

Arguments:

Argument	Purpose	Description
`src_path`	Path to source Bible text	Path to source Bible text.
`trg_dir`	Folder of Bibles to align to	Folder of Bibles to align to.
`output_dir`	Folder to contain Bible alignments	Folder to contain Bible alignments.
`--aligner ALIGNER`	Aligner to use	Aligner to use for extraction. Default is "fast_align".
`--multiprocess`	Use multiple processes	Use multiple processes if the chosen alignment algorithm does not already do so.
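
A minimal sketch of building this command with a hypothetical helper, `bulk_align_command`. It relies only on the arguments in the table and omits `--aligner` when the documented default, `fast_align`, is wanted. The paths in the usage below are placeholders.

```python
import sys
from typing import List

DEFAULT_ALIGNER = "fast_align"  # documented default for --aligner


def bulk_align_command(
    src_path: str,
    trg_dir: str,
    output_dir: str,
    aligner: str = DEFAULT_ALIGNER,
    multiprocess: bool = False,
) -> List[str]:
    """Build the argument list for `python -m silnlp.alignment.bulk_align` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "silnlp.alignment.bulk_align", src_path, trg_dir, output_dir]
    if aligner != DEFAULT_ALIGNER:
        # Only pass --aligner when overriding the default.
        cmd += ["--aligner", aligner]
    if multiprocess:
        # Useful only if the chosen aligner does not already use multiple processes.
        cmd.append("--multiprocess")
    return cmd
```

For example, `bulk_align_command("source_bible.txt", "target_bibles", "alignments", multiprocess=True)` yields the three positionals followed by `--multiprocess`.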

test

Tests generated alignments against gold standard alignments.

usage: python -m silnlp.alignment.test [-h] [--combine-pattern PATTERN]
[--test-size SIZE] [--books [book [book ...]]] [--by-book]
experiments

Arguments:

Argument	Purpose	Description
`experiments`	Experiment name	The name of the experiment to test. The experiment name must correspond to a subfolder in the `SIL_NLP_DATA_PATH > Alignment > experiments` folder.
`--combine-pattern PATTERN`	Combine pattern	Combine pattern.
`--test-size SIZE`	Test size	Set the number of verse alignments to test. If the test size is less than the total number of verses, the verses to test are selected randomly.
`--books [book [book ...]]`	Books to score	Specifies one or more books to be scored. When this option is used, the test tool generates predictions for the entire target-language test set but reports a score only for the specified book(s). Books must be specified using the three-character abbreviations from the USFM 3.0 standard (e.g., "GEN" for Genesis).
`--by-book`	Score individual books	In addition to providing an overall score for all the books in the test set, provide individual scores for each book in the test set. If this option is used in combination with the `--books` option, individual scores are provided for each of the specified books.
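
A minimal sketch of building the `test` command with a hypothetical helper, `alignment_test_command`. It checks only that each book code has the USFM shape of three uppercase letters or digits (e.g. "GEN", "1SA"), not that it is in the official list. As with `--aligners`, the experiment pattern is placed first so that the variable-length `--books` option cannot consume it (assuming `argparse` parsing, as the usage text suggests).

```python
import re
import sys
from typing import List, Optional, Sequence

# Shape of a USFM book ID: three uppercase letters or digits (e.g. "GEN", "1SA").
# This does not check membership in the official book list.
_BOOK_ID = re.compile(r"[A-Z0-9]{3}")


def alignment_test_command(
    experiments: str,
    combine_pattern: Optional[str] = None,
    test_size: Optional[int] = None,
    books: Optional[Sequence[str]] = None,
    by_book: bool = False,
) -> List[str]:
    """Build the argument list for `python -m silnlp.alignment.test` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "silnlp.alignment.test", experiments]
    if combine_pattern is not None:
        cmd += ["--combine-pattern", combine_pattern]
    if test_size is not None:
        cmd += ["--test-size", str(test_size)]
    if books:
        bad = [b for b in books if not _BOOK_ID.fullmatch(b)]
        if bad:
            raise ValueError(f"not USFM-style book IDs: {bad}")
        cmd += ["--books", *books]
    if by_book:
        # Adds per-book scores alongside the overall score.
        cmd.append("--by-book")
    return cmd
```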

Miscellaneous commands

preprocess

Preprocesses Clear gold standard alignments.

usage: python -m silnlp.alignment.preprocess [-h] experiments

Arguments:

Argument	Purpose	Description
`experiments`	Experiment pattern	The pattern of the experiment subfolders where the configuration files will be generated. The subfolders must be located in the `SIL_NLP_DATA_PATH > Alignment > experiments` folder.
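
A minimal sketch of invoking `preprocess` from Python through the standard `subprocess` module. The helper name `preprocess_gold_standard` and its `dry_run` flag are hypothetical. With `dry_run=True`, the helper only returns the argument list without running anything.

```python
import subprocess
import sys
from typing import List


def preprocess_gold_standard(experiments: str, dry_run: bool = False) -> List[str]:
    """Run (or, with dry_run=True, only build) the preprocess command (hypothetical helper)."""
    cmd = [sys.executable, "-m", "silnlp.alignment.preprocess", experiments]
    if not dry_run:
        # check=True raises CalledProcessError if the module exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd
```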

generate_clear_models

Generates a translation model for Clear from an alignment model.

usage: python -m silnlp.alignment.generate_clear_models [-h] --aligner ALIGNER --output PATH experiments

Arguments:

Argument	Purpose	Description
`experiments`	Experiment pattern	The pattern of the experiment subfolders where the configuration files will be generated. The subfolders must be located in the `SIL_NLP_DATA_PATH > Alignment > experiments` folder.
`--aligner ALIGNER`	Aligner	Aligner to use.
`--output PATH`	Output directory	Output directory.
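
A minimal sketch of building this command with a hypothetical helper, `generate_clear_models_command`. It assumes the module is named after the command (`silnlp.alignment.generate_clear_models`). Both `--aligner` and `--output` are required and each takes a single value, so the positional experiment pattern can safely go last.

```python
import sys
from typing import List

# Assumption: the module name matches the command name.
MODULE = "silnlp.alignment.generate_clear_models"


def generate_clear_models_command(experiments: str, aligner: str, output: str) -> List[str]:
    """Build the argument list for generating Clear translation models (hypothetical helper)."""
    if not aligner or not output:
        # Both options are required by the command.
        raise ValueError("--aligner and --output are required")
    return [sys.executable, "-m", MODULE, "--aligner", aligner, "--output", output, experiments]
```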

test_size

Finds the optimal size for a gold standard.

usage: python -m silnlp.alignment.test_size [-h] [--threshold THRESHOLD]
[--test-size SIZE] [--books [book [book ...]]] experiments

Arguments:

Argument	Purpose	Description
`experiments`	Experiment pattern	The pattern of the experiment subfolders where the configuration files will be generated. The subfolders must be located in the `SIL_NLP_DATA_PATH > Alignment > experiments` folder.
`--threshold THRESHOLD`	Similarity threshold	Similarity threshold.
`--test-size SIZE`	Test size	Set the number of verse alignments to test. If the test size is less than the total number of verses, the verses to test are selected randomly.
`--books [book [book ...]]`	Books to score	Specifies one or more books to be scored. When this option is used, the test tool generates predictions for the entire target-language test set but reports a score only for the specified book(s). Books must be specified using the three-character abbreviations from the USFM 3.0 standard (e.g., "GEN" for Genesis).
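
A minimal sketch of building the `test_size` command with a hypothetical helper, `build_test_size_command`. It assumes that the threshold is a number and that the test size must be a positive integer. The experiment pattern goes first so that the variable-length `--books` option cannot consume it.

```python
import sys
from typing import List, Optional, Sequence


def build_test_size_command(
    experiments: str,
    threshold: Optional[float] = None,
    test_size: Optional[int] = None,
    books: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build the argument list for `python -m silnlp.alignment.test_size` (hypothetical helper)."""
    cmd = [sys.executable, "-m", "silnlp.alignment.test_size", experiments]
    if threshold is not None:
        # Assumption: the similarity threshold is numeric.
        cmd += ["--threshold", str(threshold)]
    if test_size is not None:
        # Assumption: a test size must be a positive number of verses.
        if test_size <= 0:
            raise ValueError("test size must be positive")
        cmd += ["--test-size", str(test_size)]
    if books:
        cmd += ["--books", *books]
    return cmd
```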

visualize_similarity

Visualizes the similarity of languages or projects.

usage: python -m silnlp.alignment.visualize_similarity [-h] --corpus PATH --metadata PATH
--scores PATH [--image PATH] [--country COUNTRY] [--family FAMILY]
[--aligner ALIGNER] [--recompute] [--graph-type TYPE]
[--data-type TYPE] [--threshold THRESHOLD]

Arguments:

Argument	Purpose	Description
`--corpus PATH`	The corpus folder	The corpus folder.
`--metadata PATH`	The metadata file	The metadata file.
`--scores PATH`	The similarity scores file	The similarity scores file.
`--image PATH`	The image file	The image file.
`--country COUNTRY`	The country to include	The country to include.
`--family FAMILY`	The language family to include	The language family to include.
`--aligner ALIGNER`	The alignment model	The alignment model.
`--recompute`	Recompute similarity scores	Recompute similarity scores.
`--graph-type TYPE`	Type of graph	Type of graph. Can be "tree" or "network".
`--data-type TYPE`	Type of data	Type of data. Can be "language" or "project".
`--threshold THRESHOLD`	Similarity threshold	Similarity threshold.
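
A minimal sketch of building this command with a hypothetical helper, `visualize_similarity_command`. It validates `--graph-type` and `--data-type` against the values listed in the table and passes other options only when they are set. The file and folder names in the usage are placeholders, and treating the threshold as a number is an assumption.

```python
import sys
from typing import List, Optional

GRAPH_TYPES = ("tree", "network")   # documented values for --graph-type
DATA_TYPES = ("language", "project")  # documented values for --data-type


def visualize_similarity_command(
    corpus: str,
    metadata: str,
    scores: str,
    image: Optional[str] = None,
    country: Optional[str] = None,
    family: Optional[str] = None,
    aligner: Optional[str] = None,
    recompute: bool = False,
    graph_type: Optional[str] = None,
    data_type: Optional[str] = None,
    threshold: Optional[float] = None,
) -> List[str]:
    """Build the argument list for `python -m silnlp.alignment.visualize_similarity` (hypothetical helper)."""
    if graph_type is not None and graph_type not in GRAPH_TYPES:
        raise ValueError(f"graph type must be one of {GRAPH_TYPES}")
    if data_type is not None and data_type not in DATA_TYPES:
        raise ValueError(f"data type must be one of {DATA_TYPES}")
    # Required options first.
    cmd = [sys.executable, "-m", "silnlp.alignment.visualize_similarity",
           "--corpus", corpus, "--metadata", metadata, "--scores", scores]
    optional = {
        "--image": image,
        "--country": country,
        "--family": family,
        "--aligner": aligner,
        "--graph-type": graph_type,
        "--data-type": data_type,
        "--threshold": threshold,
    }
    # Dicts keep insertion order, so flags appear in the order listed above.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    if recompute:
        cmd.append("--recompute")
    return cmd
```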
