A lightweight command-line frontend to OpusCleaner

hplt-project/clianer

CLI-a-ner

Installation

First, install OpusCleaner on your system.

Then clone this repository and install the additional requirements. At this point the only one is urwid; everything else is already needed for a working OpusCleaner installation.
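As a quick sanity check after installation, the following sketch reports whether urwid can be imported. It assumes the interpreter is available as python3 and that urwid would be installed with pip; neither detail is stated above, and the variable name urwid_status is only illustrative.

```shell
# Check whether urwid is importable by the Python interpreter in use.
# Assumption: the interpreter is called python3 on this system.
if python3 -c 'import urwid' 2>/dev/null; then
  urwid_status="installed"
else
  # One common way to install it (assumed): python3 -m pip install urwid
  urwid_status="missing"
fi
echo "urwid: $urwid_status"
```

If the check reports "missing", install urwid into the same environment that OpusCleaner uses.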

Usage

Set the DATA_PATH environment variable, and optionally SAMPLE_SIZE; both are used by OpusCleaner as usual. Then run the app with ./main.py.

For example:

export DATA_PATH='/home/helcl/hplt/translation-models/en-cs/*.*.gz'
export SAMPLE_SIZE=100
cd path/to/clianer/
./main.py
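The example quotes DATA_PATH as a glob pattern, which suggests the pattern is expanded by OpusCleaner rather than by the shell. Under that assumption, a small pre-flight check can confirm that the pattern matches some files before the app is started. The helper count_matches is hypothetical and not part of clianer; it is a sketch for POSIX sh or bash and does not handle paths containing spaces.

```shell
# Hypothetical helper: count the existing files matched by a glob pattern.
count_matches() {
  pattern=$1
  count=0
  # $pattern is left unquoted on purpose so the shell expands the glob here.
  for f in $pattern; do
    # An unmatched glob stays literal, so only count paths that exist.
    [ -e "$f" ] && count=$((count + 1))
  done
  echo "$count"
}

n=$(count_matches "${DATA_PATH:-}")
if [ "$n" -eq 0 ]; then
  echo "DATA_PATH matches no files: ${DATA_PATH:-unset}" >&2
else
  echo "DATA_PATH matches $n file(s)"
fi
```

Run it in the same shell session in which DATA_PATH was exported, before launching ./main.py.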

Controls

Most of the controls are listed in the bottom bar of the app frame. However, some other controls depend on the current application focus. Move focus between the filter view and the dataset view using the left and right arrow keys.

Common controls

These work regardless of whether the focus is in the filter view or in the dataset view.

  • F2 opens a new dataset
  • F3 adds a new filter
  • F6 shows the clean version of the data in the dataset view
  • F7 assigns categories to the current dataset
  • F10, q exit the application
  • Down, Up move within the focused window (PgUp and PgDn also work)

Filter view controls

  • F4 edits the filter
  • F5 imports a filter pipeline from a different dataset (careful, this overwrites the current pipeline)
  • F8 removes the filter
  • w, s move the selected filter up or down
  • d marks the filter for diffing
  • r resets diffing

Dataset view controls

  • F4 shows the diff (select which filter steps to diff in the filter view)
  • F5 shows the clean version of the data
