Installation

pip install pnu-prep

PREP(1)

NAME

prep - prepare text for statistical processing

SYNOPSIS

DESCRIPTION

prep reads each file in sequence and writes it on the standard output, one lowercase `word' per line. A word is a string of alphabetic characters and embedded apostrophes, delimited by space or punctuation. Hyphenated words are broken apart; hyphens at the end of lines are removed and the hyphenated parts are joined. Strings of digits are discarded.

When no files are given as arguments, standard input is read (until a Control-D (Unix) or Control-Z (Windows) character is sent).

The following option letters may appear in any order:

OPTIONS

Options	Use
-a\|--ascii	Try to convert Unicode letters to ASCII.
-d\|--number	Print the word number (in the input stream) with each word.
-h\|--hyphen	Don't break words on hyphens.
-i\|--ignore	Take the next file as an `ignore' file. These words will not appear in the output. (They will be counted, for purposes of the -d numbering.)
-o\|--only	Take the next file as an `only' file. Only these words will appear in the output. (All other words will also be counted for the -d numbering.)
-p\|--ponctuate	Include punctuation marks (single nonalphanumeric characters from the "!(),.:;?" set) as separate output lines. The punctuation marks are not counted for the -d numbering.
--debug	Enable debug mode
--help\|-?	Print usage and a short help message and exit
--version	Print version and exit
--	Options processing terminator

FILES

Ignore and only files contain words, one per line.

The file /usr/local/etc/eign was originally provided in /usr/lib as an example or default ignore file.

EXIT STATUS

The prep utility exits 0 on success, and >0 if an error occurs.

STANDARDS

The prep utility is a deprecated UNIX 7th edition command (it also appeared in Unix V7M, Ultrix 3.1, 2.9BSD and 2.11BSD).

Our implementation tries to follow the PEP 8 style guide for Python code.

PORTABILITY

Tested OK under Windows.

HISTORY

This utility was made for the PNU project, out of historical curiosity and for fun, though it doesn't seem very useful...

Some features were added compared to the original command:

Unicode letters are now supported by default (the original command predated Unicode by 12 years).
It is now possible to use the -i and -o options at the same time.
The -h option was added to avoid breaking word on hyphens, which makes sense in French.
The -a option was added to try to convert Unicode accented letters to their ASCII equivalent.

Several bugs from the original prep command were corrected:

A display bug on hyphenated words inside a line when used with the combined -d and -p options.
A bug with lines starting by an apostrophe.
A bug with the character following an apostrophe.

LICENSE

This utility is available under the 3-clause BSD license.

AUTHORS

Hubert Tournier

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
etc		etc
man		man
src/prep		src/prep
tests		tests
License		License
Makefile		Makefile
README.md		README.md
pyproject.toml		pyproject.toml
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Installation

PREP(1)

NAME

SYNOPSIS

DESCRIPTION

OPTIONS

FILES

EXIT STATUS

SEE ALSO

STANDARDS

PORTABILITY

HISTORY

LICENSE

AUTHORS

About

Releases 4

Languages

License

HubTou/prep

Folders and files

Latest commit

History

Repository files navigation

Installation

PREP(1)

NAME

SYNOPSIS

DESCRIPTION

OPTIONS

FILES

EXIT STATUS

SEE ALSO

STANDARDS

PORTABILITY

HISTORY

LICENSE

AUTHORS

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 4

Languages