Command-line Chinese-English dictionary development kit for Pinyinbase, CC-CEDICT and more.
- pbj: Pinyinbase Joiner
- pbjson: Pinyinbase JSON document generator
- Making a custom CEDICT dictionary using the xargs command
NOTE: pbj is baking cakes from ingredients. pbjson is putting baked cakes into JSON packages.
- Requires NodeJS.
From within your home (~) directory, install Pinyinbase:
$ npm i -g git+https://github.com/pffy/pbjkit.git
Alternatively, you can install Pinyinbase like this:
$ npm i -g github:pffy/pbjkit
After installation, make sure everything is peachy keen:
$ pbj
You are all set!
On Linux for Chromebooks or Virtual Machines (VMs), you may have a restricted user account. This may prevent the global installation of npm packages in the default locations.
This issue may be called an EACCES error or something similar.
To fix this issue, first make a new local npm global folder in your home directory.
$ mkdir ~/.npm-global
Then, tell npm to use that new global folder:
$ npm config set prefix '~/.npm-global'
If you have a ~/.profile file in your home directory, then add this line:
export PATH=~/.npm-global/bin:$PATH
If there is no ~/.profile file, you can simply create a new file (the single quotes keep $PATH from being expanded before it is written):
$ echo 'export PATH=~/.npm-global/bin:$PATH' > ~/.profile
Then, on the command line, enter this:
$ source ~/.profile
Now, try to reinstall:
$ npm i -g github:pffy/pbjkit
Then, check your install:
$ pbj
NOTE: Details about this and similar fixes for the EACCES error issue on Linux for Chromebook.
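The profile step above can also be made safe to repeat. The following is a minimal sketch, not part of pbjkit: `add_npm_global_path` is a hypothetical helper name, and it assumes the ~/.npm-global prefix chosen above. It appends the export line to a profile file only when that exact line is not already present.

```shell
# Hypothetical helper (not part of pbjkit): append the PATH line once.
# Assumes the ~/.npm-global prefix configured above.
add_npm_global_path() {
  profile="${1:-$HOME/.profile}"
  line='export PATH=~/.npm-global/bin:$PATH'
  # Create the profile file if it does not exist yet.
  touch "$profile"
  # -F: fixed string, -x: match the whole line, -q: print nothing.
  if ! grep -Fxq -- "$line" "$profile"; then
    printf '%s\n' "$line" >> "$profile"
  fi
}
```

Calling `add_npm_global_path` with no argument targets ~/.profile. Running it twice leaves a single copy of the line, after which `source ~/.profile` applies the change.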
1. Download Pinyinbase or any other distributed CEDICT-format dictionary into your working directory:
$ git clone https://github.com/pffy/pinyinbase
$ pbj -i ./pinyinbase/vocab/*.txt
The output file is pb.txt (a CEDICT-format dictionary file).
- Command-line Pinyinbase Joiner utility
- Combine multiple Pinyinbase glossaries into a single data source file
Displays the version, then exits:
$ pbj --version
Displays the help information, then exits:
$ pbj --help
Input Pinyinbase glossary files to be validated, processed, and combined.
$ pbj -i file1 file2 file3
pbj performs validation:
- Each file must be a valid Pinyinbase glossary.
- Each line of a valid Pinyinbase glossary must be a CEDICT-formatted dictionary entry.
All three files will be combined into a single CEDICT-formatted file called pb.txt.
Adds epoch date to output filename.
$ pbj -d -i file1 file2
Output file is similar to pb-1605898061.txt.
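To anticipate the dated file name in a script, a similar name can be built by hand. This is only a sketch: `epoch_name` is a hypothetical helper, and it assumes the suffix is the Unix epoch time in seconds, which is what the ten-digit example above suggests. The name pbj actually writes depends on the moment it runs.

```shell
# Hypothetical helper: build a name shaped like pb-1605898061.txt.
# Assumption: the -d suffix is the Unix epoch time in seconds.
epoch_name() {
  # First argument is the extension; defaults to txt.
  printf 'pb-%s.%s\n' "$(date +%s)" "${1:-txt}"
}

epoch_name        # a txt name, as for pbj -d
epoch_name json   # a json name, as for pbjson -d
```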
Quiet processing. 🤐
$ pbj -q -i ./folder/*.txt
NOTE: For examples of glossary files, please visit the Pinyinbase repo.
- The filename must exactly match the first comment line to be a valid glossary file.
- Otherwise, the file will be ignored by pbj.
- EXAMPLE: file1.txt
- The following is a blank AND perfectly valid glossary file.
- Notice there are no entries in this file.
# file1.txt
- EXAMPLE: file2.txt
- The following glossary is invalid. The filename and the first-line comment do not exactly match.
# file3.txt
- EXAMPLE: file8.txt
- The following glossary is valid. The filename and the first-line comment do exactly match.
- There are no valid CEDICT-formatted entries in this file.
# file8.txt
some text here whatever
more text here whatever
this is not not valid text
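The filename rule in the examples above can be checked before running pbj. The following is a rough sketch and not pbj's own code: `is_glossary` is a hypothetical helper, and it assumes the first line is exactly `# ` followed by the file's base name, as in the three examples.

```shell
# Hypothetical check (not pbj's actual code): a glossary is treated as
# valid when its first line is "# " followed by the file's own name.
is_glossary() {
  file="$1"
  expected="# $(basename "$file")"
  # Read only the first line of the file.
  first_line="$(head -n 1 "$file")"
  [ "$first_line" = "$expected" ]
}
```

For instance, `is_glossary file1.txt && echo valid` prints `valid` for the file1.txt example, while the file2.txt example fails the check because its first line names file3.txt.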
- Command-line Pinyinbase JSON document generator
- Convert a compiled Chinese-English dictionary into a JSON file
- Adds metapinyin data for better entry storage and retrieval
Shows version, then exits.
$ pbjson --version
Shows help, then exits.
$ pbjson --help
Input CEDICT-formatted dictionary files to be validated, processed, and combined.
$ pbjson -i file1 file2 file3
pbjson performs validation:
- Each line of the input dictionary file must be a CEDICT-formatted dictionary entry; otherwise, the line is ignored.
All three files will be combined into a single JSON document called pb.json.
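To preview which lines of a file are likely to survive this validation, a simple pattern match can help. This is a simplified sketch, not pbjson's actual validator: `cedict_entries` is a hypothetical helper, and the pattern only checks the general CEDICT shape of two headwords, bracketed pinyin, and slash-delimited definitions.

```shell
# Hypothetical filter (not pbjson's actual validator): print only the
# lines shaped like "TRAD SIMP [pin1 yin1] /gloss 1/gloss 2/".
cedict_entries() {
  grep -E '^[^ ]+ [^ ]+ \[[^]]+\] /.+/$' "$1"
}
```

Running `cedict_entries file8.txt` on the file8.txt example from the pbj section prints nothing, since none of its lines have this shape.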
Adds epoch date to output file.
$ pbjson -d -i file.*txt
$ pbjson -d -i cedict1 cedict2 ./folder/*.txt
Output file is similar to pb-1605898061.json.
Quiet processing. 🤐
$ pbjson -q -i ./folder/*.txt
- In a working directory containing Pinyinbase glossaries.
$ cd ./pinyinbase/vocab
- With a custom list of glossaries in a text file called astro1.txt:
vocab-cmn-astronomy-planets-earth-moons.txt
vocab-cmn-astronomy-planets-jupiter-moons.txt
vocab-cmn-astronomy-planets-mars-moons.txt
vocab-cmn-astronomy-planets.txt
vocab-cmn-astronomy-thesun.txt
NOTE: The text file astro1.txt should only contain glossary file names -- not CEDICT-formatted entries.
For example, in the pinyinbase/vocab folder containing a new file called astro1.txt:
parent/
  - pinyinbase/
    - vocab/ <-- YOU ARE HERE
      - astro1.txt
      - ...
$ xargs pbj -i < astro1.txt
Only the glossaries in the file astro1.txt are processed. The glossaries are combined into a single file called pb.txt.
This command pattern also works for pbjson.
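To see what xargs actually runs, the command can be previewed with echo standing in front of it. This is a small sketch: `preview_pbj` is a hypothetical helper, and the two glossary names are taken from the astro1.txt listing above. Nothing is processed; the command line is only printed.

```shell
# Hypothetical helper: print the command line xargs would build from a
# list file, instead of running pbj.
preview_pbj() {
  xargs echo pbj -i < "$1"
}

# Demo with a temporary two-line list file.
list="$(mktemp)"
printf '%s\n' vocab-cmn-astronomy-planets.txt vocab-cmn-astronomy-thesun.txt > "$list"
preview_pbj "$list"
# prints: pbj -i vocab-cmn-astronomy-planets.txt vocab-cmn-astronomy-thesun.txt
rm -f "$list"
```

Each line of the list becomes one argument after -i, which is why the list file must contain only file names.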
PBJ vs PBJSON:
- pbj validates Pinyinbase glossaries and CEDICT syntax, and combines files.
- pbjson only validates CEDICT syntax and converts CEDICT files into JSON documents.
For example, in the pinyinbase/vocab folder, where the file astro2.txt is in the parent folder:
parent/
  - astro2.txt
  - pinyinbase/
    - vocab/ <-- YOU ARE HERE
      - ...
$ xargs pbjson -i < ../../astro2.txt
Only the dictionary files in the file astro2.txt are processed. The dictionary files are combined into a single file called pb.json.
As an alternative to xargs, you can use cat.
In this example, the file astro2.txt is in the parent directory along with the pinyinbase folder:
parent/ <-- YOU ARE HERE
  - astro2.txt
  - pinyinbase/
    - vocab/
      - ...
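We can extract the file list for pbjson using cat. A folder prefix written directly in front of $(cat astro2.txt) attaches only to the first file name in the list, so sed is used here to add the folder path to every entry:
$ pbjson -i $(cat astro2.txt | sed 's|^|./pinyinbase/vocab/|')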
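When a list file holds more than one name, it helps to see how the shell splits a command substitution into words. The following sketch uses printf as a stand-in for pbjson so that each resulting argument appears on its own line. `prefix_each` is a hypothetical helper, and it assumes the prefix contains no `|` or `&` characters, which sed would treat specially.

```shell
# Hypothetical helper: put a folder prefix in front of every line of a
# list file (assumes the prefix contains no '|' or '&').
prefix_each() {
  sed "s|^|$1|" "$2"
}

# Demo with a temporary two-line list file.
list="$(mktemp)"
printf '%s\n' a.txt b.txt > "$list"

# Prefix joined to the substitution: only the first word receives it.
printf '%s\n' ./pinyinbase/vocab/$(cat "$list")
# prints ./pinyinbase/vocab/a.txt and then b.txt

# Prefix added to every line before the shell splits the words.
printf '%s\n' $(prefix_each ./pinyinbase/vocab/ "$list")
# prints ./pinyinbase/vocab/a.txt and then ./pinyinbase/vocab/b.txt
rm -f "$list"
```

With a list of bare file names, the second form gives every file a path that resolves from the parent directory.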
As you can see, you can build several custom CEDICT dictionaries very quickly by simply using dictionary file lists as recipes.
- MIT License: https://opensource.org/licenses/MIT