[kosha] Substantially increase size and coverage
This commit uses data from the Upasargartha-candrika to create a large
number of prefixed tinantas and krdantas.

In addition, I've fixed a few minor bugs and added more documentation to
these crates.
akprasad committed Nov 6, 2024
1 parent fb374ef commit 2b5a980
Showing 46 changed files with 1,342 additions and 764 deletions.
13 changes: 8 additions & 5 deletions Cargo.lock

4 changes: 3 additions & 1 deletion Makefile
@@ -37,7 +37,9 @@ create_sandhi_rules:
 # Creates a koshas and write it to disk.
 create_kosha:
 RUST_LOG=info cargo run --release --bin create_kosha -- \
---input-dir data/raw/lex --output-dir data/build/vidyut-latest
+--input-dir data/raw/lex \
+--dhatupatha vidyut-prakriya/data/dhatupatha.tsv \
+--output-dir data/build/vidyut-latest

 # Trains a padaccheda model and saves important features to disk.
 # NOTE: when training, exclude the file paths used in `make eval`.
4 changes: 3 additions & 1 deletion README.md
@@ -110,7 +110,9 @@ In Rust, components of this kind are called *crates*.

 ### [`vidyut-chandas`][vidyut-chandas]

-`vidyut-chandas` is an experimental classifier for Sanskrit meters.
+`vidyut-chandas` identifies the meter in some piece of Sanskrit text. This
+crate is experimental, and while it is useful for common and basic use cases,
+it is not a state-of-the-art solution.

 For details, see the [vidyut-chandas README][vidyut-chandas].

2 changes: 2 additions & 0 deletions scripts/create_all_data.sh
@@ -22,6 +22,8 @@ else
 echo "Training data does not exist -- fetching."
 mkdir -p "data/raw/dcs"
 git clone --depth 1 https://github.com/OliverHellwig/sanskrit.git dcs-data
+# Use a fixed commit to avoid breakages from later changes.
+pushd dcs-data && git reset --hard 1bc281e && popd
 mv dcs-data/dcs/data/conllu data/raw/dcs/conllu
 rm -Rf dcs-data
 fi
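
A note on the pinning step in this hunk: a `--depth 1` clone contains only the tip commit of the default branch, so `git reset --hard 1bc281e` can succeed only while `1bc281e` is still that tip. The sketch below shows one possible alternative that fetches a single pinned commit directly. It is not part of this commit. The function name `fetch_pinned` and its arguments are hypothetical, and it assumes that the full 40-character commit hash is known (an abbreviated hash such as `1bc281e` generally cannot be requested from a remote) and that the server allows fetching a commit by hash.

```shell
#!/bin/sh
# Hypothetical helper (not from the vidyut repository): fetch exactly one
# commit from a remote and check it out, without downloading other history.
#
# Usage: fetch_pinned <repo-url> <full-commit-sha> <dest-dir>
# Assumes <full-commit-sha> is the full hash and that the server permits
# fetching a commit by hash.
fetch_pinned() {
    url="$1"
    sha="$2"
    dest="$3"

    # Start from an empty repository instead of cloning a branch tip.
    git init -q "$dest" &&
    git -C "$dest" remote add origin "$url" &&
    # Request only the pinned commit, with no ancestors.
    git -C "$dest" fetch -q --depth 1 origin "$sha" &&
    # Check out the fetched commit (this leaves a detached HEAD).
    git -C "$dest" checkout -q FETCH_HEAD
}
```

In the script above, this could stand in for the clone and reset lines, for example `fetch_pinned https://github.com/OliverHellwig/sanskrit.git "$FULL_SHA" dcs-data`, where `FULL_SHA` is a placeholder for the full hash behind `1bc281e`. Because each step is chained with `&&`, the function returns a non-zero status as soon as any step fails, so the caller can stop instead of continuing with unpinned data.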