VISL/PennTreebank to DCG converter

This project converts a corpus in the PennTreebank format into a definite clause grammar (DCG) that can be run in Prolog.
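
As a rough illustration of the conversion described above, the sketch below reads one bracketed PennTreebank tree and emits one DCG rule per node. It is not the project's implementation: the helper names (parse_ptb, dcg_rules, prolog_name, quote_word), the lowercasing of labels, and the quoting of words are all assumptions made for this example.

```python
import re


def parse_ptb(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "a node must start with '('"
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def prolog_name(label):
    # Assumption: lowercase the label and replace non-word characters.
    # Labels such as "." would need extra handling in a real converter.
    return re.sub(r"\W", "_", label).lower()


def quote_word(word):
    # Quote every word so capitalised tokens are not read as Prolog variables.
    return "'" + word.replace("\\", "\\\\").replace("'", "\\'") + "'"


def dcg_rules(tree):
    """Yield one DCG rule string per node, parent before children."""
    label, children = tree
    rhs = []
    for child in children:
        if isinstance(child, tuple):
            rhs.append(prolog_name(child[0]))
        else:
            rhs.append("[" + quote_word(child) + "]")  # terminals are lists
    yield f"{prolog_name(label)} --> {', '.join(rhs) or '[]'}."
    for child in children:
        if isinstance(child, tuple):
            yield from dcg_rules(child)


tree = parse_ptb("(S (NP (DT the) (NN dog)) (VP (VBZ barks)))")
rules = list(dcg_rules(tree))
print("\n".join(rules))
# s --> np, vp.
# np --> dt, nn.
# dt --> ['the'].
# nn --> ['dog'].
# vp --> vbz.
# vbz --> ['barks'].
```

A real corpus repeats many rules across trees, so a converter would normally collect the rules from every tree and remove duplicates before writing the Prolog file.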

Adjustments and improvements

The project is still under development. Upcoming updates will address the following tasks:

Enable PennTreebank format
Compute probability and frequency count for rules
Reorder the rules for better efficiency and remove loops
Generate the probability for the parse tree
Generate the grammar with argument structure
Add a rule-cut option that prunes rules whose frequency falls below a given threshold
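
The frequency, probability, and rule-cut tasks are not implemented yet. The sketch below shows one conventional way they could be approached, using relative-frequency estimation: the probability of a rule is its count divided by the total count of rules with the same left-hand side. The function name rule_probabilities, the (lhs, rhs) tuple representation, and the min_count parameter are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def rule_probabilities(rules, min_count=1):
    """Return {rule: (count, probability)} for rules seen at least min_count times.

    rules is an iterable of (lhs, rhs) pairs, where rhs is a tuple of symbols.
    Probabilities are renormalised over the rules that survive pruning, which
    is one possible design choice among several.
    """
    counts = Counter(rules)
    kept = {rule: c for rule, c in counts.items() if c >= min_count}

    # Total count per left-hand side, computed over the kept rules only.
    totals = defaultdict(int)
    for (lhs, _rhs), c in kept.items():
        totals[lhs] += c

    return {rule: (c, c / totals[rule[0]]) for rule, c in kept.items()}


# Hypothetical rule occurrences, as they might be collected from parsed trees.
observed = [
    ("np", ("dt", "nn")),
    ("np", ("dt", "nn")),
    ("np", ("dt", "nn")),
    ("np", ("nn",)),
    ("vp", ("vbz",)),
]

full = rule_probabilities(observed)
pruned = rule_probabilities(observed, min_count=2)

for (lhs, rhs), (count, prob) in full.items():
    print(f"{lhs} --> {', '.join(rhs)}.  % count={count}, p={prob:.2f}")
```

With min_count=2 only the rule np --> dt, nn survives, and its probability becomes 1.0 because it is the only remaining expansion of np.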

💻 Requirements

This project was tested with Python 3.8. To install the dependencies, run:

pip install -r requirements.txt

☕ Using the DCG converter

To use the DCG converter, run the main.py script with the following arguments:

usage: main.py [-h] --file_path FILE_PATH --file_format {VISL,PennTreebank,TigerXML} --output_folder OUTPUT_FOLDER [--graphviz]

optional arguments:
  -h, --help            show this help message and exit
  --file_path FILE_PATH
                        File path in the specified format.
  --file_format {VISL,PennTreebank,TigerXML}
                        File format.
  --output_folder OUTPUT_FOLDER
                        Output folder.
  --graphviz            A boolean switch to render the tree in graphviz

Example of usage:

python main.py --file_path ../dataset/Bosque_CF_8.0.PennTreebank_utf8.ptb --file_format PennTreebank --output_folder ../output
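
For readers who want to call the converter from their own scripts, the sketch below shows how a command-line interface matching the usage text above could be declared with argparse. It is an assumption about how main.py might be structured, not a copy of it; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Declare the options listed in the usage text (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--file_path", required=True,
                        help="File path in the specified format.")
    parser.add_argument("--file_format", required=True,
                        choices=["VISL", "PennTreebank", "TigerXML"],
                        help="File format.")
    parser.add_argument("--output_folder", required=True,
                        help="Output folder.")
    parser.add_argument("--graphviz", action="store_true",
                        help="A boolean switch to render the tree in graphviz")
    return parser


# Parse the same arguments as the example command above.
args = build_parser().parse_args([
    "--file_path", "../dataset/Bosque_CF_8.0.PennTreebank_utf8.ptb",
    "--file_format", "PennTreebank",
    "--output_folder", "../output",
])
print(args.file_format, args.output_folder, args.graphviz)
```

Because --graphviz is declared with store_true, it defaults to False and needs no value on the command line.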
