NOTE: This project is a work in progress and will not currently run properly.
Extracting data from Japanese texts is a complicated topic. In this project, we do not try any fancy machine learning, but rather try to extract and process data using regular expressions.
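As a rough illustration of the regex-only approach, here is a minimal sketch that pulls dates and katakana words out of a sentence. It uses only the standard re module with explicit Unicode ranges so that it runs without extra packages. The names DATE, KATAKANA_RUN, extract_dates and extract_katakana_words are made up for this example and are not part of the project.

```python
import re

# Unicode block for katakana. It includes the long-vowel mark "ー" (U+30FC).
KATAKANA = r"\u30A0-\u30FF"

# One or more consecutive katakana characters, e.g. a loanword.
KATAKANA_RUN = re.compile(f"[{KATAKANA}]+")

# A date written as <year>年<month>月<day>日.
DATE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")


def extract_dates(text):
    """Return every date in the text as a (year, month, day) tuple of ints."""
    return [tuple(int(group) for group in match.groups())
            for match in DATE.finditer(text)]


def extract_katakana_words(text):
    """Return every run of katakana characters in the text."""
    return KATAKANA_RUN.findall(text)


sample = "2019年5月1日にトウキョウでコーヒーを飲んだ。"
print(extract_dates(sample))
print(extract_katakana_words(sample))
```

Explicit ranges like these are an assumption made to keep the sketch dependency-free; the project itself may describe scripts differently.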
I recommend setting up a project in PyCharm and cloning the sources from git. Run 'pip install -r requirements.txt' to install all required packages. Go to Edit Configurations in PyCharm and specify the scripts listed below. (PyCharm adds the project root to PYTHONPATH; outside PyCharm you will have to do this manually.)
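If you run code outside PyCharm, one possible way to handle the project root manually is to put it on sys.path at the top of your entry script. This is only a sketch; add_project_root is a hypothetical helper, not something the project provides, and setting the PYTHONPATH environment variable in your shell works just as well.

```python
import sys
from pathlib import Path


def add_project_root(root):
    """Put the given directory first on sys.path unless it is already there."""
    root_str = str(Path(root).resolve())
    if root_str not in sys.path:
        sys.path.insert(0, root_str)


# Assumption: the current working directory is the project root.
add_project_root(Path.cwd())
```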
The executable scripts are listed below. Note: this project is still in its early stages and has no executable scripts yet.
This should run fine in any environment that supports Python 3.6.
Development tools
- Python 3.6 - Language runtime
- PyCharm - IDE by JetBrains
Key Libraries
- regex - Third-party regex library that extends re, the regular expression module in Python's standard library.
See requirements.txt for all libraries used.
- Krister S Jakobsson - Implementation and pretty much everything else
This project is licensed under the Boost Software License. See the license file for details.
- Regular-Expressions.info - Great page explaining regex in general, and the differences between platforms and libraries in particular.
- regex101.com - Great online tool for playing around with and learning about regex.
Disclaimer: I am in no way associated with the above-mentioned websites and tools, and I take no responsibility for how they use the data you enter on their platforms. Use them at your own risk.