This project is a university assignment on Natural Language Processing. The application was built with Python 2.7 and Django 1.9 and consists of:
- A tokenization and morphological analysis module (called morfo) built with FreeLing and Python 2.7. This app takes raw text and performs the corresponding morphological analysis.
- A syntactic analysis module (textparser). Given raw text, it generates syntactic trees using probabilistic models (Stanford and Bikel).
To get this project working, we need to set up the morfo and textparser modules. The project is laid out as follows:
TextAnalyser
│   README.md
│   requirements.txt
│
└───tkmorfo
    ├───applications
    │   ├───morfo
    │   └───textparser
    └───tools
        └───helpers
                00-raw
                00
                dbparser
                parseval
                stanford-parserfull-2015-12-09
                stanford-postagger-2015-12-09
                utils.py
This project was designed to run in a container. The first module (tokenization and morphological analysis) depends on FreeLing and Python 2.7. Both packages are installed on this Docker image.
The second module (syntactic analysis) depends on the following resources:
- Dan Bikel’s Parsing Engine: dbparser.tar.gz
- Penn Treebank based training set: wsj-02-21.mrg.tar.gz
- Evaluation of the model's accuracy: parseval.tar.gz
- Test set: 00-raw.tar.gz

These files can be found here. The other required files are:
- Stanford Statistical Parser: stanford-parser-full-2015-12-09.zip
- Stanford POS Tagger: stanford-postagger-2015-12-09.zip
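
Once downloaded, the archives need to be unpacked where the textparser module can find them. The following is a minimal sketch of that step, not the project's own setup code: the function names and the `downloads` and `tools/helpers` paths are assumptions made for illustration, and only the archive names come from the lists above.

```python
import os
import tarfile
import zipfile

# Archive names as listed above.
ARCHIVES = [
    "dbparser.tar.gz",
    "wsj-02-21.mrg.tar.gz",
    "parseval.tar.gz",
    "00-raw.tar.gz",
    "stanford-parser-full-2015-12-09.zip",
    "stanford-postagger-2015-12-09.zip",
]


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz or .zip archive into dest_dir and return dest_dir."""
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    if archive_path.endswith((".tar.gz", ".tgz")):
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(dest_dir)
    elif archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(dest_dir)
    else:
        raise ValueError("Unsupported archive type: %s" % archive_path)
    return dest_dir


def extract_all(download_dir, dest_dir):
    """Extract every listed archive present in download_dir.

    Returns the names of the archives that were actually extracted,
    so missing downloads are easy to spot.
    """
    extracted = []
    for name in ARCHIVES:
        path = os.path.join(download_dir, name)
        if os.path.exists(path):
            extract_archive(path, dest_dir)
            extracted.append(name)
    return extracted
```

For example, `extract_all("downloads", "tools/helpers")` would unpack whichever archives are present in a hypothetical `downloads` directory. Comparing the returned list with `ARCHIVES` shows which files are still missing.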
To run the syntactic analysis module, the container must be able to show or create graphical UIs. This allows the app to create the parse tree images generated with NLTK. Install the required packages:
apt-get update
apt-get install python-tk
apt-get install xvfb
apt-get install imagemagick
Then run the following commands every time the container starts:
Xvfb :1 -screen 0 1024x768x16 &> xvfb.log &
DISPLAY=:1.0
export DISPLAY
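
With the virtual display in place, NLTK can draw a tree onto a Tk canvas and ImageMagick can convert the result to an image. The sketch below shows one way this could be wired together in Python. The function names are hypothetical and this is not necessarily how textparser does it. It assumes NLTK 3 is installed, that `Xvfb` and `convert` are on the PATH, and that ImageMagick is able to read PostScript (which typically requires Ghostscript).

```python
import os
import subprocess
import time


def xvfb_command(display=1, width=1024, height=768, depth=16):
    """Build the Xvfb command line shown above as an argument list."""
    return ["Xvfb", ":%d" % display, "-screen", "0",
            "%dx%dx%d" % (width, height, depth)]


def ensure_display(display=1, log_path="xvfb.log"):
    """Start Xvfb if DISPLAY is unset; return the Popen handle or None."""
    if os.environ.get("DISPLAY"):
        return None  # a display is already configured, nothing to start
    log = open(log_path, "ab")
    process = subprocess.Popen(xvfb_command(display), stdout=log, stderr=log)
    os.environ["DISPLAY"] = ":%d.0" % display
    time.sleep(1)  # crude wait for the server to come up (assumption)
    return process


def save_tree_image(bracketed, ps_path, png_path):
    """Draw a bracketed parse tree with NLTK and convert it to PNG."""
    # Imported lazily because these need NLTK and Tk (python-tk).
    from nltk.tree import Tree
    from nltk.draw.tree import TreeWidget
    from nltk.draw.util import CanvasFrame

    frame = CanvasFrame()
    widget = TreeWidget(frame.canvas(), Tree.fromstring(bracketed))
    frame.add_widget(widget, 10, 10)
    frame.print_to_file(ps_path)  # the canvas is written as PostScript
    frame.destroy()
    # ImageMagick's convert turns the PostScript file into a PNG.
    subprocess.check_call(["convert", ps_path, png_path])
    return png_path
```

A call such as `ensure_display()` followed by `save_tree_image("(S (NP I) (VP (V saw) (NP him)))", "tree.ps", "tree.png")` would then produce the image without a physical screen.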
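The Stanford and Bikel parsers are Java programs. Install OpenJDK 8 from the jessie-backports repository and select it as the default Java:
echo deb http://http.debian.net/debian jessie-backports main >> /etc/apt/sources.list
apt-get update && apt-get install openjdk-8-jdk
update-alternatives --config java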
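
After choosing an alternative, it can be worth confirming which Java version is actually active. The following is a small sketch for doing so from Python. Both function names are hypothetical, and the parsing assumes the usual `version "..."` line that `java -version` prints.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    # Releases up to Java 8 report themselves as 1.x (e.g. "1.8.0_72").
    if first == 1 and match.group(2) is not None:
        return int(match.group(2))
    return first


def installed_java_major():
    """Run `java -version` (which prints to stderr) and parse the result."""
    process = subprocess.Popen(["java", "-version"],
                               stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT)
    output = process.communicate()[0].decode("utf-8", "replace")
    return parse_java_major(output)
```

If `installed_java_major()` returns something other than `8`, rerunning `update-alternatives --config java` and selecting the OpenJDK 8 entry should fix it.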