This software supports Python 2.7 and has been tested only on POSIX-compliant operating systems (Linux, Mac OS X, FreeBSD, etc.).
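If you want to check these requirements before installing, something along these lines should do. It is only a sketch: `check_environment` is a hypothetical helper, not part of the library, and it assumes that `os.name == "posix"` is a good enough test for a POSIX-compliant system.

```python
from __future__ import print_function

import os
import sys


def check_environment(version_info=None, os_name=None):
    """Return a list of human-readable problems (empty if none were found)."""
    # Default to the running interpreter and operating system.
    version_info = sys.version_info if version_info is None else version_info
    os_name = os.name if os_name is None else os_name

    problems = []
    major_minor = tuple(version_info[:2])
    if major_minor != (2, 7):
        problems.append("Python 2.7 is required, found %d.%d" % major_minor)
    # Assumption: os.name is "posix" on Linux, Mac OS X and FreeBSD.
    if os_name != "posix":
        problems.append("only POSIX systems were tested, found os.name=%r" % os_name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The function takes the version and OS name as optional arguments so that it can be tried out without switching interpreters.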
The CitationExtractor relies on TreeTagger for the PoS tagging of input texts.
A handy script (install_treetagger.sh) installs TreeTagger for you.
To run the script without having to clone this repo:
wget -O install_treetagger.sh https://raw.githubusercontent.com/mromanello/CitationExtractor/master/install_treetagger.sh
chmod a+x install_treetagger.sh
./install_treetagger.sh
rm install_treetagger.sh
Otherwise, clone the repo and run the script from there:
git clone https://github.com/mromanello/CitationExtractor.git
cd CitationExtractor
chmod a+x install_treetagger.sh
./install_treetagger.sh
To install the CitationExtractor, first run:
$ pip install http://www.antlr3.org/download/Python/antlr_python_runtime-3.1.3.tar.gz#egg=antlr_python_runtime-3.1.3
$ pip install https://github.com/mromanello/treetagger-python/archive/master.zip#egg=treetagger-1.0.1
followed by:
$ pip install citation-extractor
NB: the installation of all other dependencies is handled by setup.py, but for some reason (that I'm still trying to figure out) it does not pick up these two.
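Since these two dependencies have to be installed by hand, it can be useful to check that they are importable before going further. The snippet below is a rough sketch: `missing_modules` is a hypothetical helper, and the import names `antlr3` and `treetagger` are assumptions about what the two packages expose, so adjust them if they differ on your system.

```python
from __future__ import print_function

import importlib


def missing_modules(module_names):
    """Return the names in module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed import names for the two manually installed packages.
    for name in missing_modules(["antlr3", "treetagger"]):
        print("Missing module:", name)
```

If nothing is printed, both modules were found.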
To double check that everything was installed correctly, try running the following lines (it should take ~20s):
from citation_extractor.settings import crfsuite
from citation_extractor.pipeline import get_extractor
extractor = get_extractor(crfsuite)
assert extractor is not None
If the code above runs without throwing exceptions, it means you managed to install the library!
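If you would rather get a short message than a traceback, the same check can be wrapped in a function. This is just a sketch: `verify_installation` is a hypothetical helper and not part of the library; it only uses the two imports and the `get_extractor(crfsuite)` call from the check above.

```python
from __future__ import print_function


def verify_installation():
    """Run the sanity check and return (ok, message) instead of raising."""
    try:
        # Imported lazily so that a missing package is reported, not raised.
        from citation_extractor.settings import crfsuite
        from citation_extractor.pipeline import get_extractor
        extractor = get_extractor(crfsuite)
    except ImportError as error:
        return False, "import failed: %s" % error
    except Exception as error:
        # Any other failure while building the extractor.
        return False, "extractor could not be created: %s" % error
    if extractor is None:
        return False, "get_extractor returned None"
    return True, "CitationExtractor is installed correctly"


if __name__ == "__main__":
    ok, message = verify_installation()
    print("OK" if ok else "FAILED", "-", message)
```

Catching `ImportError` separately makes it easier to tell a missing dependency apart from a problem that occurs while the extractor is being created.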
Documentation is not ready yet: I'm working on it ;-)
For the time being, you can find a concrete example of how to use the library in this notebook.