
Stanford CoreNLP


Main steps to run Stanford CoreNLP with the HAREM dataset

  1. Download the Stanford CoreNLP jar from its webpage.
  2. Navigate to the directory containing stanford-corenlp.jar.
  3. Run the command java -cp stanford-corenlp.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop <file.prop>
    1. This command trains and serializes a CRF model according to file.prop.
    2. file.prop specifies the training file and the features to be used during training (a sample prop file is shown after this list).
  4. Run the command java -cp stanford-corenlp.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier <ner-model.ser.gz> -testFile <file_test.txt>
    1. This command classifies file_test.txt using the CRF model generated in step 3.
    2. It also reports precision, recall, and F1 for each class.
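
A minimal sample prop file, in the style of the Stanford NER FAQ (filenames are placeholders; the exact feature set used in these experiments may differ):

    # train.prop - sample configuration for CRFClassifier training
    trainFile = harem_train.txt
    serializeTo = harem-ner-model.ser.gz
    # training file structure: token in column 0, answer (label) in column 1
    map = word=0,answer=1
    # feature templates
    useClassFeature = true
    useWord = true
    useNGrams = true
    noMidNGrams = true
    maxNGramLeng = 6
    usePrev = true
    useNext = true
    useSequences = true
    usePrevSequences = true
    maxLeft = 1
    useTypeSeqs = true
    useTypeSeqs2 = true
    useTypeySequences = true
    wordShape = chris2useLC
    useDisjunctive = true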

Check the corresponding repository folder for more information.

Stanford dataset format

A file in which each line contains a token and its entity type, with O marking tokens that are not part of an entity. Example:

"I complained to Microsoft about Bill Gates."

I O
complained O
to O
Microsoft ORGANIZATION
about O
Bill PERSON
Gates PERSON
. O

Convert the HAREM dataset into Stanford NER input

New version

In order to evaluate with the conlleval script, the same tokenization has to be present both in the golden data and in Stanford's output. To ensure that, I used the Stanford CoreNLP tokenizer (edu.stanford.nlp.process.PTBTokenizer) on both the training and testing (golden and output) datasets. I then converted the tokenized text into CoNLL format using this script and added IOB tags using this script.
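
The linked conversion scripts are not reproduced here; purely as an illustration, a minimal Python sketch of the IOB-tagging step (assuming two-column token/label input as in the format example above, with the label repeated on every token of an entity) could look like this:

    # iob_tag.py - illustrative sketch, not the linked script
    # Reads "token LABEL" lines (O for non-entities) from stdin and
    # rewrites the labels with B-/I- prefixes (IOB2 scheme).
    import sys

    def to_iob(lines):
        prev = "O"
        for line in lines:
            line = line.rstrip("\n")
            if not line.strip():          # blank line = sentence boundary
                prev = "O"
                yield line
                continue
            token, label = line.split()
            if label == "O":
                tag = "O"
            elif label == prev:           # continuation of the previous entity
                tag = "I-" + label
            else:                         # a new entity starts here
                tag = "B-" + label
            prev = label
            # note: two adjacent, distinct entities of the same type
            # would be merged by this simple rule
            yield token + " " + tag

    if __name__ == "__main__":
        for out in to_iob(sys.stdin):
            print(out)

Usage (filenames are placeholders): python iob_tag.py < harem_tokens.txt > harem_tokens.iob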

Previous version

In order to run Stanford NER with the HAREM dataset as input, the dataset has to be converted into the correct format. For the conversion, I used corpus-processor (Download).

Steps:

  1. Install Ruby
  2. Install the corpus-processor Ruby gem
  3. Change the categories to be recognised (example)
  4. Run the command: corpus-processor process <input-file> <output-file> --categories=<file.yml>

Check the corresponding repository folder for more information.

Average results

Check all the results here.

Results after 4 repetitions:

Level Precision Recall F-measure
Categories 58.84% 53.60% 56.10%
Types - - -
Subtypes - - -
Filtered 69.97% 54.23% 61.10%

Note: Training on types and subtypes was too computationally demanding, so a different prop file with fewer features was used to make those runs feasible. However, since different features were used, the results would not be comparable to the other tools, so they are not displayed here.

Hyperparameter study

For this tool, I checked the influence of the following hyperparameters: tolerance, epsilon, and maxNGramLeng. The results are as follows:
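
These values were varied at training time; assuming they correspond to the same-named CRFClassifier properties, they can be overridden in the training prop file, for example:

    # hypothetical additions to the training prop file for one run
    tolerance = 1e-3
    epsilon = 0.01
    maxNGramLeng = 7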

Tolerance (default: 1e-4)

Value Categories Filtered
1e-5 54.07% 58.94%
5e-5 54.02% 59.00%
1e-4 54.15% 58.84%
5e-4 54.02% 58.72%
1e-3 54.31% 58.86%
5e-3 54.12% 58.81%

(Figure: Stanford CoreNLP Tolerance values)

Epsilon (default: 0.01)

Value Categories Filtered
0.005 54.15% 58.84%
0.01 54.15% 58.84%
0.015 54.15% 58.84%
0.02 54.15% 58.84%

MaxNGramLeng (default: 6)

Value Categories Filtered
4 53.47% 58.31%
5 53.77% 58.66%
6 54.15% 58.84%
7 54.37% 58.97%

Results for SIGARRA News Corpus

Repeated holdout

Tolerance Precision Recall F-measure
1e-4 90.09% 83.41% 86.62%
1e-3 90.26% 83.31% 86.64%

Repeated 10-fold cross validation

Tolerance Precision Recall F-measure
1e-4 89.80% 84.10% 86.86%
1e-3 89.81% 83.95% 86.78%

Resources

Get the generated models in the Resources page.