- Enron emails - https://www.cs.cmu.edu/~enron
- OntoNotes 5.0
- spaCy (v2.2.4) - https://spacy.io/
- Accuracy score
- ROC
- F1 score
- Confusion matrix
I annotated the OntoNotes dataset with spaCy's NER and treated the dataset's pre-annotations as gold labels for the performance and accuracy evaluation. Since the focus is only on organization labelling, I divided the data into three categories labelled -1, 0, and 1: -1 means spaCy did not assign any entity to the sentence, 0 means the entity label was not ORG, and 1 means the entity was ORG.
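A minimal sketch of how this three-way labelling can be derived from spaCy's output is shown below; the `label_sentence` helper and the example sentence are illustrative, not the exact code used.

```python
import spacy

# Assumes the small English model is installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

def label_sentence(sentence):
    """Map a sentence to -1 / 0 / 1 from spaCy's entity predictions.

    -1: spaCy assigned no entity at all
     0: spaCy found entities, but none of them is ORG
     1: spaCy found at least one ORG entity
    """
    doc = nlp(sentence)
    if not doc.ents:
        return -1
    return 1 if any(ent.label_ == "ORG" for ent in doc.ents) else 0

print(label_sentence("Enron Corporation was based in Houston."))  # expected: 1
```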
Performance results on the OntoNotes dataset
- Accuracy score: 0.9676
- F1 scores for the three categories (-1, 0, 1): [0., 0.98101418, 0.91691935]. The score for -1 is 0 because the pre-annotated data contains no -1 labels.
- Confusion matrix (rows = true labels, columns = predicted labels, both in the order -1, 0, 1):

| true \ predicted | -1 | 0    | 1   |
|------------------|----|------|-----|
| -1               | 0  | 0    | 0   |
| 0                | 25 | 4082 | 98  |
| 1                | 4  | 35   | 756 |
So in total 756 'ORG' (label 1) sentences were classified correctly, while 4082 is the number of correctly classified non-ORG (label 0) sentences.
- ROC, with ORG (label 1) as the positive class (see the metrics sketch after these results):
  - fpr: [0., 0.02330559, 0.9940547, 1.]
  - tpr: [0., 0.9509434, 0.99496855, 1.]
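The metrics above can be reproduced with scikit-learn roughly as sketched below; `y_true` and `y_pred` are illustrative stand-ins for the pre-annotated and spaCy-derived label arrays, and the ROC is computed by treating ORG (label 1) as the positive class.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, roc_curve

# Illustrative stand-ins: y_true holds the pre-annotated OntoNotes labels,
# y_pred the spaCy-derived labels, both drawn from {-1, 0, 1}.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, -1, 0, 1]
labels = [-1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
f1_per_class = f1_score(y_true, y_pred, labels=labels, average=None)
cm = confusion_matrix(y_true, y_pred, labels=labels)  # rows = true, columns = predicted

# ROC is a binary measure, so ORG (label 1) is taken as the positive class
# and the raw -1/0/1 predictions are used as scores.
fpr, tpr, thresholds = roc_curve([int(y == 1) for y in y_true], y_pred)

print(accuracy, f1_per_class, cm, fpr, tpr, sep="\n")
```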
The Enron data has so far been simplified into a readable dataframe (a rough loading sketch is shown below).
Further simplification of the Enron dataset is needed to properly evaluate spaCy's performance on it.
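A rough sketch of how the raw CMU maildir could be flattened into a pandas dataframe is given below; the directory layout, column names, and the `load_enron` helper are assumptions for illustration, not the exact processing used.

```python
import email
import os
import pandas as pd

def load_enron(maildir="maildir", limit=1000):
    """Walk the Enron maildir and collect a few header fields plus the body."""
    rows = []
    for root, _dirs, files in os.walk(maildir):
        for name in files:
            with open(os.path.join(root, name), "r", errors="ignore") as fh:
                msg = email.message_from_file(fh)
            rows.append({
                "from": msg.get("From"),
                "to": msg.get("To"),
                "subject": msg.get("Subject"),
                "body": msg.get_payload() if not msg.is_multipart() else "",
            })
            if len(rows) >= limit:
                return pd.DataFrame(rows)
    return pd.DataFrame(rows)

emails_df = load_enron()
```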
- Observing the 'ORG' entities that spaCy failed to classify, the issue can be mitigated by removing punctuation from the sentences or by explicitly specifying peculiar organisation names (e.g. The Truth Squad); see the sketch after this list.
- The documentation of the organization label could be improved by adding some research institutes and scientific publications, such as the Astrobiology Journal.
- Some of the errors made were due to missing context in the sentences.
- The crowdsourced labels could be used, but I found a lot of redundant labelling: some human labels should be empty (i.e. agreeing with the machine label) but instead repeat exactly the same values as the machine-labelled dictionary, which hampered processing of the dataset for evaluation (a small filtering sketch is given below).
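A sketch of both ideas from the list above (stripping punctuation and pre-registering peculiar organisation names via spaCy v2's EntityRuler) is shown below; the two example patterns come from the observations, everything else is illustrative.

```python
import string
import spacy

nlp = spacy.load("en_core_web_sm")

# Register known-but-unrecognised organisation names before the statistical NER.
ruler = nlp.create_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "The Truth Squad"},
    {"label": "ORG", "pattern": "Astrobiology Journal"},
])
nlp.add_pipe(ruler, before="ner")

def strip_punctuation(text):
    """Drop punctuation that was observed to break ORG recognition."""
    return text.translate(str.maketrans("", "", string.punctuation))

doc = nlp(strip_punctuation("Have you heard from The Truth Squad?"))
print([(ent.text, ent.label_) for ent in doc.ents])  # expected to include ('The Truth Squad', 'ORG')
```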
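For the crowdsourced labels, a small hypothetical filter like the one below could treat a human entry that merely repeats the machine entry as an (empty) agreement before evaluation; the dictionary schema is assumed, not the dataset's actual one.

```python
def clean_crowd_labels(machine_labels, human_labels):
    """Replace human labels that just repeat the machine label with an empty dict.

    Both arguments map an item id to its label dictionary (illustrative schema).
    """
    cleaned = {}
    for item_id, human in human_labels.items():
        machine = machine_labels.get(item_id)
        # An identical copy of the machine label carries no new information,
        # so record it as agreement (empty) instead of a redundant relabel.
        cleaned[item_id] = {} if human == machine else human
    return cleaned
```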