Semantic textual similarity measures how equivalent two sentences are on the basis of their conceptual similarity. It is widely used in natural language processing tasks such as essay scoring, machine translation, text classification, information extraction, and question answering. This project focuses on one application of semantic textual similarity, automatic short answer grading (ASAG), which assigns a grade to a student's response by comparing it with one or more model answers. In particular, we selected a state-of-the-art short answer grading approach that uses the Stanford CoreNLP library and implemented the same approach with two open-source libraries: Natural Language Toolkit (NLTK) and spaCy. For evaluation, we used the Texas dataset and an in-house ASAG benchmark dataset based on the Mathematics for Robotics and Control (MRC) course. The performance of all three libraries was evaluated using the Pearson correlation coefficient, root mean square error (RMSE), and runtime. On the Texas dataset, Stanford CoreNLP achieved a higher Pearson correlation coefficient (0.66) and a lower RMSE (0.85) than NLTK and spaCy. On the MRC dataset, all three libraries showed comparable results on the evaluated metrics.
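The two accuracy metrics named above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the project's evaluation code: the function names `pearson` and `rmse` and the sample grades are assumptions made for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, actual):
    """Root mean square error between predicted and reference values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Hypothetical grades on a 0-5 scale (not taken from either dataset).
human_grades = [5.0, 3.5, 2.0, 4.0, 1.0]
system_grades = [4.5, 3.0, 2.5, 4.0, 2.0]

print("Pearson r:", pearson(system_grades, human_grades))
print("RMSE:", rmse(system_grades, human_grades))
```

A higher Pearson coefficient means the predicted grades follow the human grades more closely, while a lower RMSE means the predictions are closer in absolute terms, so the two metrics are read in opposite directions.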
This repository contains:
Exercises on textual similarity using the NLTK and spaCy libraries that can help with short answer grading
Comparison of spell corrector approaches:
- Spell corrector using n-grams, the Jaccard coefficient, and minimum edit distance
- Spell corrector using minimum edit distance (MED) only
Create Jupyter notebooks for each student from the Mohler dataset of short questions and answers
Create instructor version of assignments using nbgrader
Create student version of assignments using nbgrader
The wiki covers the theoretical concepts: https://github.com/rameshjesswani/Semantic-Textual-Similarity/wiki
Word aligner using the NLTK and spaCy libraries
ASAG based on the Sultan et al. (2016) approach, using the NLTK and spaCy libraries
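The two spell corrector variants listed among the repository contents can be illustrated with a small standard-library sketch: character n-grams and the Jaccard coefficient narrow down the candidate words, and minimum edit distance picks the closest one. This is not the repository's implementation. The function names, the bigram size, the `min_jaccard` threshold, the unit edit costs, and the toy vocabulary are all assumptions made for the example.

```python
def char_ngrams(word, n=2):
    """Set of character n-grams of a word (the word itself if shorter than n)."""
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def edit_distance(s, t):
    """Minimum edit distance with unit insert, delete, and substitute costs."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def correct(word, vocabulary, n=2, min_jaccard=0.2):
    """Return the vocabulary word that best matches a possibly misspelled word."""
    if word in vocabulary:
        return word
    grams = char_ngrams(word, n)
    # Step 1: keep only words whose n-gram overlap passes the threshold.
    candidates = [v for v in vocabulary
                  if jaccard(grams, char_ngrams(v, n)) >= min_jaccard]
    if not candidates:
        candidates = list(vocabulary)  # fall back to edit distance alone
    # Step 2: rank by edit distance, breaking ties by higher Jaccard overlap.
    return min(candidates,
               key=lambda v: (edit_distance(word, v),
                              -jaccard(grams, char_ngrams(v, n)),
                              v))


# Hypothetical vocabulary for demonstration only.
VOCABULARY = ["array", "stack", "queue", "pointer", "function"]

for misspelled in ["pointr", "quue", "stack"]:
    print(misspelled, "->", correct(misspelled, VOCABULARY))
```

Setting `min_jaccard=0.0` keeps every vocabulary word as a candidate, which reduces the sketch to the edit-distance-only variant. The n-gram filter mainly saves work on large vocabularies, since edit distance is then computed for only a few candidates.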
The word aligner can be used as an individual module. For more usage details, see: Word Aligner using NLTK and Spacy
Install the NLTK library (procedure given below)
Set up the Stanford Parser, NER tagger, and POS tagger (links to the NLTK setup files are given below)
Details about ASAG can be found here: ASAG
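As a rough illustration of what short answer grading involves, the sketch below scores a student answer by how many of a model answer's word tokens it covers and scales the best score over all model answers to a grade. It is deliberately simplified and is not the Sultan et al. (2016) approach used in this project. The function names, the tokenization rule, the coverage measure, and the 0-5 grade scale are assumptions made for the example.

```python
import re


def tokens(text):
    """Set of lowercase alphanumeric word tokens in a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def coverage(student_answer, model_answer):
    """Fraction of the model answer's tokens that appear in the student answer."""
    model = tokens(model_answer)
    if not model:
        return 0.0
    return len(tokens(student_answer) & model) / len(model)


def grade(student_answer, model_answers, max_grade=5.0):
    """Scale the best coverage over all model answers to [0, max_grade]."""
    if not model_answers:
        raise ValueError("at least one model answer is required")
    best = max(coverage(student_answer, m) for m in model_answers)
    return best * max_grade


# Hypothetical question data for demonstration only.
model_answers = ["A stack is a last in first out structure"]
print(grade("A stack is last in first out", model_answers))
print(grade("It stores data", model_answers))
```

Plain token overlap ignores synonyms, word order, and paraphrase. A full ASAG approach addresses those gaps with components such as word alignment.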
Install the NLTK library
sudo pip install -U nltk
Install the NLTK data packages
import nltk
nltk.download()
Install the spaCy library (the code works with version 2.0.12)
pip install -U spacy
After installing spaCy, download a language model:
python -m spacy download en
Install nbgrader
pip install nbgrader
If you are using Anaconda:
conda install jupyter
conda install -c conda-forge nbgrader
To install nbgrader extensions:
jupyter nbextension install --sys-prefix --py nbgrader --overwrite
jupyter nbextension enable --sys-prefix --py nbgrader
jupyter serverextension enable --sys-prefix --py nbgrader
For more nbgrader documentation, see:
http://nbgrader.readthedocs.io/en/stable/user_guide/installation.html
To use the Stanford Parser, NER tagger, and POS tagger in NLTK, see these setup files:
https://github.com/rameshjesswani/Semantic-Textual-Similarity/blob/master/monolingualWordAligner/stanfordParser_setup.txt
https://github.com/rameshjesswani/Semantic-Textual-Similarity/blob/master/monolingualWordAligner/stanfordNERTagger_setup.txt
https://github.com/rameshjesswani/Semantic-Textual-Similarity/blob/master/monolingualWordAligner/stanfordPOSTagger_setup.txt
@unpublished{[RnD]Kumar,
Author = {Ramesh Kumar},
Month = {January},
Note = {WS17
H-BRS - Evaluation of Semantic Textual Similarity Approaches for Automatic Short Answer Grading
Ploeger, Nair supervising},
Title = {Evaluation of Semantic Textual Similarity Approaches for Automatic Short Answer Grading},
Year = {2017/18}}