English proficiency prediction NLP

Description:

IAS module project at ENIB (SP9 - 2021)

The goal of the project is to predict a person's English proficiency from a text input.

We used the NICT JLE Corpus, available here: https://alaginrc.nict.go.jp/nict_jle/index_E.html

The corpus consists of transcripts of audio-recorded speech samples from 1,281 participants (1.2 million words, 300 hours in total) taking an English oral proficiency interview test. Based on this test, each participant received a Standard Speaking Test (SST) score between 1 (low proficiency) and 9 (high proficiency).

Tasks:

Pre-process the dataset: extract the participant transcript (everything inside <B>...</B> tags). Within the participant transcript, remove all other tags and keep only English words.
Process the dataset: extract features with the Bag of Words (BoW) technique.
Train a classifier to predict the SST score.
Compute the accuracy of your system (the proportion of participants classified correctly) and plot the confusion matrix.
Try to improve your system (for example, by using GloVe word embeddings instead of BoW).
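The pre-processing step in the task list can be sketched with regular expressions. This is a minimal sketch, assuming that participant turns are enclosed in `<B>...</B>` and that every other annotation is an ordinary `<tag>` or `</tag>`. It removes the tag markup but keeps the text inside nested tags (so a tagged filler such as `er` survives), and it keeps only ASCII-letter words, which drops any Japanese characters. The function name `extract_participant_words` and the sample transcript are hypothetical, not taken from the project's notebooks.

```python
import re

# Assumption: participant (examinee) turns are wrapped in <B>...</B>;
# all other turns (e.g. the interviewer's) are ignored.
B_TURN = re.compile(r"<B>(.*?)</B>", re.DOTALL)
# Any opening or closing tag, e.g. <F>, </F>, <nvs>.
ANY_TAG = re.compile(r"</?[^<>]+>")
# English-looking words: ASCII letters, optionally with one apostrophe part (I'm, don't).
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def extract_participant_words(raw: str) -> list[str]:
    """Return the lower-cased English word tokens spoken by the participant."""
    words = []
    for turn in B_TURN.findall(raw):
        text = ANY_TAG.sub(" ", turn)  # drop nested tag markup, keep its inner text
        words.extend(w.lower() for w in WORD.findall(text))
    return words


# Hypothetical sample in the assumed format.
sample = (
    "<A>How are you?</A>\n"
    "<B>I'm <F>er</F> fine, thank you.</B>\n"
    "<A>Good.</A>\n"
    "<B>I like 東京 very much</B>"
)
print(extract_participant_words(sample))
```

Whether to drop the contents of filler or self-correction tags entirely, rather than only their markup, is a design choice worth evaluating against the final accuracy.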
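The BoW and classifier steps can be sketched without external libraries. The README does not say which classifier the authors trained; this sketch assumes a multinomial Naive Bayes model with Laplace smoothing, a common baseline for word-count features. The class and function names are hypothetical, and the training sentences are made-up toy examples, not corpus data. In practice, scikit-learn's `CountVectorizer` and `MultinomialNB` provide the same building blocks.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Bag of Words: map a list of tokens to a sparse {word: count} vector."""
    return Counter(tokens)


class NaiveBayesBoW:
    """Multinomial Naive Bayes over BoW counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, bows, labels):
        self.classes = sorted(set(labels))
        self.vocab = set()
        self.word_counts = {c: Counter() for c in self.classes}
        for bow, label in zip(bows, labels):
            self.word_counts[label].update(bow)  # add this document's counts to its class
            self.vocab.update(bow)
        docs_per_class = Counter(labels)
        self.log_prior = {c: math.log(docs_per_class[c] / len(labels)) for c in self.classes}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def _log_posterior(self, bow, c):
        # Unnormalised log P(c | doc) = log P(c) + sum over words of count * log P(word | c).
        denom = self.total_words[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for word, count in bow.items():
            if word in self.vocab:  # words never seen in training are ignored
                score += count * math.log((self.word_counts[c][word] + self.alpha) / denom)
        return score

    def predict(self, bows):
        return [max(self.classes, key=lambda c: self._log_posterior(bow, c)) for bow in bows]


# Toy example (made-up sentences, not NICT JLE data) with two hypothetical SST levels.
train_texts = [
    "i like play soccer",
    "i like soccer",
    "i would rather discuss the economic implications",
    "the implications are considerable",
]
train_levels = [3, 3, 7, 7]
model = NaiveBayesBoW().fit([bag_of_words(t.split()) for t in train_texts], train_levels)
test_texts = ["i like soccer very much", "economic implications"]
preds = model.predict([bag_of_words(t.split()) for t in test_texts])
print(preds)
```

Working in log space avoids numerical underflow when multiplying many small word probabilities over a long transcript.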
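Accuracy and the confusion matrix can be computed directly from the true and predicted SST levels. This is a minimal sketch, assuming the labels are the integer levels 1 to 9; the function names and the label lists are hypothetical toy values.

```python
def accuracy(y_true, y_pred):
    """Fraction of participants whose predicted SST level equals the true level."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_matrix(y_true, y_pred, labels=range(1, 10)):
    """Rows = true SST level, columns = predicted level (levels 1..9 by default)."""
    labels = list(labels)
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy labels for illustration only.
y_true = [3, 3, 4, 5]
y_pred = [3, 4, 4, 5]
print(accuracy(y_true, y_pred))
for level, row in zip(range(1, 10), confusion_matrix(y_true, y_pred)):
    print(level, row)
```

To plot the matrix, it can be passed to `matplotlib.pyplot.imshow`, or this whole step can be replaced by scikit-learn's `confusion_matrix` and `ConfusionMatrixDisplay`.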
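Replacing BoW with GloVe usually means turning each transcript into a single dense vector, for example by averaging the pre-trained embeddings of its words. This is a minimal sketch, assuming the standard GloVe plain-text format (one word per line followed by its space-separated vector components) and simple mean pooling; the function names are hypothetical, and the toy vectors stand in for a real file such as `glove.6B.100d.txt`.

```python
def load_glove(lines):
    """Parse GloVe's text format: a word followed by its vector components on each line."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:  # skip blank or malformed lines
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def document_vector(tokens, vectors):
    """Average the embeddings of known tokens; return a zero vector if none are known."""
    dim = len(next(iter(vectors.values())))  # assumes at least one loaded vector
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 2-dimensional vectors; a real file could be read with
# with open("glove.6B.100d.txt", encoding="utf-8") as f: vectors = load_glove(f)
vectors = load_glove(["i 1.0 0.0", "like 0.0 1.0", "soccer 1.0 1.0"])
print(document_vector(["i", "like", "unknownword"], vectors))
```

Because averaged embeddings are dense and can be negative, they need a classifier that accepts real-valued features (for example, logistic regression) rather than one built for word counts.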

Supervisor:

Olivier Augereau

Authors:

CORREA, Elias

GASSIBE, Franco

Repository: Eliascc5/English_proficiency_prediction_NLP

Folders and files

Latest commit

History

Repository files navigation

English proficiency prediction NLP

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages