Repository for the TCS Enterprise Intelligent Automation – ARTIFICIAL INTELLIGENCE Competition.
The datasets are taken from the Kaggle Quora competition.
We extract doc2vec features on sentences using the gensim library and a pretrained doc2vec (DBOW) model trained on the English Wikipedia dataset. The pretrained model is available here.
The code for feature extraction is here: feat.py.
To use it:
python feat.py
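As a rough illustration of the feature extraction step, the sketch below infers one doc2vec vector per sentence for each sentence pair. It is not the contents of feat.py: the helper names (`load_doc2vec`, `extract_pair_features`), the whitespace tokenization, and the model path are assumptions made for the example. The only gensim calls used are `Doc2Vec.load` and `infer_vector`.

```python
def load_doc2vec(path):
    """Load a saved gensim Doc2Vec model; `path` is a placeholder."""
    # Imported lazily so the rest of the sketch runs without gensim installed.
    from gensim.models.doc2vec import Doc2Vec
    return Doc2Vec.load(path)


def extract_pair_features(model, pairs):
    """Infer one vector per sentence for each (sentence1, sentence2) pair.

    `model` is any object exposing infer_vector(list_of_tokens), the method
    gensim's Doc2Vec provides for embedding unseen documents.
    Whitespace tokenization is an assumption of this sketch.
    """
    features = []
    for s1, s2 in pairs:
        v1 = model.infer_vector(s1.split())
        v2 = model.infer_vector(s2.split())
        features.append((v1, v2))
    return features
```

With a real model this would be called as `extract_pair_features(load_doc2vec("path/to/model"), pairs)`, where the path is hypothetical.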
The training code is here: train.py.
To log the per-iteration loss, use:
bash train_with_logging.sh
Training logs are available in log.info.
The test code is here: test.py.
To use it:
python test.py
The output (id_,prob_) is saved in the text file test_probs.txt.
- We eliminate all punctuation marks from the sentences in a preprocessing step
- We use an input-flipping dataset augmentation scheme: the network is robust to the order in which the two sentences are presented to it
- At test time we compute the probabilities for the two input orders and average them
- The network definition is available in helpers/network.py
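The notes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's code: the function names are hypothetical, punctuation is taken to mean ASCII punctuation (`string.punctuation`), and `predict_fn` stands in for whatever callable wraps the network in helpers/network.py and returns a probability for an ordered pair of inputs.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(sentence):
    """Remove ASCII punctuation marks from a sentence."""
    return sentence.translate(_PUNCT_TABLE)


def flip_augment(pairs, labels):
    """Return the dataset with every pair also present in swapped order.

    Each (a, b) example yields (a, b) and (b, a), both with the same label.
    """
    aug_pairs, aug_labels = [], []
    for (a, b), y in zip(pairs, labels):
        aug_pairs.extend([(a, b), (b, a)])
        aug_labels.extend([y, y])
    return aug_pairs, aug_labels


def predict_flip_averaged(predict_fn, a, b):
    """Average the probabilities predicted for the two input orders.

    `predict_fn(x, y)` is a hypothetical callable returning a probability.
    """
    return 0.5 * (predict_fn(a, b) + predict_fn(b, a))
```

Averaging over both orders makes the final score symmetric in its two inputs, even if the trained network is only approximately order-invariant.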