TCS Enterprise Intelligent Automation – ARTIFICIAL INTELLIGENCE Competition

Repository for the TCS Enterprise Intelligent Automation – ARTIFICIAL INTELLIGENCE Competition.

The datasets are taken from the Kaggle Quora competition.

Dependencies

gensim
torch

Overview

Feature Extraction on Sentences

We extract doc2vec features on sentences using the gensim library and a pretrained doc2vec (DBOW) model trained on the English Wikipedia dataset. The pretrained model is available here.

The code for feature extraction is here: feat.py.

To use it:

python feat.py
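
As a rough illustration of what this extraction step involves, the sketch below infers one doc2vec vector per sentence through gensim's `Doc2Vec.infer_vector`. It is not the contents of feat.py: the helper names `load_doc2vec` and `sentence_features`, the lowercasing and whitespace tokenisation, and the model path are all assumptions made for the example.

```python
from typing import List, Sequence


def load_doc2vec(path: str):
    """Load a saved gensim Doc2Vec model (the path is a placeholder)."""
    # Imported lazily so the helper below can be exercised without gensim.
    from gensim.models.doc2vec import Doc2Vec
    return Doc2Vec.load(path)


def sentence_features(model, sentences: Sequence[str]) -> List[List[float]]:
    """Infer one fixed-length vector per sentence.

    `model` is any object exposing gensim's `infer_vector(list_of_tokens)`.
    Lowercasing and whitespace tokenisation are assumptions of this sketch.
    """
    features = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        vector = model.infer_vector(tokens)
        # Convert to plain floats so the result is easy to save or inspect.
        features.append([float(x) for x in vector])
    return features
```

With a real model, `sentence_features(load_doc2vec("<path to the pretrained model>"), ["How do I learn Python"])` would return a list holding a single vector whose length equals the model's vector size.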

Network Training using doc2vec features with Softmax CrossEntropy Loss

The code is here: train.py.

To log the per-iteration loss, use:

bash train_with_logging.sh

Training logs are available in log.info.
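
For readers unfamiliar with the loss named in this section, here is a minimal standard-library sketch of softmax cross-entropy for a single example. It is meant only to show the arithmetic, not the code in train.py; the function names are hypothetical. In PyTorch the same quantity, averaged over a batch by default, is what `torch.nn.CrossEntropyLoss` computes from unnormalised logits.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a sequence of logits."""
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the true class under softmax."""
    return -math.log(softmax(logits)[label])
```

With two equal logits the model is maximally uncertain between two classes, so `cross_entropy([0.0, 0.0], 1)` equals `log(2)`; the loss shrinks as the logit of the true class grows relative to the other.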

Testing the dataset using doc2vec features with trained network

The code is here: test.py.

To use it:

python test.py

The output (id_,prob_) is saved in the text file test_probs.txt.

Low Level Details

We remove all punctuation marks from the sentences in a preprocessing step.
We use an input-flipping data augmentation scheme, so that the network is robust to the order in which the two sentences are presented to it.
At test time we compute the probabilities for both input orders and average them.
The network definition is available in helpers/network.py.
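
The first three points above can be sketched as follows. This is an illustrative outline under stated assumptions rather than the repository's code: the helper names are hypothetical, punctuation is taken to mean ASCII punctuation (`string.punctuation`), and `predict` stands in for whatever function maps a sentence pair to a probability.

```python
import string
from typing import Callable, List, Tuple

Pair = Tuple[str, str, int]  # (sentence_a, sentence_b, label)

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(sentence: str) -> str:
    """Remove ASCII punctuation marks from a sentence."""
    return sentence.translate(_PUNCT_TABLE)


def augment_with_flips(pairs: List[Pair]) -> List[Pair]:
    """Return the training pairs plus a swapped-order copy of each."""
    flipped = [(b, a, label) for a, b, label in pairs]
    return pairs + flipped


def averaged_probability(
    predict: Callable[[str, str], float], a: str, b: str
) -> float:
    """Average the predicted probability over both input orders."""
    return 0.5 * (predict(a, b) + predict(b, a))
```

Training on both orders and averaging both orders at test time serve the same goal: the prediction for a pair should not depend on which sentence happens to come first.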
