Observing the semantics of textual data through embedding coupled with t-SNE
We implemented our own t-SNE method, following the Scikit-Learn API conventions. To create an instance and fit it to data:
from TSNE_code.TSNE_utils import TSNE

custom_tsne = TSNE(n_components=2, perplexity=15,
                   adaptive_learning_rate=True, patience=50,
                   n_iter=1000, early_exaggeration=4)
custom_embedding = custom_tsne.fit_transform(X, verbose=3)
All t-SNE functions are documented with docstrings. We also used the Sphinx library to generate NumPy-style HTML documentation, which is easier to read.
To view this documentation, open the build/html/index.html file.
We applied our t-SNE implementation to the MNIST and Olivetti datasets, both to verify its correctness and to compare it against Scikit-Learn's. These comparative studies can be found in the notebooks inside the MNIST and Olivetti folders.
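As a rough illustration of such a comparison (not the notebooks' exact code), one might run Scikit-Learn's own TSNE with matching hyperparameters on a small MNIST-like sample; Scikit-Learn's digits dataset stands in for MNIST here, and the subsample size is arbitrary.

```python
# Hedged sketch: baseline embedding with Scikit-Learn's TSNE, for
# side-by-side comparison with a custom implementation.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE as SklearnTSNE

X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]  # subsample to keep the run fast

sk_tsne = SklearnTSNE(n_components=2, perplexity=15, random_state=0)
sk_embedding = sk_tsne.fit_transform(X)
print(sk_embedding.shape)  # (500, 2)
```

The same `X` could then be passed to the custom `fit_transform`, and the two embeddings compared visually or via neighborhood-preservation metrics.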
The training of the LSTM model on the IMDB dataset can be found in the notebook LSTM.ipynb, which also contains visualizations of the word embeddings produced by different t-SNE instances.
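The embedding-visualization step can be sketched as follows (this is an illustration, not the notebook's code): once the LSTM's embedding layer is trained, its weight matrix — one row per vocabulary word — is projected to 2D with t-SNE. A random matrix stands in for the learned weights here, and the vocabulary and embedding sizes are arbitrary.

```python
# Hedged sketch: projecting a (stand-in) word-embedding matrix with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Stand-in for trained embedding weights: 200 words, 64 dimensions each.
embedding_weights = rng.normal(size=(200, 64))

word_tsne = TSNE(n_components=2, perplexity=15, random_state=0)
word_points = word_tsne.fit_transform(embedding_weights)
print(word_points.shape)  # (200, 2)
```

Each row of `word_points` can then be scattered and annotated with its word, so that semantically related words appearing close together indicates a meaningful embedding.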
An interactive plot of the 3D t-SNE applied to our word embedding can be found in interactive_3d_plot.html.
Laurens van der Maaten and Geoffrey Hinton. Visualizing Data using t-SNE. Journal of Machine Learning Research, 9(86):2579–2605, 2008.