Word Embedding Reconstruction with Subword Embeddings for Language Modelling

This repository is the main hub of work for my MSc dissertation, "Word Embedding Reconstruction with Subword Embeddings for Language Modelling", available here

Repository Contents

This repository is a mix of my own work and forks from other repositories.

charCNN is a reimplementation of Learning to Generate Word Representations using Subword Information, which I use as a reconstruction method.

cr_clean is a fork of Compact Reconstruction, which I use as a reconstruction method.

pyTorch_LM is a fork of a word-level LM PyTorch tutorial, to which I added the ability to use pre-trained embeddings. I use it to build a language model and evaluate the perplexity of various embeddings.
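As a rough illustration of the perplexity metric mentioned above, here is a minimal sketch that uses the standard definition: the exponential of the average negative log-likelihood per token. The function name `perplexity` and the toy input are assumptions for illustration only and are not taken from the pyTorch_LM code.

```python
import math


def perplexity(token_log_probs):
    """Return perplexity given per-token natural-log probabilities.

    Hypothetical helper: the input is assumed to be the log-probability
    the language model assigned to each target token in the test set.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    # Average negative log-likelihood per token.
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# Toy check: a model that gives every token probability 1/4
# should have a perplexity of about 4.
uniform = [math.log(1 / 4)] * 10
print(perplexity(uniform))
```

Lower perplexity means the model assigns higher probability to the held-out text, which is why it can serve as a way to compare embeddings when the rest of the model is kept fixed.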

distance_measuring has initial tests of the quality of the reconstructed embeddings compared to reference embeddings, using measures such as cosine similarity and P@n.
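The two measures named above could be sketched as follows. This is an illustration under assumptions, not the code in distance_measuring: the function names are hypothetical, and P@n is taken here to mean the fraction of words whose own reference vector appears among the n reference vectors closest (by cosine similarity) to the reconstructed vector. The dissertation may define it differently.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def precision_at_n(reconstructed, reference, n):
    """Fraction of words whose reference vector ranks in the top n.

    Both arguments are dicts mapping word -> vector (assumed layout).
    """
    if not reconstructed:
        raise ValueError("no reconstructed embeddings given")
    hits = 0
    for word, vec in reconstructed.items():
        # Rank every reference word by similarity to the reconstruction.
        ranked = sorted(
            reference,
            key=lambda w: cosine_similarity(vec, reference[w]),
            reverse=True,
        )
        if word in ranked[:n]:
            hits += 1
    return hits / len(reconstructed)


# Toy data, invented purely for illustration.
reference = {"cat": [1.0, 0.0], "dog": [0.0, 1.0], "car": [1.0, 1.0]}
reconstructed = {"cat": [0.9, 0.1], "dog": [0.6, 0.5]}
print(precision_at_n(reconstructed, reference, 1))
print(precision_at_n(reconstructed, reference, 3))
```

This brute-force ranking is quadratic in vocabulary size, so for real embedding files a vectorised implementation would be the more practical choice.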

results/embedding_comparison is where I put my results!

utils has a few repo-wide scripts for making baseline embeddings, limiting the character set in embeddings, etc.
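Limiting the character set of an embedding file could look roughly like the sketch below. It assumes the common plain-text embedding layout (one word per line followed by space-separated vector components) and an allowed set of lowercase ASCII letters. Both are assumptions for illustration, and `filter_by_charset` is a hypothetical name rather than a script from utils.

```python
import string

# Assumed allowed character set; the real scripts may use a different one.
ALLOWED = set(string.ascii_lowercase)


def filter_by_charset(lines, allowed=ALLOWED):
    """Yield only embedding lines whose word uses allowed characters.

    Each line is assumed to look like "word v1 v2 ... vd".
    """
    for line in lines:
        stripped = line.rstrip("\n")
        word, _, vector_part = stripped.partition(" ")
        # Skip malformed lines and words with out-of-set characters.
        if word and vector_part and all(ch in allowed for ch in word):
            yield stripped


# Toy input, invented for illustration.
sample = ["cat 0.1 0.2", "café 0.3 0.4", "dog 0.5 0.6", "U.S. 0.7 0.8"]
for kept in filter_by_charset(sample):
    print(kept)
```

Because the function is a generator over lines, it can be applied to an open file handle without loading the whole embedding file into memory.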


Jamesetay1/word-embedding-reconstruction
