Introducing language models and word embeddings 🧑‍💻

This repo contains a Jupyter notebook that introduces language models and word embeddings by training a word2vec model on datasets of 100K and 1M sentences from German news articles.

Prerequisites

Python and JupyterLab installed on your machine.

Instructions

Start JupyterLab from your terminal.
Clone this repo.
Download this folder from Wortschatz Leipzig, unpack it and save the file "deu_news_2022_1M-sentences.txt" in the "data" folder. It is not provided in this repo as it exceeds 100 MB.
Navigate to this repo using the file manager inside JupyterLab.
Open "Notebook.ipynb" and enjoy!
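
To check that the downloaded file is in place before opening the notebook, a minimal sketch like the following may help. It assumes that each line of the sentence file holds a numeric ID, a tab, and then the sentence, and that simple lowercase word tokens are good enough for a first look. The helper names parse_line and iter_sentences are hypothetical and are not taken from the notebook, which may preprocess the text differently.

```python
import re
from pathlib import Path
from typing import Iterator, List

# Assumption: one sentence per line, formatted as "<id>\t<sentence>".
TOKEN_PATTERN = re.compile(r"\w+")


def parse_line(line: str) -> List[str]:
    """Drop the leading ID column (if present) and return lowercase tokens."""
    _, sep, text = line.rstrip("\n").partition("\t")
    if not sep:
        text = line  # no tab found: treat the whole line as the sentence
    return TOKEN_PATTERN.findall(text.lower())


def iter_sentences(path: Path) -> Iterator[List[str]]:
    """Lazily yield one token list per non-empty sentence in the file."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            tokens = parse_line(line)
            if tokens:
                yield tokens


if __name__ == "__main__":
    corpus = Path("data") / "deu_news_2022_1M-sentences.txt"
    if corpus.exists():
        # Print the first three tokenized sentences as a sanity check.
        for index, tokens in enumerate(iter_sentences(corpus)):
            print(tokens)
            if index == 2:
                break
    else:
        print(f"Corpus file not found: {corpus}")
```

Token lists of this shape are the kind of input that word2vec implementations typically expect, but whether the notebook uses this exact preprocessing is an assumption.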
