Introducing language models and word embeddings 🧑‍💻

This repo contains a Jupyter notebook that introduces language models and word embeddings by training a word2vec model on datasets of 100K and 1M sentences from German news articles.

Prerequisites

Python and JupyterLab installed on your machine.

Instructions

  1. Start JupyterLab by running "jupyter lab" in your terminal.
  2. Clone this repo.
  3. Download this folder from Wortschatz Leipzig, unpack it, and save the file "deu_news_2022_1M-sentences.txt" in the "data" folder. The file is not included in this repo because it exceeds 100 MB.
  4. Navigate to this repo using the file manager inside JupyterLab.
  5. Open "Notebook.ipynb" and enjoy!
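
To check that the data file from step 3 is in the right place before opening the notebook, a minimal sketch like the following may help. It assumes the sentences file stores one sentence per line as a tab-separated ID and text, which is an assumption about the download and not something this repo documents. The helper name read_sentences is hypothetical, and the notebook itself may load and tokenize the data differently.

```python
import itertools
import re
from pathlib import Path


def read_sentences(path):
    """Yield each non-empty line of the file as a list of lowercase tokens."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue
            # Assumed layout: "<id>\t<sentence>". If a line has no tab,
            # rpartition returns the whole line as the text.
            _, _, text = line.rpartition("\t")
            # \w+ is Unicode-aware in Python 3, so umlauts and "ß" are kept.
            tokens = re.findall(r"\w+", text.lower())
            if tokens:
                yield tokens


# Path taken from step 3 of the instructions above.
DATA_FILE = Path("data") / "deu_news_2022_1M-sentences.txt"

if DATA_FILE.exists():
    # Print the first three tokenized sentences as a quick sanity check.
    for tokens in itertools.islice(read_sentences(DATA_FILE), 3):
        print(tokens)
else:
    print(f"{DATA_FILE} not found; see step 3 of the instructions.")
```

Lists of tokens like these are the usual input shape for word2vec trainers, so the same generator could feed a training step if you want to experiment outside the notebook.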