## 4 Representing the meaning of words with word2vec

> Explanations, visualisations and formulas:
> - Jurafsky-Martin [6](https://web.stanford.edu/~jurafsky/slp3/6.pdf)
> - Lena Voita: [Word embeddings](https://lena-voita.github.io/nlp_course/word_embeddings.html)
> - Jay Alammar: [The Illustrated Word2vec](http://jalammar.github.io/illustrated-word2vec/)
> - Xin Rong: [word2vec Parameter Learning Explained](https://arxiv.org/pdf/1411.2738.pdf)


 

### Word identity: one-hot encoding

<img src="figures/w2v_one-hot.png" alt="one-hot encoding" width="300"/>


In one-hot encodings no meaning is represented: all words are equally distant from each other, and each word is one dimension in a high-dimensional space, orthogonal to all the others.
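
A minimal sketch of what this looks like in practice (the toy vocabulary below is invented for illustration; a real vocabulary has tens of thousands of words):

```python
import numpy as np

# Toy vocabulary, invented for illustration.
vocab = ["cat", "dog", "car", "apple"]
word_to_id = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Return a vector with a single 1 at the word's index."""
    vec = np.zeros(len(vocab))
    vec[word_to_id[word]] = 1.0
    return vec

# Every pair of distinct words has the same distance and a dot product of 0:
# the vectors carry no information about meaning.
print(one_hot("cat"))                   # [1. 0. 0. 0.]
print(one_hot("cat") @ one_hot("dog"))  # 0.0
```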


&nbsp;

### Word embedding: a continuous representation

<img src="figures/w2v_embeddings.png" alt="word embeddings" width="300"/>


When we embed words in a space, each word becomes a data point in that space. The number of dimensions can be much lower than the vocabulary size, because we no longer need one dimension per word. Once embedded, words with similar meanings are positioned close to each other in the embedding space.
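
As a contrast to the one-hot case, here is a sketch with made-up dense vectors (real word2vec embeddings typically have 100–300 dimensions and are learned from data): similar words get a high cosine similarity, dissimilar words a low one.

```python
import numpy as np

# Made-up 3-dimensional embeddings, for illustration only.
embeddings = {
    "cat": np.array([0.8, 0.1, 0.3]),
    "dog": np.array([0.7, 0.2, 0.35]),
    "car": np.array([-0.1, 0.9, -0.4]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(cosine(embeddings["cat"], embeddings["dog"]))  # high: similar meanings
print(cosine(embeddings["cat"], embeddings["car"]))  # low: dissimilar meanings
```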


&nbsp;

### Target vs. context words

<img src="figures/w2v_ctc.png" alt="target and context words" width="250"/>

We can look at every word that appears in a text from two sides. On one side, each word carries its own distinct meaning, which we want to represent; when we look at a word from this side, we call it a *target*. On the other side, each word is also a feature, a component of the meaning of other words; when a word describes other words, we call it a *context*. Which role a word plays depends on where we are currently looking.

**Important**: Each word has a target and a context representation.
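
A minimal sketch of how (target, context) pairs can be extracted from running text with a symmetric window (the sentence and window size are invented for illustration):

```python
def make_pairs(tokens, window=2):
    """Yield (target, context) pairs: every word within `window`
    positions of the target counts as its context."""
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, tokens[j]

sentence = "the cat sat on the mat".split()
for target, context in make_pairs(sentence):
    print(target, "->", context)
# Note that the same word appears sometimes as target, sometimes as context.
```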

&nbsp;

### word2vec training objectives

- cbow: *p(t|c)*
- skip-gram: *p(c|t)*

word2vec is a neural language model.

**Important**: The set of labels is huge: each word is a label!
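
A hedged sketch of the skip-gram objective *p(c|t)*: a softmax over the entire vocabulary, i.e. one "label" per word. The matrices below are random stand-ins for trained parameters, and the sizes are only typical orders of magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10_000, 100                           # vocabulary size, embedding dimension
W_target = rng.normal(size=(V, d)) * 0.01    # target (input) representations
W_context = rng.normal(size=(V, d)) * 0.01   # context (output) representations

def p_context_given_target(t_id):
    """Skip-gram objective p(c|t): a softmax over all V words."""
    scores = W_context @ W_target[t_id]      # one score per vocabulary word
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

probs = p_context_given_target(t_id=42)
print(probs.shape)   # (10000,): every word in the vocabulary is a possible label
```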

&nbsp;

### word2vec trick 1

*p(t|c)* and *p(c|t)* are estimated with a neural network. For the network to predict a word, it needs representations of both its input and its output. These representations are the meanings of the words!
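
A toy sketch of this idea, assuming the skip-gram setup above: the hidden layer is simply a row of the input weight matrix, so the rows of the two weight matrices are exactly the target and context representations.

```python
import numpy as np

rng = np.random.default_rng(1)
V, d = 5, 4                       # tiny toy sizes, for illustration only
W_in = rng.normal(size=(V, d))    # target representations (input side)
W_out = rng.normal(size=(V, d))   # context representations (output side)

t_id = 2
x = np.zeros(V); x[t_id] = 1.0    # one-hot input for the target word

hidden = x @ W_in                 # == W_in[t_id]: the target word's embedding
scores = hidden @ W_out.T         # one score per possible context word
print(scores.shape)               # (5,)

# The "meanings" learned by the model are just these weight rows.
assert np.allclose(hidden, W_in[t_id])
```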


&nbsp;

### word2vec trick 2

In the skip-gram version, we replace the objective *p(c|t)* with *p(yes|t, c)*. Instead of predicting the context word, we just decide whether the target-context relation holds between two words, which takes us from thousands of labels down to only two. For this to work, we need examples of both cases, *yes* and *no*: we get the *yes* examples from the training text and randomly sample the *no* examples.
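
A hedged sketch of this reformulation (commonly known as negative sampling): for a (target, context) pair we score *yes* with a sigmoid over the dot product of the two representations; positive pairs come from the text, negatives are sampled at random. The indices and matrix sizes below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
V, d = 1000, 50
W_target = rng.normal(size=(V, d)) * 0.01
W_context = rng.normal(size=(V, d)) * 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_yes(t_id, c_id):
    """p(yes | t, c): does the target-context relation hold for this pair?"""
    return sigmoid(W_target[t_id] @ W_context[c_id])

# A positive pair taken from the training text (hypothetical word indices) ...
t_id, c_id = 17, 42
# ... and a few "no" examples with randomly sampled context words.
negative_ids = rng.integers(0, V, size=5)

loss = -np.log(p_yes(t_id, c_id)) - sum(np.log(1 - p_yes(t_id, n)) for n in negative_ids)
print(loss)   # the quantity minimised during training
```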
