
Commit

math fix
vaibhavad committed Apr 9, 2024
1 parent cca8e0d commit c92f47f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion docs/_pages/tutorial.md
@@ -88,7 +88,7 @@ class BiLlamaForMNTP(LlamaForCausalLM):

We can now use this model for training with masked next token prediction.

- In our work, predicting a masked token at position $`i`$, we compute the loss based on the logits obtained from the token representation at the previous position $`i-1`$. This shifting is automatically handled by the forward function of `LlamaForCausalLM` as similar shifting is required in the next token prediction task.
+ In our work, predicting a masked token at position `i`, we compute the loss based on the logits obtained from the token representation at the previous position `i-1`. This shifting is automatically handled by the forward function of `LlamaForCausalLM` as similar shifting is required in the next token prediction task.
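For intuition, the shift described in the corrected sentence can be sketched with toy tensors. This is a simplified illustration of the standard causal-LM label shift, not the `transformers` source itself; the shapes and values below are made up for the example.

```python
import torch
import torch.nn.functional as F

# Toy shapes: batch of 2, sequence length 5, vocabulary of 10 tokens.
logits = torch.randn(2, 5, 10)          # decoder outputs, one logit vector per position
labels = torch.randint(0, 10, (2, 5))   # MNTP labels; unmasked positions would be -100

# The same shift LlamaForCausalLM applies for next token prediction:
# the logits at position i-1 are scored against the label at position i.
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = labels[:, 1:].contiguous()

loss = F.cross_entropy(
    shift_logits.view(-1, shift_logits.size(-1)),
    shift_labels.view(-1),
    ignore_index=-100,  # positions that are not masked do not contribute to the loss
)
print(loss)
```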

For training, we adapt the huggingface example script for masked language modeling - [examples/pytorch/language-modeling/run_mlm.py](https://github.com/huggingface/transformers/blob/v4.39.3/examples/pytorch/language-modeling/run_mlm.py). The only change required is to define a mask token, as decoder-only models do not have a mask token by default. We can use the padding token as the mask token. In our work we used underscore `_` as the mask token.
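A minimal sketch of that tokenizer setup is shown below; the checkpoint name is only an example, and any Llama-style tokenizer is configured the same way.

```python
from transformers import AutoTokenizer

# Example checkpoint; substitute the model you are training.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Decoder-only tokenizers ship without a mask token, which the masked
# language modeling data collator used by run_mlm.py requires.
tokenizer.mask_token = "_"  # the choice used in this tutorial
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # padding is also undefined by default

print(tokenizer.mask_token, tokenizer.mask_token_id)
```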

