diff --git a/vision_encoder_decoder_blog.md b/vision_encoder_decoder_blog.md
index facfbbc..ba60a9c 100644
--- a/vision_encoder_decoder_blog.md
+++ b/vision_encoder_decoder_blog.md
@@ -45,11 +45,11 @@ The above models and their variations focus on pretraining either the encoder or
-
+
| |
|:--:|
| Figure 3: The 3 pretraining paradigms for Transformer models [[4]](https://arxiv.org/abs/1810.04805) [[5]](https://openai.com/blog/language-unsupervised/) [[6]](https://arxiv.org/abs/1910.13461)|
-
+
In 2020, the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) studied the effectiveness of initializing sequence-to-sequence models with pretrained encoder and decoder checkpoints. Warm-starting models in this way yielded new state-of-the-art results on sequence generation tasks such as machine translation and text summarization.
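
As a minimal sketch of this warm-starting idea, the snippet below uses the `EncoderDecoderModel.from_encoder_decoder_pretrained` API from 🤗 Transformers to combine a pretrained BERT encoder with a pretrained GPT-2 decoder. The specific checkpoint names are chosen purely for illustration and are not the exact configurations studied in the cited paper.

```python
from transformers import EncoderDecoderModel

# Warm-start a sequence-to-sequence model from two independently pretrained
# checkpoints: a BERT encoder and a GPT-2 decoder. The cross-attention weights
# connecting them are newly (randomly) initialized and are learned during
# fine-tuning on the downstream sequence generation task.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "gpt2"
)
```

Because only the cross-attention (and any task-specific) weights start from scratch, the warm-started model can typically be fine-tuned with far less data than training the full encoder-decoder from random initialization.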