From 0721857c265de4b2bf9506ff6ffa910ba5d08070 Mon Sep 17 00:00:00 2001
From: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Date: Sun, 6 Feb 2022 21:14:54 +0100
Subject: [PATCH] center tables
---
backup.md | 98 +++++++++++++++++++++++++++++++++++++------------------
1 file changed, 66 insertions(+), 32 deletions(-)
diff --git a/backup.md b/backup.md
index a1af188..6d37b5b 100644
--- a/backup.md
+++ b/backup.md
@@ -27,17 +27,37 @@ The encoder-decoder architecture was proposed in 2014, when several papers ([Cho
-| |
-|:--:|
-| Figure 1: RNN-based encoder-decoder architecture [[1]](https://arxiv.org/abs/1409.3215) [[2]](https://arxiv.org/abs/1409.0473)<br>Left: without attention mechanism \| Right: with attention mechism|
+<p align="center">
+Figure 1: RNN-based encoder-decoder architecture <a href="https://arxiv.org/abs/1409.3215">[1]</a> <a href="https://arxiv.org/abs/1409.0473">[2]</a><br>
+Left: without attention mechanism | Right: with attention mechanism
+</p>
+
In 2017, Vaswani et al. published the paper [Attention is all you need](https://arxiv.org/abs/1706.03762), which introduced a new model architecture called the `Transformer`. It still consists of an encoder and a decoder; however, instead of using RNNs/LSTMs for these components, it uses multi-head self-attention as the building block. This innovative attention mechanism has been the foundation of the breakthroughs in NLP since then, well beyond NMT tasks.
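At the core of each attention head is scaled dot-product attention, `softmax(QK^T / sqrt(d_k)) V`. Here is a minimal NumPy sketch for intuition (the shapes and names are illustrative assumptions, not the actual Transformer implementation):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-2, -1) / np.sqrt(d_k)   # pairwise query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the key axis
    return weights @ v                               # weighted sum of value vectors

# Toy example: a sequence of 4 tokens with head dimension 8
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(q, k, v).shape)   # (4, 8)
```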
-| |
-|:--:|
-| Figure 2: Transformer encoder-decoder architecture [[3]](https://arxiv.org/abs/1706.03762)|
+<p align="center">
+Figure 2: Transformer encoder-decoder architecture <a href="https://arxiv.org/abs/1706.03762">[3]</a>
+</p>
+
Combined with the idea of pretraining and transfer learning (for example, from [ULMFiT](https://arxiv.org/abs/1801.06146)), a golden age of NLP started in 2018-2019 with the release of OpenAI's [GPT](https://openai.com/blog/language-unsupervised/) and [GPT-2](https://openai.com/blog/better-language-models/) models and Google's [BERT](https://arxiv.org/abs/1810.04805) model. It's now common to call them Transformer models; however, they don't follow the encoder-decoder architecture of the original Transformer: BERT is encoder-only (originally for text classification) and the GPT models are decoder-only (for text auto-completion).
@@ -45,10 +65,20 @@ The above models and their variations focus on pretraining either the encoder or
-| |
-|:--:|
-| Figure 3: The 3 pretraining paradigms for Transformer models [[4]](https://arxiv.org/abs/1810.04805) [[5]](https://openai.com/blog/language-unsupervised/) [[6]](https://arxiv.org/abs/1910.13461)|
-
+<p align="center">
+Figure 3: The 3 pretraining paradigms for Transformer models <a href="https://arxiv.org/abs/1810.04805">[4]</a> <a href="https://openai.com/blog/language-unsupervised/">[5]</a> <a href="https://arxiv.org/abs/1910.13461">[6]</a>
+</p>
+
In 2020, the paper [Leveraging Pre-trained Checkpoints for Sequence Generation Tasks](https://arxiv.org/abs/1907.12461) studied the effectiveness of initializing sequence-to-sequence models with pretrained encoder and decoder checkpoints, and obtained new state-of-the-art results on machine translation, text summarization, etc.
Following this idea, 🤗 [transformers](https://huggingface.co/docs/transformers/index) implements [EncoderDecoderModel](https://huggingface.co/docs/transformers/model_doc/encoderdecoder), which allows users to easily combine almost any 🤗 pretrained encoder (BERT, RoBERTa, etc.) with a 🤗 pretrained decoder (GPT models, the decoder from BART or T5, etc.) and fine-tune the combination on downstream tasks. Instantiating an [EncoderDecoderModel](https://huggingface.co/docs/transformers/model_doc/encoderdecoder) is super easy, and fine-tuning it on a sequence-to-sequence task usually obtains decent results in just a few hours on a Google Cloud TPU.
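For example, warm-starting a BERT-to-GPT-2 model takes a single call; a minimal sketch, where the checkpoint names are just illustrative choices:

```python
from transformers import EncoderDecoderModel

# Combine a pretrained encoder with a pretrained decoder; the cross-attention
# weights connecting them are newly initialized and learned during fine-tuning.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased",  # a 🤗 pretrained encoder checkpoint
    "gpt2",               # a 🤗 pretrained decoder checkpoint
)
model.save_pretrained("bert2gpt2")  # ready for seq2seq fine-tuning
```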
@@ -151,9 +181,19 @@ The obtained sequence of vectors plays the same role as token embeddings in [BER
-| |
-|:--:|
-| Figure 4: BERT v.s. ViT |
+<p align="center">
+Figure 4: BERT vs. ViT
+</p>
+
<sup>2</sup> This is just the concept. The actual implementation uses convolution layers to perform this computation efficiently.
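A minimal PyTorch sketch of that convolution trick (the 16x16 patch size and 768 hidden size follow ViT-Base, but are assumptions here): a convolution whose kernel size and stride both equal the patch size projects every non-overlapping patch in a single pass.

```python
import torch
import torch.nn as nn

patch_size, hidden_size = 16, 768
# Kernel size == stride == patch size: each output position is the linear
# projection of exactly one 16x16 patch, i.e. its patch embedding.
patch_embed = nn.Conv2d(3, hidden_size, kernel_size=patch_size, stride=patch_size)

pixel_values = torch.randn(1, 3, 224, 224)          # one 224x224 RGB image
embeddings = patch_embed(pixel_values)              # (1, 768, 14, 14)
embeddings = embeddings.flatten(2).transpose(1, 2)  # (1, 196, 768): one vector per patch
print(embeddings.shape)
```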
@@ -369,9 +409,19 @@ We have learned the encoder-decoder architecture in NLP and the vision Transform
-| |
-|:--:|
-| Figure 5: Vision-Encoder-Decoder architecture |
+<p align="center">
+Figure 5: Vision-Encoder-Decoder architecture
+</p>
+
### **Vision-Encoder-Decoder in 🤗 transformers**
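As in the text-only case, such a model can be warm-started from a pretrained vision encoder and a pretrained text decoder; a minimal sketch, with illustrative checkpoint names:

```python
from transformers import VisionEncoderDecoderModel

# Pair a pretrained ViT encoder with a pretrained GPT-2 decoder; as before,
# the cross-attention weights are newly initialized.
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k",  # vision encoder
    "gpt2",                               # text decoder
)
```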
@@ -567,14 +617,6 @@ display(df[:3].style.set_table_styles([{'selector': 'td', 'props': props}, {'sel
-
@@ -659,14 +701,6 @@ display(df[3:].style.set_table_styles([{'selector': 'td', 'props': props}, {'sel
-