## Conditional Diffusion Model {#sec-cdm}
@dhariwal2021diffusion
- An unconditional diffusion model gives us no control over what it generates, which clearly cannot satisfy our needs.
For example, on MNIST we may want to control which of the digits $0\sim 9$ is generated;
on CelebA we may want to control the attributes of the generated face (e.g., male or female, wearing glasses or not).
This naturally leads to the so-called conditional diffusion model.
- Let us start with the simple class-conditional case, using the MNIST digits as the example.
Suppose we have a dataset following the distribution on $X \times Y$
$$
\begin{aligned}
\widehat{q}(x_0,y), \quad x_0 \in \mathbb R^{w\times h}, \quad y\in \mathbb R^n,
\end{aligned}
$$
where
- $X_0$ is the digit image;
- $Y$ is the embedding of the digit label in $\mathbb R^n$
  - That is, $\mathbb R^n$ is the embedding space of the labels.
  - For this example, the labels $0,1,\cdots,9$ are embedded as `nn.Embedding(10,n)(torch.arange(10))` (so the embedding itself is learnable); a minimal sketch follows this list.
- Given the label $Y = y,$ we want to generate an image $x_0$ that has the label $y.$
- Assume that we already have $\widehat{q}(y\vert x_0).$
That is, when we have $x_0,$ we know the distribution of the labels of $x_0.$
- If we ignore $Y$ and look only at $X_0,$ this is just the earlier unconditional diffusion model.
- We define $q$ as before:
- $q(x_0)$: the distribution of $X_0$ (no closed form);
- $q(x_t\vert x_{t-1})= \mathcal{N}(\sqrt{\alpha_t}x_{t-1}, (1-\alpha_t)\mathbf{I}).$
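To make this setup concrete, here is a minimal sketch (PyTorch) of the learnable label embedding and of one forward noising step $q(x_t\vert x_{t-1})$; the embedding dimension `n = 16` and the value of `alpha_t` are illustrative assumptions, not choices made in the text:

```python
import torch
import torch.nn as nn

n = 16                                # assumed embedding dimension
label_embed = nn.Embedding(10, n)     # learnable embeddings for labels 0..9
y = label_embed(torch.arange(10))     # all label embeddings, shape (10, n)

def forward_step(x_prev: torch.Tensor, alpha_t: float) -> torch.Tensor:
    """One step of q(x_t | x_{t-1}) = N(sqrt(alpha_t) x_{t-1}, (1 - alpha_t) I)."""
    return alpha_t ** 0.5 * x_prev + (1 - alpha_t) ** 0.5 * torch.randn_like(x_prev)

x0 = torch.randn(1, 28, 28)           # a stand-in MNIST image
x1 = forward_step(x0, alpha_t=0.99)   # assumed alpha_t, for illustration only
```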
#### Important
- Similarly, let $\lbrace X_t \rbrace_{t=0}^T$ be the noised images at time $t,$ except that the noise is now added as follows.
**Define** the forward process of $(X_{0:T},Y)$ by the following:
- $\widehat{q}(x_0):= q(x_0)$ (no closed form) (eq 28).
- So we have $\widehat{q}(x_0,y)=\underbrace{q(x_0)}_{\text{no closed form}} \cdot \underbrace{\widehat{q}(y\vert x_0)}_{\text{closed form}}.$
- $\widehat{q}(x_t\vert x_{t-1},y):= q(x_{t}\vert x_{t-1})$ (closed form) (eq 30);
- $\widehat{q}(x_{1:T}\vert x_0,y):= \prod_{t=1}^T \widehat{q}(x_t\vert x_{t-1},y)$ (eq 31).
- Conditioned on $Y=y,$ the forward process $X_0,X_1,\cdots,X_T$ is a Markov chain with the transition density $q(x_t\vert x_{t-1}).$
Note that
$$
\begin{aligned}
\widehat{q}(x_{0:T},y)
&= \widehat{q}(x_0,y) \cdot \widehat{q}(x_{1:T}\vert x_0,y) \cr
&= \widehat{q}(x_0,y) \cdot \prod_{t=1}^T \widehat{q}(x_t\vert x_{t-1},y).
\end{aligned}
$$
- For this $\widehat{q},$ we have
- $\widehat{q}(x_{t}\vert x_{t-1})=\widehat{q}(x_{t}\vert x_{t-1},y)$ (eq 32~37) $= q(x_t\vert x_{t-1})$ (eq 30);
- $\widehat{q}(x_{1:T}\vert x_0)= q(x_{1:T}\vert x_0)$ (eq 38~44);
- $\widehat{q}(x_t)=q(x_t)$ (eq 45~50);
- $\widehat{q}(x_{t-1}\vert x_{t}) = q(x_{t-1}\vert x_{t})$;
- (The four identities above say that, when the label is ignored, $\widehat{q}$ has exactly the same distribution as the earlier diffusion model $q$);
- $\widehat{q}(y\vert x_{t-1},x_{t}) = \widehat{q}(y\vert x_{t-1})$ (eq 51~54);
- $\widehat{q}(x_{t-1}\vert x_{t},y) = \underbrace{q(x_{t-1}\vert x_{t})}_{\approx p_{\theta}(x_{t-1}\vert x_{t})} \cdot \underbrace{\widehat{q}(y\vert x_{t-1})}_{\approx p_{\phi}(y\vert x_{t-1})} \Big/ \underbrace{\widehat{q}(y\vert x_{t})}_{\text{constant}}$ (eq 55~61), which follows from Bayes' rule together with the previous identity.
- Note that $p_{\phi}(y\vert x_t)$ is shorthand for $p_{\phi}(y\vert x_t,t).$
- Note that $p_{\theta}(x_{t-1}\vert x_{t})$ and $p_{\phi}(y\vert x_{t-1})$ are our models.
- Here we can reuse an already-trained $p_{\theta}$ (from a plain DDPM) together with an already-trained classifier.
- Define $p_{\theta,\phi}(x_{t-1}\vert x_t,y) = \text{constant}\cdot p_{\theta}(x_{t-1}\vert x_{t}) \cdot p_{\phi}(y\vert x_{t-1}).$
So, given the label $y,$ we sample $x_0$ (with label $y$) as follows:
- **For** $t=T,T-1,\cdots,1,$
- Sample $x_{t-1}\sim p_{\theta,\phi}(x_{t-1}\vert x_t,y)$
- **EndFor**
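The only quantity the classifier contributes to this sampler is the gradient $\nabla_{x_t} \log p_{\phi}(y\vert x_t,t),$ which autograd provides directly. A minimal sketch, assuming a time-conditioned classifier with the (hypothetical) interface `classifier(x, t) -> logits`:

```python
import torch
import torch.nn.functional as F

def classifier_grad(classifier, x_t, t, y):
    """grad_{x_t} log p_phi(y | x_t, t) for a time-conditioned classifier.

    `classifier(x, t) -> logits` is an assumed interface; `y` holds integer labels.
    """
    x = x_t.detach().requires_grad_(True)
    log_probs = F.log_softmax(classifier(x, t), dim=-1)
    selected = log_probs[torch.arange(len(y)), y]    # pick log p_phi(y_i | x_i, t)
    return torch.autograd.grad(selected.sum(), x)[0]
```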
We now work out the formula for $p_{\theta,\phi}(x_{t-1}\vert x_t,y).$
Consider $x_t, y$ as two given constants.
Using a Taylor expansion at $x_{t-1}=\mu$ (some constant; in practice, the DDPM mean $\mu_{\theta}(x_t)$), we have
$$
\begin{aligned}
\log p_{\phi}(y\vert x_{t-1})
&\approx \log p_{\phi}(y\vert x_{t-1})\Big\vert_{x_{t-1}=\mu} + (x_{t-1}-\mu) \cdot \nabla_{x_{t-1}} \log p_{\phi}(y\vert x_{t-1})\Big\vert_{x_{t-1}=\mu} \cr
&= (x_{t-1}-\mu) \cdot g + C_1, \quad \text{where } g := \nabla_{x_{t-1}} \log p_{\phi}(y\vert x_{t-1})\Big\vert_{x_{t-1}=\mu}
\end{aligned}
$$
and $C_1$ is a constant that does not depend on $x_{t-1}.$
Since $\log p_{\theta}(x_{t-1}\vert x_{t}) = -\frac{1}{2}(x_{t-1}-\mu)^{\top}\Sigma^{-1}(x_{t-1}-\mu) + C_2,$ adding the two log-densities and completing the square gives
$$
\begin{aligned}
p_{\theta,\phi}(x_{t-1}\vert x_t,y) \approx \mathcal{N}\bigl(\mu + \Sigma g, \Sigma\bigr),
\end{aligned}
$$
that is, the classifier shifts the DDPM mean $\mu$ by $\Sigma g.$
#### Sampling (DDPM with classifier)
- **Given:** A trained $p_{\theta}(x_{t-1}\vert x_t)$ (a plain DDPM) and a classifier $p_{\phi}(y\vert x_{t-1}).$
- **Input:** A label $y$ and a gradient scale $s\in (1,\infty)$
- Sample $x_T\sim \mathcal{N}(\mathbf{0},\mathbf{I}).$
- **For** $t=T,T-1,\cdots,1$
- $\mu,\Sigma \leftarrow \mu_{\theta}(x_t), \Sigma_{\theta}(x_t)$
- Sample $x_{t-1}\sim \mathcal{N}\bigl( \mu , \Sigma \bigr)$
- **Comment** Sample from unconditional diffusion model
- $x_{t-1}\leftarrow x_{t-1} + s \Sigma \nabla_{x_t} \log p_{\phi} (y\vert x_t)$
- **Comment** This is roughly a gradient-ascent step with respect to $p_{\theta,\phi}(x_{t-1}\vert x_t,y)$: it increases the log-likelihood of $y$ and guides $x_{t-1}$ toward the label $y.$
<!-- This can be viewed as amplifying the influence of $y$; with $s=0$ we recover the plain DDPM result. -->
<!-- - Sample $x_{t-1}^{\text{uncond}}\sim \mathcal{N}\bigl( \mu + s \Sigma \nabla_{x_t} \log p_{\phi} (y\vert x_t) , \Sigma \bigr)$ -->
- **EndFor**
- **Return** $x_0$
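A minimal end-to-end sketch of this sampler (PyTorch). The interfaces `mu_theta(x, t)` and `sigma_theta(x, t)` for the trained DDPM, and the `classifier_grad` helper from the sketch above, are assumptions about how the models are exposed, not a fixed API; `sigma_theta` is taken to return the diagonal of $\Sigma$ (the variances):

```python
import torch

def guided_ddpm_sample(mu_theta, sigma_theta, classifier, y, s, shape, T):
    """Classifier-guided DDPM sampling (the algorithm above).

    mu_theta(x, t), sigma_theta(x, t): assumed interfaces to the trained,
    unconditional p_theta(x_{t-1} | x_t); sigma_theta returns variances.
    """
    x = torch.randn(shape)                                    # x_T ~ N(0, I)
    for t in reversed(range(1, T + 1)):
        t_batch = torch.full((shape[0],), t)
        with torch.no_grad():
            mu = mu_theta(x, t_batch)
            sigma = sigma_theta(x, t_batch)
            x_prev = mu + sigma.sqrt() * torch.randn_like(x)  # unconditional sample
        g = classifier_grad(classifier, x, t_batch, y)        # grad_x log p_phi(y | x_t, t)
        x = x_prev + s * sigma * g                            # guide toward label y
    return x
```

With `s = 0` the guidance term vanishes and the loop reduces to plain DDPM sampling.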