
Re-network the DIT, fix some parameters, and simplify the model networking code #632

Merged — 28 commits, Aug 28, 2024

Conversation

chang-wenbin (Contributor) commented Jul 29, 2024

Latest optimization: re-network the DIT, simplifying the original dynamic-graph model into a high-performance model network:

  1. For the most time-consuming core, the transformer part, use paddle.incubate.jit.inference for dynamic-to-static conversion, and remove redundant computation from the loop;
  2. Use some Triton operators for manual operator fusion;
  3. Use horizontal fusion operators to merge parallel operators into a single computation;
  4. Use the CUTLASS library for optimization and acceleration.

Currently, facebook-DIT takes 219.936 ms.
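The "removes redundant parts in the loop" in item 1 refers to hoisting loop-invariant work out of the per-block loop. A minimal sketch of the idea in plain Python, using a hypothetical timestep-embedding stand-in rather than the PR's actual code:

```python
import math

def per_block_embeddings_naive(t, num_layers, dim=8):
    # Original dynamic-graph style: every transformer block recomputes the
    # same sinusoidal timestep embedding inside the loop.
    outs = []
    for _ in range(num_layers):
        emb = [math.sin(t / 10000 ** (i / dim)) for i in range(dim)]
        outs.append(emb)
    return outs

def per_block_embeddings_hoisted(t, num_layers, dim=8):
    # Re-networked style: the loop-invariant embedding is computed once and
    # reused by every block, removing the redundant work.
    emb = [math.sin(t / 10000 ** (i / dim)) for i in range(dim)]
    return [emb] * num_layers
```

Both functions return identical results; the second performs the embedding computation once instead of num_layers times.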

paddle-bot bot commented Jul 29, 2024

Thanks for your contribution!

CLAassistant commented Jul 29, 2024

CLA assistant check
All committers have signed the CLA.

@chang-wenbin changed the title from "modified the dit" to "对DIT重新组网,固定部分参数,简化模型组网代码" on Jul 29, 2024
@chang-wenbin changed the title to "Re-network the DIT, fix some parameters, and simplify the model networking code" on Jul 29, 2024
if qkv is not None:
    state_dict[qkv_key_b] = paddle.concat([qkv, state_dict.pop(key)], axis=-1)

for key in list(state_dict.keys()):
Contributor commented:

Change the code below line 518 to:

        map_from_my_dit = {}
        for i in range(28):
            map_from_my_dit[f'tmp_ZKKFacebookDIT.qkv.{i}.weight'] = f'transformer_blocks.{i}.attn1.to_qkv.weight'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.qkv.{i}.bias'] = f'transformer_blocks.{i}.attn1.to_qkv.bias'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.out_proj.{i}.weight'] = f'transformer_blocks.{i}.attn1.to_out.0.weight'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.out_proj.{i}.bias'] = f'transformer_blocks.{i}.attn1.to_out.0.bias'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.ffn1.{i}.weight'] = f'transformer_blocks.{i}.ff.net.0.proj.weight'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.ffn1.{i}.bias'] = f'transformer_blocks.{i}.ff.net.0.proj.bias'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.ffn2.{i}.weight'] = f'transformer_blocks.{i}.ff.net.2.weight'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.ffn2.{i}.bias'] = f'transformer_blocks.{i}.ff.net.2.bias'

            map_from_my_dit[f'tmp_ZKKFacebookDIT.fcs0.{i}.weight'] = f'transformer_blocks.{i}.norm1.emb.timestep_embedder.linear_1.weight'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.fcs0.{i}.bias'] = f'transformer_blocks.{i}.norm1.emb.timestep_embedder.linear_1.bias'

            map_from_my_dit[f'tmp_ZKKFacebookDIT.fcs1.{i}.weight'] = f'transformer_blocks.{i}.norm1.emb.timestep_embedder.linear_2.weight'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.fcs1.{i}.bias'] = f'transformer_blocks.{i}.norm1.emb.timestep_embedder.linear_2.bias'

            map_from_my_dit[f'tmp_ZKKFacebookDIT.fcs2.{i}.weight'] = f'transformer_blocks.{i}.norm1.linear.weight'
            map_from_my_dit[f'tmp_ZKKFacebookDIT.fcs2.{i}.bias'] = f'transformer_blocks.{i}.norm1.linear.bias'

            map_from_my_dit[f'tmp_ZKKFacebookDIT.embs.{i}.weight'] = f'transformer_blocks.{i}.norm1.emb.class_embedder.embedding_table.weight'

        for key in map_from_my_dit.keys():
            state_dict[key] = paddle.assign(state_dict[map_from_my_dit[key]])

Contributor Author replied:

Changed! Thanks for the review suggestion.

def __init__(self, num_layers: int, dim: int, num_attention_heads: int, attention_head_dim: int):
    super().__init__()
    self.num_layers = num_layers
    self.dtype = "float16"
Contributor commented:

Make self.dtype = "float16" configurable.

Contributor Author replied:

Changed! Thanks for the review suggestion.

@@ -1130,6 +1134,8 @@ def _find_mismatched_keys(
error_msgs.append(
f"Error size mismatch, {key_name} receives a shape {loaded_shape}, but the expected shape is {model_shape}."
)
if os.getenv('Inference_Optimize'):
Contributor commented:

Remove this check here and do it in transformer_2d.py instead.

Contributor Author replied:

Changed! Thanks for the review suggestion.

@@ -28,11 +28,15 @@
recompute_use_reentrant,
use_old_recompute,
)
from .simplified_facebook_dit import Simplified_FacebookDIT
Contributor commented:

Rename Simplified_FacebookDIT to SimplifiedFacebookDIT.

Contributor Author replied:

Changed! Thanks for the review suggestion.

@@ -213,6 +219,8 @@ def __init__(
for d in range(num_layers)
]
)
if self.Inference_Optimize:
    self.simplified_facebookDIT = SimplifiedFacebookDIT(num_layers, inner_dim, num_attention_heads, attention_head_dim)
Contributor commented:

Add del self.transformer_blocks here.

Contributor Author replied:

Making this change raises errors, because the attribute is still called elsewhere; leaving it unchanged for now. Thanks for the review suggestion.

@@ -114,6 +118,8 @@ def __init__(
self.inner_dim = inner_dim = num_attention_heads * attention_head_dim
self.data_format = data_format

self.Inference_Optimize = bool(os.getenv('Inference_Optimize'))
Contributor commented:

self.Inference_Optimize = os.getenv('Inference_Optimize') == "True"
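The suggested == "True" comparison fixes a real bug: bool(os.getenv(...)) is True for any non-empty string, including the literal string "False". A slightly more tolerant variant, with a hypothetical helper name that is not part of the PR:

```python
import os

def env_flag(name, default=False):
    # bool(os.getenv(name)) is wrong: any non-empty value, even "False" or
    # "0", is truthy. Compare against known truthy spellings instead.
    val = os.getenv(name)
    if val is None:
        return default
    return val.strip().lower() in ("1", "true", "yes", "on")

os.environ["INFERENCE_OPTIMIZE"] = "False"
print(env_flag("INFERENCE_OPTIMIZE"))  # prints False
os.environ["INFERENCE_OPTIMIZE"] = "True"
print(env_flag("INFERENCE_OPTIMIZE"))  # prints True
```

The same helper also gives a sensible default when the variable is unset, which the bare string comparison does for free but less explicitly.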

Contributor Author replied:

Changed! Thanks for the review suggestion.

@vivienfanghuagood left a comment:

The code should be run through a formatter.

return
map_from_my_dit = {}
for i in range(28):
map_from_my_dit[f'simplified_facebookDIT.q.{i}.weight'] = f'transformer_blocks.{i}.attn1.to_q.weight'


Minimize code duplication; for example, factor out the common name prefix so that a later rename needs only one change.
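The duplication in the key-mapping loop can be reduced as the reviewer suggests by factoring out the shared prefixes and the per-sublayer name pairs. A sketch — the prefixes and sub-layer names follow the snippets above, but the helper itself is hypothetical:

```python
def build_key_map(num_layers=28):
    # Shared prefixes, written once so a future rename is a one-line change.
    src = "simplified_facebookDIT"   # re-networked model's parameter prefix
    dst = "transformer_blocks"       # original dynamic-graph prefix
    # new sub-layer name -> original sub-layer name
    pairs = {
        "qkv": "attn1.to_qkv",
        "out_proj": "attn1.to_out.0",
        "ffn1": "ff.net.0.proj",
        "ffn2": "ff.net.2",
    }
    mapping = {}
    for i in range(num_layers):
        for new, old in pairs.items():
            for param in ("weight", "bias"):
                mapping[f"{src}.{new}.{i}.{param}"] = f"{dst}.{i}.{old}.{param}"
    return mapping

key_map = build_key_map()
print(len(key_map))  # 28 layers x 4 sub-layers x 2 params = 224 entries
```

Adding the norm1 and class-embedder entries is then a matter of extending the pairs dict rather than editing many f-strings.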

Contributor Author replied:

> Minimize code duplication; for example, factor out the common name prefix so that a later rename needs only one change.

Changed; folded part of the naming code. Thanks for the review suggestion.

from ppdiffusers import DDIMScheduler, DiTPipeline

dtype = paddle.float32
os.environ["Inference_Optimize"] = "False"
Collaborator commented:

Make the environment variables all uppercase.

Contributor Author replied:

Changed! Thanks for the review suggestion.

pipe = DiTPipeline.from_pretrained("facebook/DiT-XL-2-256", paddle_dtype=dtype)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
set_seed(42)

words = ["golden retriever"] # class_ids [207]
class_ids = pipe.get_label_ids(words)

# warmup
for i in range(5):
    image = pipe(class_labels=class_ids, num_inference_steps=25).images[0]
Collaborator commented:

This is only here for benchmarking; real users do not need warmup. Consider adding a benchmark switch.
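One way to implement the requested switch is to gate warmup and repeated timing behind a benchmark flag, so a normal user run executes the pipeline exactly once. A minimal sketch; the run_pipeline stand-in and the flag name are hypothetical, not the PR's final code:

```python
import time

def run_pipeline():
    # Stand-in for pipe(class_labels=class_ids, num_inference_steps=25);
    # replaced with a short sleep so the sketch is self-contained.
    time.sleep(0.001)

def generate(benchmark=False, warmup=5, repeat_times=10):
    if not benchmark:
        run_pipeline()          # normal user path: one run, no warmup
        return None
    for _ in range(warmup):     # benchmark path: warm up first
        run_pipeline()
    start = time.perf_counter()
    for _ in range(repeat_times):
        run_pipeline()
    return (time.perf_counter() - start) / repeat_times * 1000  # ms per run
```

In the real script the flag could come from a command-line argument or an environment variable, matching the other switches added in this PR.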

Contributor Author replied:

Changed; added switches for benchmark and inference_optimize. Thanks for the review suggestion.



import datetime
import time
Collaborator commented:

Move the imports to the top of the file.

Contributor Author replied:

Changed! Thanks for the review suggestion.


image = pipe(class_labels=class_ids, num_inference_steps=25).images[0]
for i in range(repeat_times):
    image = pipe(class_labels=class_ids, num_inference_steps=25).images[0]
Collaborator commented:

Same as above: only needed for benchmarking, not for normal use.

Contributor Author replied:

Changed! Thanks for the review suggestion.

enable_new_ir=True,
cache_static_model=False,
exp_enable_use_cutlass=True,
delete_pass_lists=["add_norm_fuse_pass"],
Collaborator commented:

Follow the code style guide; a line should not exceed 80 characters.

Contributor Author replied:

Adjusted with pre-commit. Thanks for the review suggestion.

@@ -114,6 +118,8 @@ def __init__(
self.inner_dim = inner_dim = num_attention_heads * attention_head_dim
self.data_format = data_format

self.Inference_Optimize = os.getenv('Inference_Optimize') == "True"
Collaborator commented:

Use self.inference_optimize, following the naming convention.

Contributor Author replied:

Changed! Thanks for the review suggestion.

import paddle.nn.functional as F
import math

class SimplifiedFacebookDIT(nn.Layer):
Collaborator commented:

Is it really necessary to simplify this module?

@zhoutianzi666 (Contributor) commented Aug 7, 2024:

> Is it really necessary to simplify this module?

Manual optimization requires it.

Contributor Author replied:

Manual optimization requires a high-performance, streamlined re-networking of the original dynamic-graph model. This module also hoists the redundant computation out of the transformer loop, reducing the overall amount of work. Thanks for the review suggestion.

@@ -221,7 +240,9 @@ def __init__(
if use_linear_projection:
self.proj_out = linear_cls(inner_dim, in_channels)
else:
self.proj_out = conv_cls(inner_dim, in_channels, kernel_size=1, stride=1, padding=0, data_format=data_format)
self.proj_out = conv_cls(
Contributor commented:

Formatting-only change; please ignore.

Contributor Author replied:

> Formatting-only change; please ignore.

Applied a uniform format with pre-commit. Thanks for the review suggestion.

@@ -154,11 +158,15 @@ def __init__(
if self.is_input_continuous:
self.in_channels = in_channels

self.norm = nn.GroupNorm(num_groups=norm_num_groups, num_channels=in_channels, epsilon=1e-6, data_format=data_format)
self.norm = nn.GroupNorm(
Contributor commented:

Formatting-only change; please ignore.

if use_linear_projection:
self.proj_in = linear_cls(in_channels, inner_dim)
else:
self.proj_in = conv_cls(in_channels, inner_dim, kernel_size=1, stride=1, padding=0, data_format=data_format)
self.proj_in = conv_cls(
Contributor commented:

Formatting-only change; please ignore.

@nemonameless nemonameless merged commit aeee830 into PaddlePaddle:develop Aug 28, 2024
3 checks passed