
Add SUPIR Upscaler #7219

Open
DN6 opened this issue Mar 5, 2024 · 26 comments

Comments

@DN6
Collaborator

DN6 commented Mar 5, 2024

Model/Pipeline/Scheduler description

SUPIR is a super-resolution model that appears to produce excellent results.

GitHub repo: https://github.com/Fanghua-Yu/SUPIR

The model is quite memory intensive, so the optimisation features available in diffusers might be quite helpful in making this accessible to lower resource GPUs.

Open source status

  • The model implementation is available.
  • The model weights are available (Only relevant if addition is not a scheduler).

Provide useful links for the implementation

No response

@nxbringr
Contributor

nxbringr commented Mar 5, 2024

Hey @DN6, can I please work on this?

@yiyixuxu
Collaborator

yiyixuxu commented Mar 5, 2024

@ihkap11 hey! sure!

@Bhavay-2001
Contributor

Hi @yiyixuxu, is anyone working on this? Can I also contribute? Please let me know how I may proceed.

@nxbringr
Contributor

Hey @Bhavay-2001, I'm currently working on this and will post the PR here soon.
I can tag you on the PR if there is something I need help with :)

@Bhavay-2001
Contributor

Ok great, please let me know.
Thanks!

@landmann
Contributor

@ihkap11 how's it going 😁 I'd loooooove to have this

@nxbringr
Contributor

nxbringr commented Mar 29, 2024

Hey @landmann, I'll post the PR this weekend and tag you if you want to contribute to it :) Apologies for the delay; it's my first new-model-implementation PR.

@landmann
Contributor

landmann commented Mar 29, 2024

You a real champ 🙌
Happy Friday, my gal/dude!

@nxbringr
Contributor

nxbringr commented Mar 31, 2024

Initial Update:

  • Understood the paper (highlights below).
  • Currently defining the paper components that will become diffusers artefacts. (WIP: breaking down the SUPIR code.)

Paper Insights

Motivation:

  • IR methods based on generative priors leverage powerful pre-trained generative models to introduce high-quality generation and prior knowledge into IR, bringing significant progress in
    perceptual effects and intelligence of IR results.
  • Continuously enhancing the capabilities of the generative prior is key to achieving more intelligent IR results, with model scaling being a crucial and effective approach.
  • The authors propose scaling up the generative prior and the training data to address the limitations of existing IR methods.

Architecture Overview:

  1. Generative Prior: The authors choose SDXL (Stable Diffusion XL) as the backbone for their generative prior due to its high-resolution image generation capability without hierarchical design.

  2. Degradation-Robust Encoder: They fine-tune the SDXL encoder to make it robust to degradation, enabling effective mapping of low-quality (LQ) images to the latent space.

  3. Large-Scale Adaptor: The authors designed a new adaptor with network trimming and a ZeroSFT connector to control the generation process at the pixel level.

    Issues with existing adaptors:
    • LoRA limits generation but struggles with LQ image control.
    • T2I adapters lack the capacity for effective LQ image content identification.
    • ControlNet's direct copy is challenging at the SDXL model scale.
    Their two fixes:
    1. Network Trimming: modify the adaptor architecture by trimming half of the ViT blocks in each encoder block (of SDXL) to balance network capacity against computational feasibility.
    2. Redesigning the Connector: the introduced ZeroSFT module is built upon zero convolution and adds a spatial feature transfer (SFT) operation and group normalization (a minimal sketch follows after this list).
    Why is this needed?
    • The authors note that while SDXL's generative capacity delivers excellent visual effects, it also makes precise pixel-level control challenging.
    • ControlNet uses zero convolution for generation guidance, but relying solely on residuals is insufficient for the level of control required by IR tasks.
  4. Multi-Modality Language Guidance: They incorporate the LLaVA multi-modal large language model to understand image content and guide the restoration process using textual prompts.

  5. Restoration-Guided Sampling: They propose a modified sampling method to selectively guide the prediction results to be close to the LQ image, ensuring fidelity in the restored image.
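
Since ZeroSFT is the component that differs most from stock diffusers, here is a minimal sketch of my current understanding of it: a zero convolution plus a spatial feature transfer (scale/shift) with group normalization. The names and exact wiring below are my assumptions, not the authors' code:

import torch.nn as nn

def zero_module(module):
    # Zero-initialise all parameters so the connector is a no-op at the start
    # of training, as in ControlNet's zero convolution.
    for p in module.parameters():
        nn.init.zeros_(p)
    return module

class ZeroSFT(nn.Module):
    # Sketch only: assumes control and hidden states share spatial size, and
    # that hidden_channels is divisible by num_groups.
    def __init__(self, control_channels, hidden_channels, num_groups=32):
        super().__init__()
        self.zero_conv = zero_module(nn.Conv2d(control_channels, hidden_channels, 1))
        self.norm = nn.GroupNorm(num_groups, hidden_channels)
        # Predicts a per-pixel scale and shift from the control features.
        self.sft = zero_module(nn.Conv2d(control_channels, hidden_channels * 2, 3, padding=1))

    def forward(self, hidden_states, control):
        # Residual guidance, the ControlNet-style part.
        hidden_states = hidden_states + self.zero_conv(control)
        # SFT modulation of the normalised features.
        scale, shift = self.sft(control).chunk(2, dim=1)
        return self.norm(hidden_states) * (1 + scale) + shift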

Thoughts on implementation details:

  • Trainable components are the degradation-robust encoder and the trimmed ControlNet.
  • Extend the SDXL class from diffusers and use the SDXL checkpoint sd_xl_base_1.0_0.9vae.safetensors as the base pre-trained generative prior.
  • The SUPIR model will first load pre-trained weights from the SDXL checkpoint, then load the SUPIR-specific weights, which contain the modifications and additions that adapt SDXL for image restoration. (A loading sketch follows after this list.)
  • Trimmed ControlNet encoder, which trims half of the ViT blocks from each encoder block. (Todo: figure out where to make this change.)
  • In SUPIR, SDXL (Stable Diffusion XL) is the backbone generative prior, and the GLVControl and LightGLVUNet modules act as the adaptor guiding SDXL for image restoration. (Todo: convert to a diffusers artefact.)
  • A rough dummy sketch might look like this:
import torch.nn as nn
from diffusers import StableDiffusionXLPipeline
# GLVControl and LightGLVUNet come from the SUPIR repo (import path may differ).
from SUPIR.modules.SUPIR_v0 import GLVControl, LightGLVUNet

class SUPIRModel(nn.Module):
    def __init__(self, sdxl_model_path):
        super().__init__()
        self.sdxl_pipeline = StableDiffusionXLPipeline.from_pretrained(sdxl_model_path)
        self.glv_control = GLVControl(in_channels=3, out_channels=64, context_dim=128)
        self.light_glv_unet = LightGLVUNet(in_channels=3, out_channels=3)

    def forward(self, lq_image, context, num_inference_steps=50):
        # Generate the control signal from the LQ image using GLVControl.
        control_signal = self.glv_control(lq_image, context)

        # Schematic call: the stock StableDiffusionXLPipeline does not accept
        # image/control_image; a ControlNet-style SDXL pipeline would be needed here.
        restored_image = self.sdxl_pipeline(
            prompt="",
            image=lq_image,
            control_image=control_signal,
            num_inference_steps=num_inference_steps,
            generator=None,
        ).images[0]

        # Refine the restored image using LightGLVUNet.
        refined_image = self.light_glv_unet(restored_image, control_signal)

        return refined_image
  • ZeroSFT acts as a connector. (Todo: convert to a diffusers artefact.)
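
For the two-stage weight loading described above, a rough sketch (the checkpoint file name and the strict=False overlay are my assumptions):

import torch
from diffusers import StableDiffusionXLPipeline

# Start from the SDXL base checkpoint as the generative prior.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
# Overlay SUPIR-specific UNet weights on top (file name is illustrative).
supir_state_dict = torch.load("supir_unet_weights.ckpt", map_location="cpu")
# strict=False keeps the SDXL weights wherever SUPIR does not override them and
# reports SUPIR-only keys (e.g. new connectors) as unexpected instead of failing.
missing, unexpected = pipe.unet.load_state_dict(supir_state_dict, strict=False)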

To cover later:

  • LLaVA for multi-modality language guidance.

I'm currently in the process of breaking down SUPIR code into diffusers artefacts and figuring out optimization techniques to make it compatible with low-resource GPUs.

Feel free to correct me or start a discussion on this thread. Let me know if you wish to collaborate; I'm happy to set up discussions and work on it together :)

@landmann
Contributor

landmann commented Apr 1, 2024

Looks fantastic! How far along did you get, @ihkap11 ?

Btw, a good reference for the input parameters is here: https://replicate.com/cjwbw/supir?prediction=32glqstbvpjjppxmvcge5gsncu

@landmann
Contributor

landmann commented Apr 3, 2024

@ihkap11 how are you doing? Which part are you stuck on?

@nxbringr
Contributor

nxbringr commented Apr 3, 2024

Hey @landmann, I'm finding it hard to map a few components from the paper's network architecture details to the codebase they provided.

Currently, I'm stuck on understanding how they trim the ViT blocks when using the modified ControlNet adapter with the ZeroSFT connector at the code level. They use GLVControl, but I can't spot any ViT component or network trimming in the codebase.

I sent an email to one of the authors last week. If I don't hear back, I plan to follow up with more specific questions this week. (Also see this issue here.) I'm playing with the code in my repo here atm.

If you're interested, would you take a second look at their code and share your thoughts?

@landmann
Contributor

landmann commented Apr 4, 2024

@ihkap11 are you following the paper and trying to code it? Why not just make a wrapper around what they have? It's PyTorch after all, no? I haven't read the paper in much depth, but I was able to run SUPIR locally!

@austinfujimori you should take a look if you're free 🙂

@CuddleSabe

CuddleSabe commented Apr 8, 2024

In reply to @nxbringr's comment above:

Hi, I tried not loading the ViT checkpoint, and it has no influence!

@CuddleSabe

In reply to @landmann's comment above:

The "ZeroSFT" is there to replace the concat[hidden_states, res_sample] in AttnUpBlock and CrossAttnUpBlock, so we can't use the diffusers SDXL pipeline to implement it.

@CuddleSabe

Continuing from my previous comment:

Because of the differences between the sgm and diffusers architectures, it's difficult.

@nxbringr
Contributor

nxbringr commented Apr 10, 2024

@CuddleSabe would you like to connect on discord to discuss SUPIR and possibly collaborate in figuring out how to support it in diffusers? You can find me by the username tortillachips11.

@CuddleSabe

In reply to @nxbringr:

Well, I can't use Discord. I wrote a training script and the model file to train an SD 1.5 SUPIR; however, I can't publish them because of company confidentiality rules.

@CuddleSabe


In sgm it looks like this:
[screenshot of the sgm UNet forward pass]

The "hs" here are the res_samples from the down_blocks:
https://github.com/Stability-AI/generative-models/blob/fbdc58cab9f4ee2be7a5e1f2e2787ecd9311942f/sgm/modules/diffusionmodules/openaimodel.py#L849

But in diffusers it looks like this; the "res_sample" in diffusers equals the "hs" in sgm:
[screenshot of the diffusers UNet forward pass]

res_hidden_states_tuple=res_samples,

So you need to rewrite AttnUpBlock2D, CrossAttnUpBlock2D, and UNetMidBlock2DCrossAttn in diffusers.
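
To make the difference concrete, here is a toy sketch of the two skip-connection styles (class names are mine; real up-blocks also carry attention, timestep embeddings, and upsampling):

import torch
import torch.nn as nn

class ConcatSkip(nn.Module):
    # diffusers-style: concatenate the skip tensor with the hidden states,
    # then let a conv reduce the doubled channel count.
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, hidden_states, res_sample):
        return self.conv(torch.cat([hidden_states, res_sample], dim=1))

class ZeroSFTSkip(nn.Module):
    # SUPIR-style: fuse the skip tensor via GroupNorm plus a zero-initialised
    # scale/shift instead of concatenating, so channel counts stay unchanged.
    # Assumes channels is divisible by num_groups.
    def __init__(self, channels, num_groups=32):
        super().__init__()
        self.norm = nn.GroupNorm(num_groups, channels)
        self.to_scale_shift = nn.Conv2d(channels, channels * 2, 1)
        nn.init.zeros_(self.to_scale_shift.weight)
        nn.init.zeros_(self.to_scale_shift.bias)
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, hidden_states, res_sample):
        scale, shift = self.to_scale_shift(res_sample).chunk(2, dim=1)
        return self.conv(self.norm(hidden_states) * (1 + scale) + shift)

Swapping the first pattern for the second inside AttnUpBlock2D, CrossAttnUpBlock2D, and UNetMidBlock2DCrossAttn is exactly the kind of rewrite described above.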

@gitlabspy

Any progress👀?

@elismasilva

Hi @ihkap11, any news?

@nxbringr
Contributor

Hey! I tried but couldn't get this working. Feel free to take over the implementation for this Issue.

@elismasilva

In reply to @nxbringr:

But do you have a branch where we can continue where you left off? I might try this after I finish a project I'm involved with.

@sayakpaul
Member

Cc: @asomoza

@asomoza
Member

asomoza commented Jun 30, 2024

Just in case: this is not an easy task. Everything is in the sgm format, so there's a lot of conversion involved, and it requires a deep understanding of both the original code and diffusers.

Probably the best choice here is to start this as a research project and convert all the sgm code to diffusers, then get help from the maintainers and the community when stuck.

@zdxpan

zdxpan commented Sep 4, 2024

According to the paper and the ComfyUI implementation, the plan is to implement the following:

  1. SUPIR model loader -> SUPIR_MODEL, SUPIR_VAE
  • SUPIR_VAE = vae.from_config(...), then load the converted state_dict.
  2. SUPIR first-stage denoiser: takes a low-quality image in and outputs a blurred/smoothed image and its latent.
  • This stage uses the SUPIR VAE (encoder and decoder).
  • SUPIR_VAE.encoder(LQ_image) -> supir_latent -> SUPIR_VAE.decoder(supir_latent).
  3. SUPIR ControlNet: takes latents and timesteps in, and generates the ControlNet residual downsamples and midsample out.
  • A trimmed ControlNet (class trim_controlnet(Controlnet_normal)) is not implemented yet.
  4. A hacked UNet that modifies the connector of each down and up block to use ZeroSFT.
  • Replace the UNet zero-conv connector.
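
Putting the four stages together, a hypothetical denoising loop might look like this (every module and argument below follows diffusers' generic ControlNet pattern; the actual SUPIR wiring, with its trimmed ControlNet and ZeroSFT connectors, will differ):

import torch

@torch.no_grad()
def supir_restore(lq_image, vae, controlnet, unet, scheduler, prompt_embeds, steps=50):
    scheduler.set_timesteps(steps)
    # 1-2) First stage: the degradation-robust VAE maps the LQ image to latents.
    lq_latent = vae.encode(lq_image).latent_dist.sample() * vae.config.scaling_factor
    latents = torch.randn_like(lq_latent)
    for t in scheduler.timesteps:
        # 3) ControlNet: residuals conditioned on the LQ latent.
        down_res, mid_res = controlnet(
            latents, t, encoder_hidden_states=prompt_embeds,
            controlnet_cond=lq_latent, return_dict=False,
        )
        # 4) Hacked UNet: the ZeroSFT connectors consume the residuals.
        noise_pred = unet(
            latents, t, encoder_hidden_states=prompt_embeds,
            down_block_additional_residuals=down_res,
            mid_block_additional_residual=mid_res,
        ).sample
        # (SUPIR's restoration-guided sampling would additionally pull the
        # prediction toward the LQ latent at each step.)
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    return vae.decode(latents / vae.config.scaling_factor).sample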
