[Double Control] What model is most needed? #30
Replies: 23 comments · 28 replies
-
I am very interested in depth-aware inpainting, which I also mentioned in #24. Additionally, img2img with additional pose/depth control would be amazing! A big use case for this would be rotoscoped animations with (potentially) temporal coherence. By using the previous frame (with the background removed) as the input to the next frame, plus the additional pose control layer, we may be able to change the fewest pixels possible. Previously this has been impossible, and all Stable Diffusion animations have their trademark flicker and lack of coherence. I think "double controls" may let us crack that nut! The clip below was made by adding controls from the pose model to an anime model, running each frame through a background remover, and a little prompt engineering:

Screenshare.-.2023-02-13.2_45_57.AM.mp4

Great coherence on the outfit, but terribly flickery! Something to fix that flicker would be incredible.
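Here is roughly what that per-frame loop looks like, sketched with diffusers (a sketch only: the checkpoint ID, prompt, strength, and the `remove_background` helper are placeholders, not the exact settings behind the clip above):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

# Pose ControlNet + an SD 1.5 base; swap in your anime checkpoint here.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

generator = torch.Generator("cuda").manual_seed(1234)  # fixed seed helps coherence
prev_frame = first_frame_rgb   # placeholder: PIL image of frame 0, background removed
frames_out = []

for pose_map in pose_maps:     # placeholder: one OpenPose render per video frame
    frame = pipe(
        prompt="1girl dancing, simple background, best quality",
        image=prev_frame,        # previous result seeds the next frame
        control_image=pose_map,  # pose keeps the figure locked to the motion
        strength=0.5,            # low strength = "change the fewest pixels possible"
        num_inference_steps=20,
        generator=generator,
    ).images[0]
    frames_out.append(frame)
    prev_frame = remove_background(frame)  # placeholder: e.g. rembg on each output
```

The low `strength` plus the fixed seed is what I mean by changing as few pixels as possible between frames; the pose map carries the motion.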
-
I think canny-edge-aware inpainting would be very useful for better shape control when inpainting.
-
Great idea - I would be very interested to see any combination of depth, semantic segmentation, and surface normals. I'm curious how the model handles trade-offs between them, and whether it can still generate diverse images. Also, great work!
-
Nothing here is meant to disturb you; I don't know this platform, but it has had a serious impact on my personal life. Could someone knowledgeable please answer a few of my questions?
-
Awesome! I look forward to "depth-aware inpainting" or "canny-edge-aware inpainting". I think these will be very useful for building 3D texture maps for a pre-built mesh.
-
I think we also need inpainting model support in general, if possible. Inpainting models don't work with ControlNet right now, but their ability to recognise the surroundings of the masked area and generate seamless output is very useful. The problem is that they generate whatever they see fit, and if you inpaint a complicated image, your only means of control is the prompt.
-
Is it possible to make a ControlNet for colors?
-
I was thinking about an InstructPix2Pix version of ControlNet. And of course waiting for the 2.1 versions.
-
Beyond the current contour control, I would also like additional color annotations; multi-granularity annotation of the objects in the image (e.g. this is a bottle, this is a dog, this is a person's head with a smiling expression); depth-layer markers (this sits on the top layer, this on the bottom layer, this on layer x); and pose/skeleton annotations for objects. With composition contours, color markers, content markers (multi-granularity, multi-parameter), depth-layer markers, and object pose/skeleton markers, the diffusion model would basically be tamed into something fully controllable 😂 That already feels quite complete; I can't think of anything else.
-
Another idea: would it be possible to add an additional input for a CLIP image embedding? That way, something similar to Midjourney's image prompts could potentially be achieved...
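A minimal sketch of where such an embedding could plug in, assuming SD 1.5 and the matching CLIP ViT-L/14 vision tower. Without a trained projection (which is what IP-Adapter later added), this alone won't give Midjourney-style image prompts; it only shows the wiring:

```python
import torch
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
image_encoder = CLIPVisionModelWithProjection.from_pretrained(
    "openai/clip-vit-large-patch14"
)

@torch.no_grad()
def add_image_prompt(pil_image, text_embeds):
    """Append the CLIP image embedding as one extra 'token' to the prompt embeddings.

    text_embeds: (batch, 77, 768) from the SD 1.5 text encoder.
    The result would be passed to the UNet as encoder_hidden_states.
    """
    pixel_values = image_processor(pil_image, return_tensors="pt").pixel_values
    image_embeds = image_encoder(pixel_values).image_embeds      # (1, 768)
    image_token = image_embeds.unsqueeze(1)                      # (1, 1, 768)
    image_token = image_token.expand(text_embeds.shape[0], -1, -1)
    return torch.cat([text_embeds, image_token], dim=1)          # (batch, 78, 768)
```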
-
Would love to see a model that controls luminosity, i.e. one that can take an image and apply its luminosity and/or tone to a new diffusion.
-
A plain inpainting model, capable of turning any fine-tuned version of SD into something that works like 1.5-inpainting, would be amazing.

I would also like to see a model that accepts an image plus a number indicating a time offset, to allow some form of interpolation between previous and future frames in a video. People are already applying multiple ControlNets to a single generated image, so it would be nice if you could simply stack multiple instances of this net with different frames. The same model could probably also be reused for generating another view of a building/object from a different angle, since that is just equivalent to the camera moving over that time offset. If this could be used to add consistency to Stable Diffusion, it would enable some great use cases for 3D art too.
-
I am sorry if I don't understand, but isn't inpainting + scribbles (Fig. 16) already double control? Thanks!
-
Inpainting + Depth is definitely a must for me :)
-
Actually, simply adding up two ControlNets can realize the DuoControl effect. Here's an example; the Jupyter notebook is provided here:
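For reference, this is roughly what "adding up two ControlNets" means in diffusers terms: the residuals of both nets are summed before entering the UNet, and the pipeline accepts a list of ControlNets directly (model IDs, prompt, and scales below are just illustrative; `pose_map` and `depth_map` are your prepared condition images):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

controlnets = [
    ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
    ),
    ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
    ),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    "a knight standing in a forest, best quality",
    image=[pose_map, depth_map],               # one condition image per ControlNet
    controlnet_conditioning_scale=[1.0, 0.8],  # per-control weighting
    num_inference_steps=20,
).images[0]
```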
-
@lllyasviel do you have an approximate ETA? This is not a demand at all, just a polite ask for when you think you'll have a new model for this, specifically depth + inpainting. Even a rough estimate of how many weeks it might take would be nice. Thanks!
-
I'm not sure I understand the technical aspect of the question fully, but here are a few things that may trigger some ideas:
-
Hi, is any "double ControlNet training" code released yet? I am trying to train a ControlNet to disentangle some attributes in the image and control them, but a single ControlNet cannot disentangle them, since one attribute may depend on another, and training two separate ControlNets is not a good idea either. I think it is necessary to train a "double ControlNet". If I want to achieve this, is the code change that we concatenate the second ControlNet's latent into the original one?
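To make the question concrete, here is the kind of change I have in mind, sketched with diffusers: the two ControlNets' residuals are summed (not concatenated) before they enter the UNet, which is also what the multi-ControlNet inference path does. Please correct me if concatenation is what "double control" training actually needs:

```python
def double_controlnet_forward(unet, controlnet_a, controlnet_b,
                              noisy_latents, timesteps, text_embeds,
                              cond_a, cond_b):
    """Forward pass through two trainable ControlNets feeding one frozen UNet."""
    down_a, mid_a = controlnet_a(
        noisy_latents, timesteps,
        encoder_hidden_states=text_embeds,
        controlnet_cond=cond_a,      # first control map, e.g. depth
        return_dict=False,
    )
    down_b, mid_b = controlnet_b(
        noisy_latents, timesteps,
        encoder_hidden_states=text_embeds,
        controlnet_cond=cond_b,      # second control map, e.g. image with holes
        return_dict=False,
    )
    # Element-wise sum of the residuals at every resolution.
    down = [a + b for a, b in zip(down_a, down_b)]
    mid = mid_a + mid_b
    return unet(
        noisy_latents, timesteps,
        encoder_hidden_states=text_embeds,
        down_block_additional_residuals=down,
        mid_block_additional_residual=mid,
    ).sample
```

Both ControlNets would receive gradients through the summed residuals, so whether that is enough to disentangle dependent attributes is exactly what I am unsure about.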
-
I just stumbled upon this "Color-Canny ControlNet" I'd like to share. It was trained on images with canny edges and color mushed together. One could argue "what's the point? we can use multi-ControlNet for this", and I agree, but I think it's a good example for further discussion.
Me neither. To clarify, I want to outline the different "terms":
But I don't quite understand the difference between "multi-channel" and "multi-control". Or is this really just about "yet another clever way of preparing a dataset to get a more specialized ControlNet"? Related work:
-
Additional tabular control for brain-to-idea human models: a permutation of every ControlNet I know of, which could help to come up with further ideas. Now we just have to fill out the boxes. Let the first row be the "xyz-aware" part, e.g. "Depth-aware ... Inpainting":
TODO:
-
Also see huggingface/diffusers#5406 (comment) (canny + inpaint).
-
For anyone who might be looking at this in the future: I looked into training a ControlNet model alongside a second ControlNet model, and I found that it was just much slower to train without any noticeable difference in the end model. Perhaps it could be worth training the models separately and then fine-tuning them together, but I haven't tried it.

I also attempted to train ControlNet for something like TryOnDiffusion, but found that the ControlNet architecture just isn't well suited to that kind of task; it seems better suited to learning structural features and pixelwise comparisons. It's worth noting I was training on a single 4090, so I didn't push training to its absolute limits, but once it looked like it wasn't really learning after a day or so, I gave up.

I've had some better initial luck training IP-Adapters for models that are more focused on semantic meaning rather than pixelwise comparison. I hope to get to the point where I can implement/train models directly from papers for Stable Diffusion, but I'm not there yet. Before looking into IP-Adapters I briefly looked into training a single ControlNet where, instead of a text prompt for cross-attention, I used an image embedding, but my initial tests were unsuccessful. I might revisit this, since I think there is some use for it and my hunch is that I just had the wrong implementation the first time.
-
https://huggingface.co/xinsir/controlnet-union-sdxl-1.0
-
We plan to train some models with "double controls", using two concatenated control maps, and we are considering using images with holes as the second control map. This would lead to models like "depth-aware inpainting" or "canny-edge-aware inpainting". Please also let us know if you have good suggestions.
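A rough sketch of what the data side of "two concat control maps" could look like, assuming the second map is the same image with masked-out holes stacked onto a depth map along the channel axis. The helper name and the 6-channel hint block are illustrative, not the final training setup; older diffusers versions may require building the ControlNet config by hand instead of passing `conditioning_channels` to `from_unet`.

```python
import torch
from diffusers import ControlNetModel, UNet2DConditionModel

def make_double_control(depth_rgb, image_rgb, hole_mask):
    """Stack a depth map and an 'image with holes' into one 6-channel control map.

    depth_rgb, image_rgb: (B, 3, H, W) tensors in [0, 1]; hole_mask: (B, 1, H, W), 1 = hole.
    """
    masked_image = image_rgb * (1.0 - hole_mask)        # zero out the holes
    return torch.cat([depth_rgb, masked_image], dim=1)  # (B, 6, H, W)

# ControlNet whose input hint block expects 6 conditioning channels instead of 3.
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)
controlnet = ControlNetModel.from_unet(unet, conditioning_channels=6)
```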