
add cache of load checkpoints #3545

Closed

wants to merge 3 commits into from

Conversation


@efwfe efwfe commented May 23, 2024

Hello

I used my own method to reduce the cost of repeatedly switching and reloading models. Although ComfyUI's current memory usage is already quite efficient, I still wanted to try this, and I found the effect is quite noticeable. I hope the author finds it useful; please see whether this could be applied to the main branch.

Here is an example (screenshot omitted).
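As a rough illustration of the general approach described above (not this PR's actual code), a path-keyed cache of loaded model references might look like the following; the names CheckpointCache and load_checkpoint_cached are hypothetical:

```python
from collections import OrderedDict

class CheckpointCache:
    """Keeps references to already-loaded checkpoints, keyed by file path."""

    def __init__(self, max_items=3):
        self.max_items = max_items
        self._cache = OrderedDict()  # ckpt_path -> loaded objects

    def get(self, ckpt_path):
        if ckpt_path in self._cache:
            self._cache.move_to_end(ckpt_path)  # mark as most recently used
            return self._cache[ckpt_path]
        return None

    def add(self, ckpt_path, item):
        self._cache[ckpt_path] = item
        self._cache.move_to_end(ckpt_path)
        while len(self._cache) > self.max_items:
            evicted, _ = self._cache.popitem(last=False)  # drop least recently used
            print(f"[==Cache==] Removed {evicted} from cache.")

ckpt_cache = CheckpointCache()

def load_checkpoint_cached(ckpt_path, loader):
    """Return cached objects for ckpt_path, calling loader(ckpt_path) only on a miss."""
    cached = ckpt_cache.get(ckpt_path)
    if cached is not None:
        return cached
    loaded = loader(ckpt_path)
    ckpt_cache.add(ckpt_path, loaded)
    return loaded
```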

print(f"[==Cache==] Removed {k} from cache.")

def add_cache(self, k, item):
"""模型patch会修改key的名字,缓存之前保留一份key的数据和值的引用"""
Contributor

Should keep comments in English for PRs to the main repo

Author

Sorry about that, I have fixed it already.

# get matched ckpt, vae, controlnet queue
cache_mapper = caches_mapping.get(dir_name, None)

if cache_mapper is not None and device in (None, torch.device("cpu"), {}):
Contributor

If the device is different but the model is in the cache, wouldn't it be better to use the cached copy and call .to(device)?

Author

@efwfe efwfe May 24, 2024

Yes, I will try that later. Some tensors are loaded onto the GPU.
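A minimal sketch of the reviewer's suggestion, assuming the cached item is (or exposes) a plain torch.nn.Module; ComfyUI's ModelPatcher wrapper would likely need extra handling:

```python
import torch

def cached_model_on_device(cached_model: torch.nn.Module, device=None):
    """Reuse the cached model and move it to the requested device instead of reloading from disk."""
    target = device if device is not None else torch.device("cpu")
    return cached_model.to(target)  # effectively a no-op if the model is already on that device
```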

@ltdrdata
Collaborator

ltdrdata commented May 23, 2024

I don't think the checkpoint cache should be managed automatically.

If the cache is released too lazily, even in a typical memory environment with around 32GB, using a few checkpoints can quickly lead to a shortage of RAM. On the other hand, if the cache is released too aggressively, checkpoints that shouldn't be released may be unloaded, requiring them to be reloaded.

The release and loading of checkpoints should be very deliberate and manually controllable.

That's why, in the Inspire Pack, I manage the checkpoint cache through keys and provide a feature to manually remove keys.
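For illustration only (this is not the Inspire Pack's actual implementation), a manually controlled, key-based cache along those lines could look like this:

```python
# Nothing is evicted automatically; entries stay in RAM until the user removes them.
backend_cache = {}  # key -> loaded object (model, clip, vae, ...)

def cache_set(key, obj):
    backend_cache[key] = obj

def cache_get(key):
    return backend_cache.get(key)

def cache_remove(key):
    """Explicitly free a cached entry; returns True if the key existed."""
    return backend_cache.pop(key, None) is not None
```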

@mcmonkey4eva
Contributor

mcmonkey4eva commented May 23, 2024

There's a place for both systems imo -- manual is great for people like you and me who know exactly what we're doing and are willing to monitor it closely, but automatic is better for less experienced users, shared instances (i.e. you're not the only person mucking with what's loaded or not), or just people who want things to load fast without having to think about it.

There's a great point about considering memory load -- the existing ComfyUI code has a VRAM management module that automatically unloads from VRAM when it's too full to fit new things in. The cache here should probably operate similarly -- automatically unload cached data from RAM to make room when things might get tight. There should probably even be a configurable minimum amount of free space that the cache must guarantee (since the cache can't track every memory allocation, a minimum free amount would reduce the risk of overloading).
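A sketch of that idea, assuming a configurable free-RAM floor and using psutil (which ComfyUI's model management already relies on); the threshold and function names here are hypothetical:

```python
from collections import OrderedDict
import gc
import psutil

MIN_FREE_RAM_BYTES = 4 * 1024**3  # hypothetical 4 GB floor; would be user-configurable

ram_cache = OrderedDict()  # key -> loaded object, least recently used first

def ensure_ram_headroom():
    """Evict least recently used entries while free system RAM is below the floor."""
    while ram_cache and psutil.virtual_memory().available < MIN_FREE_RAM_BYTES:
        key, _ = ram_cache.popitem(last=False)
        print(f"[==Cache==] Evicted {key} to keep RAM headroom")
        # Memory is only actually returned once no other references to the object remain.
        gc.collect()
```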

@efwfe
Author

efwfe commented May 24, 2024

> I don't think the checkpoint cache should be managed automatically.
>
> If the cache is released too lazily, even in a typical memory environment with around 32GB, using a few checkpoints can quickly lead to a shortage of RAM. On the other hand, if the cache is released too aggressively, checkpoints that shouldn't be released may be unloaded, requiring them to be reloaded.
>
> The release and loading of checkpoints should be very deliberate and manually controllable.
>
> That's why, in the Inspire Pack, I manage the checkpoint cache through keys and provide a feature to manually remove keys.

Actually, there is not much memory cost here; it just caches references. Reducing the MAX_CACHE_xxx environment variables reduces the cost.
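For example, per-type limits could be read from the environment roughly like this; the exact MAX_CACHE_xxx variable names in the PR may differ:

```python
import os

# Hypothetical names; adjust to whatever the PR actually defines.
MAX_CACHE_CKPT = int(os.environ.get("MAX_CACHE_CKPT", "3"))
MAX_CACHE_VAE = int(os.environ.get("MAX_CACHE_VAE", "2"))
MAX_CACHE_CONTROLNET = int(os.environ.get("MAX_CACHE_CONTROLNET", "2"))
```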

@ltdrdata
Collaborator


Oh, I didn't look at the code properly last night and mistakenly thought caching was based on the count. If caching is managed based on the model size, it shouldn't be a big issue.

efwfe added 2 commits May 24, 2024 15:34
comment to english
Do not filter the load device
@efwfe efwfe closed this May 29, 2024
@efwfe
Author

efwfe commented May 29, 2024

Sorry, there are some errors here. I will update soon.

@efwfe
Author

efwfe commented May 30, 2024

Please check it here. #3605
