
Stream models rather than load them completely into RAM. #785

Merged · 6 commits · Oct 21, 2024

Conversation

Collaborator

@50h100a 50h100a commented Oct 21, 2024

The change applies only to LLaMA, and only when using DefaultModelLoader.

Requesting review to see whether this is practical to do everywhere.

@50h100a 50h100a merged commit be4fa5b into PygmalionAI:main Oct 21, 2024
5 checks passed
@50h100a 50h100a deleted the loadfix branch October 21, 2024 20:15
@BBC-Esq
BBC-Esq commented Oct 23, 2024

Is this kind of like AirLLM?

@AlpinDale
Member

> Is this kind of like AirLLM?

No. This only applies to the model weight loading stage: weights are streamed from disk rather than fully materialized, avoiding excessive CPU RAM usage during load.
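The idea can be sketched in plain Python: instead of building a dict that holds every tensor at once, the loader yields one tensor at a time so only a single tensor's CPU copy is alive between steps. This is a hypothetical illustration of the technique, not the PR's actual loader code; the names and the toy checkpoint are invented for the example.

```python
def load_all(checkpoint):
    # Eager loading: every tensor is resident in CPU RAM at once,
    # so peak memory is the sum of all tensor sizes.
    return {name: bytes(size) for name, size in checkpoint}

def stream_weights(checkpoint):
    # Streamed loading: yield one tensor at a time; the caller copies
    # it to its destination (e.g. GPU) and drops the CPU copy before
    # requesting the next one.
    for name, size in checkpoint:
        yield name, bytes(size)

# Toy "checkpoint": (tensor name, size in bytes) pairs.
checkpoint = [("layer0.weight", 1024), ("layer1.weight", 1024)]

peak = 0
for name, tensor in stream_weights(checkpoint):
    # Only one tensor is alive per iteration, so peak CPU usage
    # is the size of the largest tensor, not the whole model.
    peak = max(peak, len(tensor))

print(peak)  # 1024, versus 2048 if load_all() were used
```

With a real checkpoint format this corresponds to iterating tensors lazily from the file rather than deserializing the whole state dict up front.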
