
Stream models rather than load them completely into RAM. #785

Merged · 6 commits · Oct 21, 2024

Conversation

Collaborator

@50h100a 50h100a commented Oct 21, 2024

The change applies only to LLaMA, and only when using DefaultModelLoader.

Requesting review to see whether this is practical to do everywhere.

@50h100a 50h100a merged commit be4fa5b into PygmalionAI:main Oct 21, 2024
5 checks passed
@50h100a 50h100a deleted the loadfix branch October 21, 2024 20:15
@BBC-Esq
BBC-Esq commented Oct 23, 2024

Is this kind of like AirLLM?

@AlpinDale
Member

> Is this kind of like AirLLM?

No. This only applies to the model weight loading stage: weights are streamed from disk rather than fully materialized, avoiding excessive CPU RAM usage during load.
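The idea can be sketched in plain Python: instead of building a dict that holds every tensor at once, the loader yields one tensor at a time so only a single tensor's CPU copy is alive between steps. This is a hypothetical illustration of the technique, not the PR's actual loader code; the names and the toy checkpoint are invented for the example.

```python
def load_all(checkpoint):
    # Eager loading: every tensor is resident in CPU RAM at once,
    # so peak memory is the sum of all tensor sizes.
    return {name: bytes(size) for name, size in checkpoint}

def stream_weights(checkpoint):
    # Streamed loading: yield one tensor at a time; the caller copies
    # it to its destination (e.g. GPU) and drops the CPU copy before
    # requesting the next one.
    for name, size in checkpoint:
        yield name, bytes(size)

# Toy "checkpoint": (tensor name, size in bytes) pairs.
checkpoint = [("layer0.weight", 1024), ("layer1.weight", 1024)]

peak = 0
for name, tensor in stream_weights(checkpoint):
    # Only one tensor is alive per iteration, so peak CPU usage
    # is the size of the largest tensor, not the whole model.
    peak = max(peak, len(tensor))

print(peak)  # 1024, versus 2048 if load_all() were used
```

With a real checkpoint format this corresponds to iterating tensors lazily from the file rather than deserializing the whole state dict up front.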
