Replies: 13 comments
-
I'm still in the process of finding/converting the 7B and 13B alpaca models to ggml2. I'll then recompute all the hashes with the latest build, and also provide a file with the magic numbers and versions for each.
-
The new ggml file format has the version number 1, so calling it ggml2 or "v2" is going to cause confusion. The new file format switched the file magic from "ggml" to "ggmf"; maybe we should lean into that.
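To illustrate the difference, here is a minimal Python sketch (not part of the repo's tooling) that peeks at a model file's header. It assumes the magic constants llama.cpp used at the time, `0x67676d6c` for the old unversioned "ggml" format and `0x67676d66` for the versioned "ggmf" format, both stored as little-endian uint32 values:

```python
# Sketch: identify a model file's format from its header.
# Assumes llama.cpp's magic constants: 0x67676d6c ("ggml", old
# unversioned format) and 0x67676d66 ("ggmf", versioned format),
# written as little-endian uint32 values.
import struct
import sys

MAGIC_GGML = 0x67676d6c  # old, unversioned
MAGIC_GGMF = 0x67676d66  # new, versioned

with open(sys.argv[1], "rb") as f:
    (magic,) = struct.unpack("<I", f.read(4))
    if magic == MAGIC_GGML:
        print("old 'ggml' format (no version field)")
    elif magic == MAGIC_GGMF:
        (version,) = struct.unpack("<I", f.read(4))
        print(f"'ggmf' format, version {version}")
    else:
        print(f"unknown magic: {magic:#010x}")
```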
-
Some checksums (q4_0 and gptq-4b quantizations, new tokenizer format). Edit: added more checksums.
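For anyone reproducing these, a hashing sketch like the following (hypothetical, not project tooling) streams the file in chunks and prints the digest in the same `<digest>  <path>` shape as sha256sum output:

```python
# Sketch: sha256 a (multi-GB) model file without loading it into RAM.
import hashlib
import sys

h = hashlib.sha256()
with open(sys.argv[1], "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB chunks
        h.update(chunk)
print(f"{h.hexdigest()}  {sys.argv[1]}")
```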
-
I'd trust your checksums for the alpaca models over mine.
-
The problem with the alpaca models is that there are a lot of different ones, by different people.
-
Yes. However, we're supporting them, so we need to decide what we can support.
-
Upvote for @anzz1's new naming convention for the various model subdirs.
-
@anzz1 why is the tokenizer.model duplicated everywhere? AFAIK there is only one.
-
@Green-Sky Yeah, there is only one; I might be thinking ahead too much. 😄 Also added some more checksums for gptq-4b models above: #374 (comment)
-
IMHO we should move the alpaca checksums to a discussion, with a thread for each individual model, with source, credits, and converted checksums.
-
How about an individual checksum file per model? That way we have some granularity and it is self-documenting for new users who don't know a llama from an alpaca.
-
Yes, it might be good to differentiate them, as some have short fur, some long, and some are friendlier than others. One "standard" sum per model type seems to make the most sense. I can't see why they would need to be their own files, though; I'm not a big fan of littering the repo with dozens of files when the same thing can be achieved with dozens of lines in a single file (see the sketch below). I agree this should be moved to Discussions, as it will be an ongoing thing.
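As a sketch of the single-file approach (the `SHA256SUMS` filename and the `<digest>  <path>` line format here are assumptions, mirroring sha256sum's conventions):

```python
# Sketch: verify every entry in one checksum list file.
import hashlib

def sha256_of(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

with open("SHA256SUMS") as f:  # assumed filename
    for line in f:
        if not line.strip():
            continue
        expected, path = line.split(maxsplit=1)
        path = path.strip()
        print(f"{path}: {'OK' if sha256_of(path) == expected else 'FAILED'}")
```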
-
alpaca (LoRA?) 30B q4_0 by Pi3141, as of time of writing, uses the old unversioned file format. Converted (ggmf v1) sha256sum:
-
Outdated original post for posterity
Originally posted by @anzz1 in #338 (comment)
Edit: after converting the models to the new format, I found out that the "v2" hash above is also incorrect.
The sha256 for `./models/alpaca-7B-ggml/ggml-model-q4_0.bin` is supposed to be `2fe0cd21df9c235c0d917c14e1b18d2d7320ed5d8abe48545518e96bb4227524`.
This is now a general discussion about keeping sha256 checksums updated and maybe establishing some sort of standardisation.