-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A general question about the terminology of a content-addressable storage #137
Comments
The hash in funflow is precisely calculated from the output data. The hash in nix is calculated from just the input expression. This is sometimes known as the difference between an "intensional" and "extensional" store (see section 6 of the thesis) and is one of the things that differentiates funflow's store from nix's store. |
@mpickering Oh in that case, how come it says this on your README:
|
@CMCDragonkai funflow uses two kinds of hashes. It hashes the inputs to a task and links them to the resulting output so that we can cache computations. It also uses content hashes to deduplicate the store. The next paragraph in the README points to the CAS aspect
|
I've reread the nix phd thesis on intensional store, I understand what you mean now. @aherrmann Can you point out in the code where is the table/datastructure that links the extensional hash with the intensional hash? |
@CMCDragonkai Sure, the links are stored on the file system as symbolic links. These are created here. On lookup we check for such a symbolic link and if it exists resolve it to determine the CAS item here. |
The content store used here and in Nix, is a store where the data is addressed by some hash. But this hash is not calculated by the content of the data, but by some reproducible expression that produces the data.
Is it still correct to call it a content-addressable storage, or is there some better name for this or is there some qualifier that qualifies this content-addressable storage as something different? The wikipedia article for CAS only mentions that the hash is built from the content that is being stored, not some input expression that produces the content.
Some other thoughts that I'm hoping others can help put in a more formal way:
Looking at this in a more general way. There is some notion of equivalence between the expression that produces a value, and the value itself. However the mapping is surjective, as many kinds of input expressions may map to the same value (which is why canonicalisation/normalisation of the input expression is useful). We assume that hashing the expression and looking it up is faster than running the expression and hashing the resulting value and looking it up.
The text was updated successfully, but these errors were encountered: