Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

generic cas-hashable contentHash result change if you move your type to another module / package #202

Open
guibou opened this issue Dec 30, 2021 · 3 comments

Comments

@guibou
Copy link
Contributor

guibou commented Dec 30, 2021

Describe the bug

The result of for contentHash depends on the name of module and package where the type is defined, see for example the instance for [a]:

https://github.com/tweag/funflow/blob/master/cas/hashable/src/Data/CAS/ContentHashable.hs#L385

It depends on contentHashUpdate_fingerprint which takes the module name and package name into account.

It leads to a change of hash if you move your type to another module / package.

This is super surprising.

To Reproduce

Create a type such as:

data MyType = MyType Int
   deriving (ContentHashable m, Generic)

and call contentHash ([MyType 5])

Move it to different module or different package and see the different hash.

Expected behavior

Either it should be documented it should not depend on the package or module name.

Additional context

Note that running the compiled code or running the code from GHCi will also change the hash, because in GHCi, package name is 'main', when it is different in a compiled code.

@guibou
Copy link
Contributor Author

guibou commented Dec 30, 2021

At the same time, the code of https://github.com/tweag/funflow/blob/master/cas/hashable/src/Data/CAS/ContentHashable.hs#L466-L468 looks suspicious, it does the same hash 3 time on datatypeName

@guibou
Copy link
Contributor Author

guibou commented Sep 1, 2023

Actually, the same thing is happening if you change some base library.

For example contentHash (mempty :: Data.HashMap.Strict.HashMap () ()) returns a different result on two different GHC environment. I cannot really detail because I don't understand yet why.

The only thing I'm sure is that it is due to the typeRepFingerprint (typeOf (mempty :: HashMap () ())) which is different.

guibou added a commit to guibou/funflow that referenced this issue Sep 1, 2023
guibou added a commit to guibou/funflow that referenced this issue Sep 1, 2023
Close tweag#202

The problem was two folds:

- GHC internal fingerprint computation may change, leading to an
  invalidation of (some) of the hash computed. Instead of using the type
  fingerprint, we only use the `show $ typeRef x` (which is roughly a
  string representation of the type). It will be more stable and should
  discriminate enough.
- GHC generic was also including module and package name, hence it was
  not possible to move a type to another module/package without
  invalidating hashs. This is removed. It means that two isomorphic
  types with same name but from different package would lead to the same
  hashs, I don't thing that's a real problem.
guibou added a commit to guibou/funflow that referenced this issue Sep 1, 2023
Close tweag#202

- GHC internal fingerprint computation may change, leading to an
  invalidation of (some) of the hash computed. Instead of using the type
  fingerprint, we only use the `show $ typeRef x` (which is roughly a
  string representation of the type). It will be more stable and should
  discriminate enough.
@guibou
Copy link
Contributor Author

guibou commented Sep 1, 2023

The branch https://github.com/guibou/funflow/tree/stable_unordered-containers-hash-cas-hashable contains two commits to fix this problem:

Both solutions are discutable. I'm available to discuss them, and open proper MR if we agree on the direction to use.

However, removing cas-hashable from our codebase took me less time than fixing this (I'm now using a one line function which compute the sha256 sum of the output of cereal serialization, which is enough for my needs and should be more robust and easier to maintain), so I won't pushing toward a fix here unless you ask / continue the discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants