GIFs getting the same hash when the first frame is identical #33

Open
anatolyra opened this issue Feb 24, 2019 · 4 comments
Labels
question Further information is requested

Comments

@anatolyra

Hi,

When generating hash values for two GIF files, I get the same hash value for both if their first frames are identical.
[Attached images: two_dogs_1, two_dogs_2]

Is that the expected behavior?

Thanks!

@KilianB
Owner

KilianB commented Feb 24, 2019

It currently is, but maybe we can change it to something you deem appropriate.

What behavior would you like?

  • Create a single hash for the entire gif?
  • Concatenate the hashes of each frame? (The same gif will match, but a different frame order won't; this also prevents comparison of gifs with different numbers of frames.)
  • Create a hash object for each individual image and group them in some way, so that both all images and a single image within the gifs can be compared? (Search whether individual images match within a collection?)
  • ....

@KilianB KilianB added the question Further information is requested label Feb 24, 2019
@KilianB
Owner

KilianB commented Feb 24, 2019

My suggestion is to create a "gif hash collection" allowing for different similarity distances.

  • intersect: find image matches contained in both gif collections
  • distinct: 1 - intersect
  • totalDistance: summed distance frame by frame
  • minDistance: summed distance for each frame to the closest frame
  • distanceShifted: total distance, but shifted to create the lowest value
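
To make the proposed distances concrete, here is a rough sketch. It rests on several assumptions that are not part of the library: every frame hash is a plain 64-bit `long`, distances are normalized Hamming distances, "shifted" is read as a cyclic shift of the frame order, and the class and method names are invented for illustration. `intersect` takes a threshold because two frames rarely hash to exactly the same value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in the library.
final class GifDistanceSketch {

    /** Normalized Hamming distance between two 64-bit frame hashes, in [0, 1]. */
    static double distance(long a, long b) {
        return Long.bitCount(a ^ b) / 64.0;
    }

    /** intersect: frames of a that have at least one match in b within the threshold. */
    static List<Long> intersect(long[] a, long[] b, double threshold) {
        List<Long> matches = new ArrayList<>();
        for (long frameA : a) {
            for (long frameB : b) {
                if (distance(frameA, frameB) <= threshold) {
                    matches.add(frameA);
                    break; // one match is enough for this frame
                }
            }
        }
        return matches;
    }

    /** totalDistance: summed distance frame by frame; needs equal frame counts. */
    static double totalDistance(long[] a, long[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("frame counts differ");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += distance(a[i], b[i]);
        }
        return sum;
    }

    /** minDistance: for each frame of a, the distance to its closest frame in b, summed. */
    static double minDistance(long[] a, long[] b) {
        if (b.length == 0) {
            throw new IllegalArgumentException("b has no frames");
        }
        double sum = 0;
        for (long frameA : a) {
            double best = Double.POSITIVE_INFINITY;
            for (long frameB : b) {
                best = Math.min(best, distance(frameA, frameB));
            }
            sum += best;
        }
        return sum;
    }

    /** distanceShifted: totalDistance minimized over all cyclic shifts of b (assumed reading). */
    static double distanceShifted(long[] a, long[] b) {
        double best = totalDistance(a, b); // shift 0, also validates the lengths
        int n = a.length;
        for (int shift = 1; shift < n; shift++) {
            double sum = 0;
            for (int i = 0; i < n; i++) {
                sum += distance(a[i], b[(i + shift) % n]);
            }
            best = Math.min(best, sum);
        }
        return best;
    }
}
```

Note that `totalDistance` and `distanceShifted` only make sense for gifs with the same number of frames, while `intersect` and `minDistance` also work across different frame counts.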

@anatolyra
Author

I like what you suggest. A couple of things:

  1. intersect - to find image matches in both collections, you'll have to allow for giving a minimum similarity value
  2. What do you mean by distinct?
  3. Maybe give a result of average distance and variance?

Thanks!
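
The average distance and variance from point 3 could be computed from the per-frame distances roughly as follows. This is a sketch only: `DistanceStatsSketch` and `meanAndVariance` are hypothetical names, the input is assumed to be an array of already computed per-frame distances, and the population variance (dividing by n) is an arbitrary choice.

```java
// Hypothetical helper; not part of the library.
final class DistanceStatsSketch {

    /** Returns {mean, population variance} of the given per-frame distances. */
    static double[] meanAndVariance(double[] distances) {
        if (distances.length == 0) {
            throw new IllegalArgumentException("no distances given");
        }
        double sum = 0;
        for (double d : distances) {
            sum += d;
        }
        double mean = sum / distances.length;

        double squaredDeviation = 0;
        for (double d : distances) {
            squaredDeviation += (d - mean) * (d - mean);
        }
        return new double[] {mean, squaredDeviation / distances.length};
    }
}
```

A low mean with a high variance would hint that most frames match closely while a few differ strongly.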

@KilianB
Owner

KilianB commented Feb 24, 2019

I define distinct as the inverse operation of intersection. Return all images which are unique to one collection.
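
As a sketch of that definition: collect every frame that has no sufficiently close counterpart in the other collection. The assumptions here are mine, not the library's: frame hashes are 64-bit `long` values, "match" means a Hamming distance of at most `maxBits`, and `DistinctSketch` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not part of the library.
final class DistinctSketch {

    /** True if any hash in others differs from frame in at most maxBits bits. */
    static boolean hasMatch(long frame, long[] others, int maxBits) {
        for (long other : others) {
            if (Long.bitCount(frame ^ other) <= maxBits) {
                return true;
            }
        }
        return false;
    }

    /** Frames that appear in exactly one of the two collections (no match in the other). */
    static List<Long> distinct(long[] a, long[] b, int maxBits) {
        List<Long> unique = new ArrayList<>();
        for (long frame : a) {
            if (!hasMatch(frame, b, maxBits)) {
                unique.add(frame);
            }
        }
        for (long frame : b) {
            if (!hasMatch(frame, a, maxBits)) {
                unique.add(frame);
            }
        }
        return unique;
    }
}
```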

The issue tracker serves as notes and comments, therefore don't worry if it gets a bit messy. I am just writing down random thoughts.

Coding all of this is trivial and can be done quickly; the issue arises from a design perspective:

  • My preference would be to create a new class, similar to FuzzyHash, which groups hashes together. The hash object can easily be returned from the hashing algorithm if it extends Hash. If this is done, maybe it's time to implement a new abstract superclass for hash collections.
  • Searching for images is more a feature of an `ImageMatcher` than of a hash object. Semantically, creating a hash object for this bothers me a tiny bit (we could query whether an image is contained in the gif).
  • The base functionality of the default hash object is still valid, therefore inheritance is the way to go, but at the same time it's also a composition.
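
The "inheritance plus composition" idea could be sketched like this. Everything here is a simplified stand-in: `SimpleHash` is not the library's Hash class, `FrameHashCollection` is a hypothetical name, and hashes are reduced to a single `long`. The collection is itself a hash (it behaves like the first frame, which is the current behavior) and also holds one hash per frame.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the library's hash type; hypothetical.
class SimpleHash {
    final long bits;

    SimpleHash(long bits) {
        this.bits = bits;
    }

    int hammingDistance(SimpleHash other) {
        return Long.bitCount(bits ^ other.bits);
    }
}

// Hypothetical collection: is-a hash (first frame) and has-a list of frame hashes.
class FrameHashCollection extends SimpleHash {
    private final List<SimpleHash> frames;

    /** Expects at least one frame; the first frame provides the base hash value. */
    FrameHashCollection(List<SimpleHash> frames) {
        super(frames.get(0).bits);
        this.frames = new ArrayList<>(frames); // defensive copy
    }

    /** True if any frame is within maxBits of the given image hash. */
    boolean contains(SimpleHash image, int maxBits) {
        for (SimpleHash frame : frames) {
            if (frame.hammingDistance(image) <= maxBits) {
                return true;
            }
        }
        return false;
    }

    int frameCount() {
        return frames.size();
    }
}
```

Code that only knows about the base hash type keeps working unchanged, while gif-aware code can call `contains` to ask whether a single image occurs in the gif.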

Note: This link explains how frames can be extracted from gif images: https://stackoverflow.com/questions/8933893/convert-each-animated-gif-frame-to-a-separate-bufferedimage . This method requires a file as input, so we should also provide a utility loader for gif images, so that users don't have to perform the same file IO multiple times if they want to hash the same gif with multiple algorithms. Are there any gif containers available, or should we create our own BufferedImage collection?
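
A minimal loader along those lines, using only the standard `javax.imageio` API, might look like the sketch below. `GifFrameLoaderSketch` and `readFrames` are hypothetical names. One caveat: this returns the raw frames as stored in the file. Optimized gifs often store only the changed region of each frame, so a complete loader would also have to composite frames according to their offsets and disposal methods, as the linked answer discusses.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical utility; not part of the library.
final class GifFrameLoaderSketch {

    /** Reads every raw frame of a gif once, so several algorithms can reuse the result. */
    static List<BufferedImage> readFrames(File gif) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("gif");
        if (!readers.hasNext()) {
            throw new IOException("No GIF reader available");
        }
        ImageReader reader = readers.next();
        try (ImageInputStream in = ImageIO.createImageInputStream(gif)) {
            if (in == null) {
                throw new IOException("Cannot open " + gif);
            }
            reader.setInput(in);
            int count = reader.getNumImages(true); // true: scan the file to count frames
            List<BufferedImage> frames = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                frames.add(reader.read(i)); // raw frame, not composited
            }
            return frames;
        } finally {
            reader.dispose();
        }
    }
}
```

The returned list can then be passed to each hashing algorithm in turn without touching the file again.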

Do we want to overload the hash method of the hashing algorithms so that it checks whether the supplied image is a gif and creates the appropriate GifHashCollection, or do we create an entirely new method hashGif?

2 participants