Hi,

I implemented a simple Bloom filter on my own and wondered why your implementation was about 2.5 times faster than mine.
So I started to inspect your code and found a serious issue:
Your different hash functions, implemented as `HashIter`, are not independent of each other.
Because of the cyclic group structure of the values they generate, some values may occur much more often than others.
This results in much higher false-positive rates than intended. A fix for this problem would be to use an independent `RandomState` for each hash function.
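To illustrate the cyclic group argument in isolation, here is a minimal sketch. It assumes that the i-th hash value is derived as `h1 + i * h2` (double hashing) and then reduced to 16 bits; that formula and the helper `distinct_slots` are my assumptions for illustration, not code taken from the crate. Under this assumption, the number of reachable slots is 2^16 divided by gcd(h2, 2^16), so an even step can only ever reach a fraction of the slots.

```rust
use std::collections::HashSet;

/// Hypothetical double-hashing scheme (an assumption, not the crate's code):
/// the i-th index is h1 + i * h2, reduced modulo 2^16 by truncating to u16.
/// Returns how many distinct 16-bit slots are hit by `count` indices.
fn distinct_slots(h1: u64, h2: u64, count: u64) -> usize {
    let mut seen = HashSet::new();
    for i in 0..count {
        let idx = h1.wrapping_add(i.wrapping_mul(h2)) as u16;
        seen.insert(idx);
    }
    seen.len()
}

fn main() {
    // An odd step generates the whole cyclic group Z/2^16,
    // so all 65536 slots are reachable.
    println!("odd step:  {}", distinct_slots(7, 12345, 1_000_000));
    // A step divisible by 2^8 only reaches 2^16 / 2^8 = 256 slots,
    // no matter how many indices are drawn.
    println!("even step: {}", distinct_slots(7, 256 * 3, 1_000_000));
}
```

If the step happens to share a large power of two with the table size, the hash values pile up on a few slots, which is the kind of skew the experiment with the real `HashIter` is meant to show.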
See the following code snippet to understand what I mean:
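```rust
// Repeat the experiment 30 times, each with freshly seeded RandomStates.
for _ in 0..30 {
    let rnd_st1 = RandomState::new();
    let rnd_st2 = RandomState::new();
    let hash_iter = HashIter::from(42u32, 1000000u32, &rnd_st1, &rnd_st2);
    // Count how often each 16-bit slot is hit by the generated hash values.
    let mut arr = vec![0; 1 << 16];
    for val in hash_iter {
        arr[val as u16 as usize] += 1;
    }
    // Histogram: hit count -> number of slots with that hit count.
    let mut hashmap: HashMap<usize, usize> = HashMap::new();
    for val in arr {
        let x: usize;
        if let Some(_x) = hashmap.get_mut(&val) {
            x = *_x + 1;
        } else {
            x = 1;
        }
        hashmap.insert(val, x);
    }
    println!("HashMap {{(value, occurences)}} || {:?}", hashmap);
}
```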
This prints (the output differs between runs because each `RandomState` is randomly seeded):
main.rs.txt