Data-discovery and index #19
Labels
enhancement
New feature or request
help wanted
Extra attention is needed
question
Further information is requested
Comments
Indexing is now usually so fast (2-300 ms) that it is not necessary to keep a full cache of the index locally. To speed up discovery it is probably best not to pre-compute the DAS and DDS requests either. I think a good solution could be:
Then we are less dependent on the latency to the database server, which seems to be in the 100s of ms range for a Kubernetes cluster, but we can still use a standard setup.
In gauteh/hidefix#8 a couple of different DBs have been benchmarked. Deserializing the full index of a large file (4 GB) takes about 8 µs on my laptop. The index is about 8 MB and takes about 100-150 ns to read from memory-mapped local databases (sled, heed). Reading it (8 MB binary) from Redis, SQLite or similar takes about 3 to 6 ms, which is maybe a bit too high. It would be interesting to also try PostgreSQL.
A solution could be:
Unfortunately this complicates things significantly, but I don't see how to avoid it when scaling up. It would be nice to still support a stand-alone server that does not need a central db, but just caches locally and discovers datasets itself in some way. That would make it significantly easier to test the server out.
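To make the stand-alone idea a bit more concrete, here is a rough sketch of a local index cache sitting in front of an optional central store. It is only an illustration under assumptions: `IndexStore`, `MemoryStore` and `CachedIndex` are hypothetical names, not part of hidefix or the server, and the in-memory `HashMap` stands in for whatever local database (sled, heed, ...) or central DB would actually be used.

```rust
use std::collections::HashMap;

/// Hypothetical source of serialized dataset indexes, e.g. a central database.
pub trait IndexStore {
    fn fetch(&self, dataset: &str) -> Option<Vec<u8>>;
}

/// Stand-in for the central DB. A real one would do a network round trip here.
pub struct MemoryStore {
    pub entries: HashMap<String, Vec<u8>>,
}

impl IndexStore for MemoryStore {
    fn fetch(&self, dataset: &str) -> Option<Vec<u8>> {
        self.entries.get(dataset).cloned()
    }
}

/// Local cache in front of an optional central store.
/// With `central == None` this behaves as a stand-alone server cache.
pub struct CachedIndex<S: IndexStore> {
    central: Option<S>,
    local: HashMap<String, Vec<u8>>,
    /// Number of lookups that had to go to the central store.
    pub central_hits: usize,
}

impl<S: IndexStore> CachedIndex<S> {
    pub fn new(central: Option<S>) -> Self {
        CachedIndex {
            central,
            local: HashMap::new(),
            central_hits: 0,
        }
    }

    /// Store an index that was discovered or computed locally.
    pub fn insert_local(&mut self, dataset: &str, index: Vec<u8>) {
        self.local.insert(dataset.to_string(), index);
    }

    /// Return the index from the local cache, falling back to the central
    /// store (if any) and remembering the result locally.
    pub fn get(&mut self, dataset: &str) -> Option<&[u8]> {
        if !self.local.contains_key(dataset) {
            let fetched = self.central.as_ref()?.fetch(dataset)?;
            self.central_hits += 1;
            self.local.insert(dataset.to_string(), fetched);
        }
        self.local.get(dataset).map(|v| v.as_slice())
    }
}
```

With a central store configured, only the first `get` for a dataset pays the database latency. With `CachedIndex::<MemoryStore>::new(None)` the server only sees what it has indexed itself through `insert_local`, which is the stand-alone case.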
Some reasons:
Since data is usually on network disks, caching data could possibly be done using a large file system cache, or maybe something like https://docs.rs/freqfs/latest/freqfs/index.html.
@magnusuMET