-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clipmenud: add option to deduplicate entries #227
base: develop
Are you sure you want to change the base?
Conversation
I was thinking, maybe it might be worth implementing tombstone now and instead of memmoving, simply marking the older entry as "dead" to be removed later. Though I'd assume that the linear search is probably the bigger issue. Maybe maintaining a hashmap of A simpler method might be to have a probabilistic cache of For now though, I haven't really noticed much of an issue in practice. Though I use a fairly low max clip size of 128. And both the linear search + memmove only ever happens when there's an actual duplicate, not on each insertion. If performance becomes an issue, I can look into doing the probabilistic cache or tombstone, whichever is the bottleneck. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/store.c
Outdated
@@ -546,7 +566,7 @@ int cs_add(struct clip_store *cs, const char *content, uint64_t *out_hash) { | |||
*out_hash = hash; | |||
} | |||
|
|||
return cs_snip_add(cs, hash, line, nr_lines); | |||
return (ret == 1) ? 0 : cs_snip_add(cs, hash, line, nr_lines); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once you use expect()
you can avoid this and just pass through again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cs_snip_add
resizes the store and increments nr_snips
by +1. We don't really want that in the dedup case.
Unless by "pass through" you meant passing through the return value ret
as is.
Also to note: I in general am a bit worried about performance here, so I should be clear I am not certain this will be merged yet. But also this should not be the regular case: we already do cheap "last dupe" checks up front. |
I did a bit of micro benchmarking a couple days ago. It's not super scientific but the observation are as follows:
These numbers are with the default of 1k max clips. All of this seems to suggest to me that the linear search is not the bottleneck but the The rest of the review all seems very reasonable. I will amend the patch and address them sometime tomorrow. Cheers. |
I think that sounds fine: my main concern was the nominal case (dupe at head), but that's already handled by is_possible_partial in a faster way, so that shouldn't present a problem. Thanks! |
fe4b794
to
e25c379
Compare
uses a linear search to find the duplicate and a memmove to move it over to the end.
not perticularly efficient since each snip is 256 bytes while we're only interested in the first 8. and so iterating over it linearly like this isn't very cache friendly. the memmove worst case can also be around 250KiB with the default config.
Closes: #224