
Precompute compressed lengths of the training data #2

Open: wants to merge 1 commit into base: main


rdno commented Jan 14, 2024

Hi,

I came across this implementation and had an idea for speeding up the computation. I don't expect you to merge it.

By precomputing and storing the compressed lengths of the training data, one deflate call can be avoided in the `ncd` function. I've observed a ~33% performance improvement.

Great project.

Thanks.

By precomputing and storing the compressed lengths, one deflate call
can be avoided in the `ncd` function.
gyreas commented Jan 28, 2024

What about also precomputing the compressed lengths of the test data, while keeping the original texts around (and doing the same for the training data)? (Possibly throwing some threads at that.) That way, the only computation left per pair is compressing the combined (concatenated) text. I'm not too familiar with C, but I used a similar, albeit naive, approach in Kotlin, and it is pathetically slow.

[edit]
I added actual threads and, using my suggestion, got ~5 s per test sample (still slow for me). Will try SIMD next.
