Precompute compressed lengths of the training data #2

rdno · 2024-01-14T21:06:30Z

Hi,

I came across this implementation. I had an idea to speed up the computations. I don't expect you to merge it.

By precomputing and storing the compressed lengths of the training data, one deflate call can be avoided in the `ncd` function. I've observed a ~33% performance increase.
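To illustrate the idea, here is a minimal Python sketch (not the project's C code). It assumes the usual NCD definition, `(C(xy) - min(C(x), C(y))) / max(C(x), C(y))`, and plain concatenation of the two inputs. The helper names `clen`, `train_clen`, and `classify`, as well as the sample data, are hypothetical.

```python
import zlib


def clen(data: bytes) -> int:
    """Length of the zlib (DEFLATE) compressed form of data."""
    return len(zlib.compress(data))


def ncd(x: bytes, cx: int, y: bytes, cy: int) -> float:
    """NCD where both single-input compressed lengths are supplied by the caller.

    Only the concatenation is compressed here.
    """
    cxy = clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Hypothetical training set of (text, label) pairs.
train = [
    (b"the cat sat on the mat", "animals"),
    (b"stocks fell sharply today", "finance"),
]

# Computed once, before any test sample is seen.
train_clen = [clen(text) for text, _ in train]


def classify(sample: bytes) -> str:
    """Return the label of the nearest training sample (1-NN by NCD)."""
    cs = clen(sample)  # compressed once per test sample, not once per pair
    best = min(
        range(len(train)),
        key=lambda i: ncd(sample, cs, train[i][0], train_clen[i]),
    )
    return train[best][1]
```

With this layout, each (test, train) pair costs one compression of the concatenation, and the training lengths are looked up rather than recomputed.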

Great project.

Thanks.

gyreas · 2024-01-28T12:02:02Z

What about also precomputing the compressed lengths of the test data, while keeping the original text around (same for the training data)? Possibly using a few threads for that. Then the only remaining computation is compressing the combined (concatenated) input. I'm not too familiar with C, but I used a similar, albeit naive, approach in Kotlin, which is painfully slow.

[edit]
I tried actual threads with my suggestion and got ~5 s per test sample (still slow for me). Will try SIMD next.
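A rough Python sketch of this suggestion, under the same assumed NCD definition: both the training and test lengths are precomputed in a thread pool, so only the concatenation is compressed per pair. The names `precompute` and `ncd_matrix` are hypothetical, and this is neither the Kotlin code mentioned above nor the project's C code.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def clen(data: bytes) -> int:
    """Length of the zlib (DEFLATE) compressed form of data."""
    return len(zlib.compress(data))


def precompute(texts: list[bytes], workers: int = 4) -> list[int]:
    """Compressed length of every text, computed in a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(clen, texts))


def ncd_matrix(test: list[bytes], train: list[bytes],
               workers: int = 4) -> list[list[float]]:
    """NCD of every test sample against every training sample.

    The single-input lengths are looked up; only the concatenation
    is compressed inside the loop.
    """
    test_len = precompute(test, workers)
    train_len = precompute(train, workers)

    def row(i: int) -> list[float]:
        return [
            (clen(test[i] + train[j]) - min(test_len[i], train_len[j]))
            / max(test_len[i], train_len[j])
            for j in range(len(train))
        ]

    # One task per test sample; rows come back in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(row, range(len(test))))
```

The per-pair compression of the concatenation still dominates the cost, which may be why threading alone did not help much here.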

Precompute compressed lengths of the training data.

fb3504e

By precomputing and storing the compressed lengths, one deflate call can be avoided in the `ncd` function.
