
Weighted version of the KLL sketch? #157

Open
thvasilo opened this issue Jun 9, 2020 · 4 comments


thvasilo commented Jun 9, 2020

Hello,

The XGBoost project is considering using the KLL sketch to represent feature value histograms.

One potential blocker is the need for a weighted version of the sketch. This would allow us to use weighted data points and adjust their feature contributions accordingly (see Appendix A of the XGBoost paper).
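To illustrate why the weights are real-valued: in Appendix A of the XGBoost paper, the rank of a split candidate is the total weight (second-order gradient) of the points below it, divided by the total weight overall. A rough, illustrative snippet along these lines (not actual XGBoost or DataSketches code):

```java
public class WeightedRankExample {

  // Illustration only (not XGBoost or DataSketches code): the weighted rank
  // from Appendix A of the XGBoost paper. Each feature value x_i carries a
  // real-valued weight h_i (its second-order gradient), and the rank of a
  // candidate split z is the weight below z divided by the total weight.
  static double weightedRank(double[] values, double[] weights, double z) {
    double below = 0.0;
    double total = 0.0;
    for (int i = 0; i < values.length; i++) {
      total += weights[i];
      if (values[i] < z) {
        below += weights[i];
      }
    }
    return below / total;
  }

  public static void main(String[] args) {
    double[] x = {1.0, 2.0, 3.0};
    double[] h = {0.25, 0.5, 0.25};
    System.out.println(weightedRank(x, h, 2.5)); // prints 0.75
  }
}
```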

I remember discussing the possibility of using data weights with KLL in the past; is that still an option?


leerho commented Jun 9, 2020

The only option we have thought about would be restricted to positive integer weights (>= 1).

My interpretation of weights in a quantiles context is that an item with a weight of 2 is equivalent to updating the sketch with two identical items with a weight of 1. Is this your understanding?
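Concretely, with the current (unweighted) sketch, that interpretation amounts to repeating the update. A minimal Java sketch of the idea, assuming the KllFloatsSketch API with a single-item update(float) (the exact constructor and method names may vary across library versions):

```java
import org.apache.datasketches.kll.KllFloatsSketch;

public class IntegerWeightedKllExample {

  // An item with integer weight w >= 1 is treated as w identical items of
  // weight 1, i.e. the sketch is simply updated w times. Only practical for
  // small weights; a native weighted update would avoid the repeated calls.
  static void updateWeighted(KllFloatsSketch sketch, float item, long weight) {
    if (weight < 1) {
      throw new IllegalArgumentException("weight must be a positive integer");
    }
    for (long i = 0; i < weight; i++) {
      sketch.update(item);
    }
  }

  public static void main(String[] args) {
    KllFloatsSketch sketch = new KllFloatsSketch();
    updateWeighted(sketch, 1.0f, 3); // one item of weight 3
    updateWeighted(sketch, 5.0f, 1); // one item of weight 1
    System.out.println("estimated median: " + sketch.getQuantile(0.5));
  }
}
```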


thvasilo commented Jun 9, 2020

Yes, I think the outcome would be the same in that case. However, for this to be used in XGBoost it would require real-valued weights.

I found the paper I had in mind when talking to Zohar Karnin a couple of years ago; it includes a weighted extension of KLL (Section 4).

I'll ping @trivialfis in case he wants to chime in.

@DanielTing

@thvasilo: Is it possible to assume that all the weights in the data sum to at least 1 (or to some other known constant)? There are no assumptions here on individual weights or on weights being integral, but it does mean you need to know something about the overall scaling of the weights.

There are several possible implementations that incorporate weighting. If you can assume this, then you get the simplest (and perhaps most performant) implementation, but it's certainly possible to handle weights without this assumption as well.

@thvasilo

@DanielTing The XGBoost use case would be for batch training, so we can assume we know all the weights in advance and normalize them so they sum to 1.

I'm having a chat with the first author of the linked paper tomorrow; I'll update with any progress. I plan to work on this codebase, so I hope we can come up with something we can contribute back.
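For what it's worth, the normalization pass we have in mind is just a rescaling over the known batch of weights, independent of any sketch API (shown in plain Java, since the weighted update itself doesn't exist yet):

```java
public class WeightNormalizationExample {

  // Batch setting: all weights are known up front, so they can be rescaled
  // to sum to 1 before being handed to a (future) weighted sketch.
  static double[] normalizeWeights(double[] weights) {
    double total = 0.0;
    for (double w : weights) {
      total += w;
    }
    if (total <= 0.0) {
      throw new IllegalArgumentException("weights must sum to a positive value");
    }
    double[] normalized = new double[weights.length];
    for (int i = 0; i < weights.length; i++) {
      normalized[i] = weights[i] / total;
    }
    return normalized;
  }

  public static void main(String[] args) {
    double[] w = normalizeWeights(new double[] {2.0, 3.0, 5.0});
    System.out.println(w[0] + " " + w[1] + " " + w[2]); // 0.2 0.3 0.5
  }
}
```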
