I'm not 100% sure how to approach this, but I think it could be implemented lazily using a column, see: which is used for vaex.vrange and in several other places.
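A plain-Python sketch of the lazy-column idea, applied to the elementwise cartesian product of two list columns from the question. The column knows its total length up front (the sum of len(a[i]) * len(b[i]) over all rows) and computes values only for the slice that is requested, similar in spirit to vaex.vrange, which does not store its values. `LazyCartesianColumn` is a hypothetical class for illustration; it does not reproduce vaex's internal column API:

```python
from bisect import bisect_right
from itertools import accumulate


class LazyCartesianColumn:
    """Hypothetical lazy column: length is known up front, values are computed per slice."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        # offsets[i] is the first output row produced by input row i;
        # offsets[-1] is the total number of output rows.
        sizes = [len(xs) * len(ys) for xs, ys in zip(a, b)]
        self.offsets = [0] + list(accumulate(sizes))

    def __len__(self):
        return self.offsets[-1]

    def __getitem__(self, s):
        # Only contiguous slices are supported, matching chunked evaluation.
        if not isinstance(s, slice):
            raise TypeError("only slices are supported")
        start, stop, step = s.indices(len(self))
        if step != 1:
            raise ValueError("only step 1 is supported")
        out = []
        for k in range(start, stop):
            # bisect_right skips input rows that produce no output (empty lists).
            i = bisect_right(self.offsets, k) - 1
            local = k - self.offsets[i]
            ys = self.b[i]
            # Same ordering as itertools.product: x varies slowest, y fastest.
            out.append((self.a[i][local // len(ys)], ys[local % len(ys)]))
        return out


col = LazyCartesianColumn([[1, 2], [3]], [["x", "y"], ["z"]])
print(len(col))   # 5
print(col[1:4])   # [(1, 'y'), (2, 'x'), (2, 'y')]
```

Only the offsets array (one integer per input row) is kept in memory; each slice is computed on demand, so a chunked consumer never needs the full product at once.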
Hi! Thanks for your help.
I currently need to compute "the cartesian product, elementwise, of two columns of lists". More generally, I need to transform a DF in a way that changes the number of rows. I have something that works, but only for DFs that fit in memory:
Starting with:
My goal is to get:
Here is an implementation that works:
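For reference, a minimal in-memory sketch of the elementwise cartesian product being described, in plain Python with lists standing in for the two list columns. The function name, the sample data, and the `(row, x, y)` output layout are hypothetical and are not the original poster's code:

```python
from itertools import product


def cartesian_explode(a, b):
    """For each row i, emit one output row per pair in product(a[i], b[i]).

    Returns three parallel lists: the source row index, the element from a,
    and the element from b.
    """
    rows, xs, ys = [], [], []
    for i, (left, right) in enumerate(zip(a, b)):
        for x, y in product(left, right):
            rows.append(i)
            xs.append(x)
            ys.append(y)
    return rows, xs, ys


# Two input rows, each holding a list in both columns (hypothetical data).
a = [[1, 2], [3]]
b = [["x", "y"], ["z"]]
rows, xs, ys = cartesian_explode(a, b)
print(rows)  # [0, 0, 0, 0, 1]
print(xs)    # [1, 1, 2, 2, 3]
print(ys)    # ['x', 'y', 'x', 'y', 'z']
```

Two input rows become five output rows, so the result cannot simply be attached to the original DF as a new column.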
The problem is that I have to call Expression.evaluate(), which materializes the entire result. I can't add that expression back to the original DataFrame, because cartesian_product() increases the number of rows. I think the same problem arises if rows are dropped, if rows are transposed, or more generally whenever there isn't a 1:1 mapping from input rows to output rows.
Is there a way to create a new DF by evaluating the expression in chunks? I'd rather not, but I could evaluate it in chunks, stream the results to a file using pyarrow, and then read that file back in with vaex. Could a vaex.from_expression() constructor do this? I'm assuming it doesn't exist because it goes against the design principles: it would encourage people to evaluate expressions eagerly all the time, which is not the vaex style.
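A minimal sketch of the chunk-and-stream workaround, in plain Python, with the standard library's `csv` module standing in for a pyarrow writer so the example stays self-contained. `iter_chunks` and `stream_cartesian` are hypothetical helpers; in vaex, each input chunk would come from something like `df.evaluate(expression, i1, i2)`, which evaluates a row range. Only one chunk's output is held in memory at a time:

```python
import csv
import io
from itertools import product


def iter_chunks(a, b, chunk_size):
    """Yield (offset, a_chunk, b_chunk) slices of the two input columns."""
    for start in range(0, len(a), chunk_size):
        yield start, a[start:start + chunk_size], b[start:start + chunk_size]


def stream_cartesian(a, b, sink, chunk_size=2):
    """Write the exploded product chunk by chunk; return the number of rows written."""
    writer = csv.writer(sink)
    writer.writerow(["row", "x", "y"])
    n = 0
    for start, a_chunk, b_chunk in iter_chunks(a, b, chunk_size):
        # Build this chunk's output only, write it, then let it go.
        batch = [
            (start + i, x, y)
            for i, (xs, ys) in enumerate(zip(a_chunk, b_chunk))
            for x, y in product(xs, ys)
        ]
        writer.writerows(batch)
        n += len(batch)
    return n


a = [[1, 2], [3], [4, 5]]
b = [["x", "y"], ["z"], ["w"]]
buf = io.StringIO()  # stands in for a file on disk
n = stream_cartesian(a, b, buf, chunk_size=2)
print(n)  # 7
```

With pyarrow, the same loop could write each batch via `pyarrow.parquet.ParquetWriter.write_table`, and the resulting file could then be opened lazily with `vaex.open()`. Chunks are taken over input rows, so the size of each output batch still depends on the list lengths within that chunk.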