Test polars support #826
…ions for pd or pl
…drop() got an unexpected argument 'axis'
Good job, @TheooJ, on this PR! Here are some first remarks :)
Thank you Théo, I think we're almost there.
Could you replace:
    px.__name__ == "polars"
by a new function in skrub._dataframe._namespace.py:
    def is_namespace_polars(px):
        if "polars" not in sys.modules:
            return False
        import polars as pl
        return px is pl
So that in your tests you would simply call:
    if is_namespace_polars(px):
Let's also define a similar (but simpler) function for Pandas:
    def is_namespace_pandas(px):
        return px is pd
…xfail specific tests
out of curiosity in |
To stay consistent with what we already have (e.g., |
my question was not about using a named function vs inlining it, but rather about checking I was just interested to know if there is some situation you have encountered where comparing |
I saw similar patterns in scikit-learn to avoid weird behaviors like duck-typing. Happy to reverse if this brings unnecessary complexity, though.
I see something similar in utils.validation._is_pandas_df but in that case they need to check if the input is a pandas dataframe rather than if a module is pandas so it is a bit different.
btw in _is_pandas_df they read the module directly from sys.modules rather than re-importing
I don't really have a preference here I'm just trying to learn in which situations I shouldn't rely on modules' __name__ attribute
(as ATM we just use it for testing though, I would be in favor of putting it in a private module like testing_utils and starting with the simplest implementation. In any case, +1 for your suggestion to put it in a function so we can easily change it in one place only)
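A minimal sketch of the two approaches compared in this thread, using only the standard library. A name comparison accepts any module object whose `__name__` happens to be "polars", whereas an identity check against `sys.modules` does not. The helper name `is_namespace` and the stand-in module `fake_polars` are hypothetical, for illustration only, and are not part of skrub or scikit-learn.

```python
import sys
import types


def is_namespace(px, name):
    """Return True if `px` is the already-imported module registered as `name`.

    The module is read from sys.modules instead of being imported, so a
    library that is not installed (or not yet imported) simply yields False.
    """
    module = sys.modules.get(name)
    return module is not None and px is module


# Hypothetical stand-in: a module object that only shares the name "polars".
fake_polars = types.ModuleType("polars")

# The name-based check is satisfied by the stand-in ...
name_check = fake_polars.__name__ == "polars"

# ... while the identity-based check is not.
identity_check = is_namespace(fake_polars, "polars")
```

Whether this distinction matters for test-only code is the open question in the thread; keeping the check behind one function makes it cheap to switch later.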
Let's apply this advice using |
@jeromedockes, for the sake of argument, this is also used in this exact form in the array API. The |
I'm rewriting the function as
in |
Please put it in a new file |
LGTM, thank you @TheooJ !
maybe |
AFAICT this is also for checking a dataframe rather than a module directly? |
I think it discovers `test_*.py` or `*_test.py` files but not `_test_*.py` files
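The point about file discovery can be checked with a rough sketch that matches filenames against pytest's default `python_files` patterns. This only approximates collection (a project can override `python_files` in its configuration), and the function name `matches_default_patterns` is made up for this example.

```python
from fnmatch import fnmatchcase

# pytest's default patterns for test modules (the `python_files` setting).
DEFAULT_PATTERNS = ("test_*.py", "*_test.py")


def matches_default_patterns(filename):
    """Return True if `filename` matches one of the default patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in DEFAULT_PATTERNS)


# A leading underscore keeps a helper module out of collection.
for candidate in ("test_utils.py", "utils_test.py", "_test_utils.py"):
    print(candidate, matches_default_patterns(candidate))
```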
yes sorry I read too fast and missed the '_'!
LGTM, thanks a lot @TheooJ ! it's super useful. and all those xfails show us what still needs to be done for polars support :)
from polars.testing import assert_frame_equal as assert_frame_equal_pl

MODULES.append(pl)
ASSERT_TUPLES.append((pl, assert_frame_equal_pl))
let's leave it for another pr but I think we should reorganize this a bit to put this kind of setup either in the _test_utils.py or in conftest.py and avoid duplicating it for each test module
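One possible shape for such a shared setup, as a sketch only: the names `MODULES` and `ASSERT_TUPLES` come from the diff above, while `_CANDIDATES` and `get_assert_frame_equal` are hypothetical helpers introduced here. Both libraries are treated as optional so the snippet runs whichever of them is installed.

```python
import importlib

MODULES = []
ASSERT_TUPLES = []

# (dataframe module, module providing its assert_frame_equal)
_CANDIDATES = [("pandas", "pandas.testing"), ("polars", "polars.testing")]

for name, testing_name in _CANDIDATES:
    try:
        module = importlib.import_module(name)
        testing = importlib.import_module(testing_name)
    except ImportError:
        # Library not installed: skip it instead of failing at import time.
        continue
    MODULES.append(module)
    ASSERT_TUPLES.append((module, testing.assert_frame_equal))


def get_assert_frame_equal(px):
    """Return the assert_frame_equal function registered for namespace `px`."""
    for module, assert_fn in ASSERT_TUPLES:
        if px is module:
            return assert_fn
    raise ValueError(f"No assert_frame_equal registered for {px!r}")
```

Test modules could then import these names and parametrize over `MODULES` with `@pytest.mark.parametrize`, rather than rebuilding the lists in each file.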
@TheooJ sorry about that but we now have a small git merge conflict. could you fix that and then I will merge it
thanks a lot!
To check a module directly, they use |
What is this PR addressing?
Testing polars inputs (closing issue #825)
Method
I parametrized the tests using `@pytest.mark.parametrize` with the two modules.
As each module has its own `assert_frame_equal` testing function, I’ve created lists of tuples to store both methods and call the needed one in the testing functions.
Comments/Discussion
Some tests had `→ None`, some didn’t. I removed all occurrences of these in the tests I’ve covered.
Some dataframes were built with `pd.DataFrame([[1, 3], [2, 4]], columns=['a', 'b'])`. This method isn’t supported by Polars, so I’ve replaced it by `pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})` when applicable.
Some tests used `pd.NA`. I used `pl.Null` for the Polars equivalent.
Note that `pl.DataFrames` don’t have a `reset_index()` method, and that `pl.DataFrame.drop()` doesn't have an `axis` argument.
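The constructor and `drop()` points can be illustrated with a small sketch. The helpers `make_frame` and `drop_column` are hypothetical and not part of this PR. Each library is imported only if available, so the loop does nothing when neither is installed.

```python
import importlib
import sys


def make_frame(px):
    """Build a frame from a dict of columns, which both libraries accept."""
    return px.DataFrame({"col1": [1, 2], "col2": [3, 4]})


def drop_column(px, df, column):
    """Drop one column; polars' drop() takes no `axis` argument."""
    if px is sys.modules.get("polars"):
        return df.drop(column)
    return df.drop(column, axis=1)


results = {}
for name in ("pandas", "polars"):
    try:
        px = importlib.import_module(name)
    except ImportError:
        continue  # library not installed
    remaining = drop_column(px, make_frame(px), "col1")
    results[name] = list(remaining.columns)
```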