bug: `.drop()` causing mismatch in `RecordBatchReader` schema #8393

Comments
This issue does not appear to be present in the pandas backend.
Thanks for the issue. What version of pyarrow are you using? Copy-pasting (thank you!) your code works for me on 91f46b9 using:

```
❯ pip list | grep -e pyarrow -e ibis-framework -e duckdb
duckdb          0.9.2
duckdb-engine   0.9.2
ibis-framework  8.0.0
pyarrow         15.0.0
pyarrow-hotfix  0.4
❯ python minimal_example.py
a_very_very_very_very_very_very_very_very_very_very_very__1: int64
Traceback (most recent call last):
  File "<REDACTED>/minimal_example.py", line 25, in <module>
    assert record_batch_reader.schema == record_batch_reader.read_next_batch().schema
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError
```
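The truncated column name in the output above (`…_very__1`) is the telltale shape of SQLAlchemy-style label truncation: when a generated label exceeds the dialect's maximum identifier length, the label is cut down and a numeric disambiguator is appended. The following is a simplified, hypothetical sketch of that behavior for illustration only, not SQLAlchemy's actual algorithm:

```python
def truncate_label(name: str, max_len: int = 59, counter: int = 1) -> str:
    """Illustrative sketch of SQLAlchemy-style label truncation.

    Labels longer than ``max_len`` are shortened to a prefix plus a
    numeric disambiguator, producing names like "a_very_..._very__1".
    """
    if len(name) <= max_len:
        return name
    suffix = f"_{counter}"
    # Keep a prefix of the original name, separated from the counter by
    # an underscore, so the result fits within max_len characters.
    return name[: max_len - len(suffix) - 1] + "_" + suffix


long_name = "a" + "_very" * 12 + "_long_column_name"  # 78 characters
print(truncate_label(long_name))  # a 59-character name ending in "__1"
```

Names at or under the limit pass through unchanged, which is consistent with the report that shorter column names do not trigger the bug.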
That commit is also working for me! So it looks like this has since been fixed.
Bisecting will be tough: there are a number of commits where the codebase was not in a state where any CI was being run, due to the big refactor. We did that deliberately so we could incrementally re-enable each backend in CI as we ported them to sqlglot. I will take a look at what's happening in 8.0.0 and see if I can track down the cause of the issue at least.
I was able to reproduce this without Ibis and without an explicit PyArrow import, but not with DuckDB alone. This seems like an issue with duckdb-engine:

```python
import sqlalchemy as sa

long_name = "a_very_very_very_very_very_very_very_very_very_long_column_name"

eng = sa.create_engine("duckdb:///test.db")

with eng.begin() as con:
    con.exec_driver_sql(f"CREATE OR REPLACE TABLE t ({long_name} INT)")

t = sa.Table("t", sa.MetaData(), autoload_with=eng)
t1 = sa.select(t.c[0]).select_from(t).subquery()
query = sa.select(t1.c[long_name])

with eng.begin() as con:
    result = con.execute(query)
    batch = result.connection.connection.fetch_record_batch()
    assert batch.schema.names[0] == long_name, batch.schema.names[0]
```

One of the things we've done between 8.0.0 and now is completely remove Ibis's explicit dependency on SQLAlchemy, so I'm not sure the backport is worth the effort. I'd rather not introduce a workaround for a problem that doesn't exist. Instead, we're happy to help get your code working against a specific commit, or you can use one of the pre-release wheels that we publish.
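Before handing a `RecordBatchReader` to code that will call `read_all()`, the mismatch can be surfaced up front with a plain comparison of column names. Here is a generic, hypothetical helper for that kind of diagnosis (it is not part of Ibis, PyArrow, or duckdb-engine); you would pass it e.g. `reader.schema.names` and `batch.schema.names`:

```python
def find_name_mismatches(reader_names, batch_names):
    """Return (position, reader_name, batch_name) triples wherever the
    declared schema and an actual batch disagree on a column name.

    Useful for diagnosing truncation bugs like this one before a
    hard-to-read AssertionError surfaces deeper in the pipeline.
    """
    if len(reader_names) != len(batch_names):
        raise ValueError(
            f"column count mismatch: {len(reader_names)} != {len(batch_names)}"
        )
    return [
        (i, r, b)
        for i, (r, b) in enumerate(zip(reader_names, batch_names))
        if r != b
    ]


# A truncated name shows up as a single mismatched position:
print(find_name_mismatches(["a_long_name"], ["a_long_na__1"]))
```

An empty result means the schemas agree on names; any non-empty result pinpoints exactly which columns were rewritten.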
@judahrand I filed Mause/duckdb_engine#910. |
What happened?

Calling `.drop()` on a `Table` which contains a column name longer than 59 characters, and then calling `.to_pyarrow_batches()`, results in a broken `RecordBatchReader` where the contained `RecordBatch`es do not match the schema of the `RecordBatchReader` due to column name truncation. This breaks `RecordBatchReader.read_all()`.

What is surprising, however, is that if the `.drop()` is the first method called after `conn.register()`, the truncation does not seem to occur.

What version of ibis are you using?
```
❯ pip list | grep -e ibis -e duckdb
duckdb          0.10.0
duckdb-engine   0.9.2
ibis-framework  8.0.0
```
What backend(s) are you using, if any?
DuckDB
Relevant log output
No response