read a parquet file error #108
Comments
Both 'duckdb' and 'polars' can read the same file.
sql.execute('SELECT max(id) a,min(id) b FROM t',eager=True)
shape: (1, 2)
┌──────────┬─────┐
│ a        ┆ b   │
│ ---      ┆ --- │
│ i32      ┆ i32 │
╞══════════╪═════╡
│ 30000000 ┆ 1   │
└──────────┴─────┘
Interesting find. Can you share the file with us?
It's too big (2.3 GB), and I am trying to find a smaller example.
I got some info on the bad file:
>>> import pyarrow.parquet as pq
>>> parquet_file = 'c:/t/t.parquet'
>>> metadata = pq.ParquetFile(parquet_file).metadata
>>> print(metadata)
<pyarrow._parquet.FileMetaData object at 0x0000000004FE0AE0>
created_by: Arrow2 - Native Rust implementation of Arrow
num_columns: 83
num_rows: 5000000
num_row_groups: 95
format_version: 2.6
serialized_size: 518729
Info for another, good file:
>>> metadata = pq.ParquetFile(parquet_file).metadata
>>> print(metadata)
<pyarrow._parquet.FileMetaData object at 0x000000000500AAE0>
created_by: DuckDB
num_columns: 6
num_rows: 50000000
num_row_groups: 499
format_version: 1.0
serialized_size: 275300
Error when running count(*)
Error when fetching the first few rows