
feat: add pyarrow.Table support #487

Merged
merged 16 commits into from
Nov 25, 2024
great_tables/_gt_data.py: 5 changes (3 additions, 2 deletions)
@@ -20,6 +20,7 @@
copy_data,
create_empty_frame,
get_column_names,
+_get_column_dtype,
n_rows,
to_list,
validate_frame,
@@ -173,7 +174,7 @@ def render_formats(self, data_tbl: TblData, formats: list[FormatInfo], context:
# TODO: I think that this is very inefficient with polars, so
# we could either accumulate results and set them per column, or
# could always use a pandas DataFrame inside Body?
-_set_cell(self.body, row, col, result)
+self.body = _set_cell(self.body, row, col, result)
Collaborator:

Thanks for spending time working through this quirk in the formatter 😬

What if we did this for now?

  • If _set_cell returns something other than None, we assign it to self.body.
  • We restore the original behavior of _set_cell for pandas and polars: mutate the frame in place and return None (just to keep this PR scoped to pyarrow).
  • After merging, I can open an issue to refactor this bit of code so it works column by column, rather than on each individual (row, col) cell.
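The first two points above can be sketched as a minimal, self-contained example. It assumes a dict of lists as a stand-in for a mutable pandas or polars frame and a hypothetical `FrozenTable` class as a stand-in for an immutable table such as a `pyarrow.Table`; `set_cell` and `apply_results` are illustrative names, not the real `_set_cell` in great_tables.

```python
from functools import singledispatch
from typing import Any, Optional


class FrozenTable:
    """Hypothetical stand-in for an immutable table (e.g. pyarrow.Table):
    updates return a new object instead of mutating this one."""

    def __init__(self, columns: dict[str, tuple]):
        self._columns = dict(columns)

    def column(self, name: str) -> tuple:
        return self._columns[name]

    def with_column(self, name: str, values: tuple) -> "FrozenTable":
        new_columns = dict(self._columns)  # copy so the original is untouched
        new_columns[name] = values
        return FrozenTable(new_columns)


@singledispatch
def set_cell(frame: Any, row: int, col: str, value: Any) -> Optional[Any]:
    raise NotImplementedError(f"unsupported frame type: {type(frame).__name__}")


@set_cell.register
def _(frame: dict, row: int, col: str, value: Any) -> None:
    # Mutable frame (stand-in for pandas/polars): update in place, return None.
    frame[col][row] = value
    return None


@set_cell.register
def _(frame: FrozenTable, row: int, col: str, value: Any) -> FrozenTable:
    # Immutable frame: rebuild the column and return a new table.
    values = list(frame.column(col))
    values[row] = value
    return frame.with_column(col, tuple(values))


def apply_results(body: Any, results: list[tuple[int, str, Any]]) -> Any:
    for row, col, value in results:
        new_body = set_cell(body, row, col, value)
        # Only rebind when the helper handed back a new object.
        if new_body is not None:
            body = new_body
    return body
```

With this shape the caller never needs to know which backend it holds: mutable frames keep their identity, and immutable ones are swapped for the returned copy.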

Contributor Author:

Sounds like a reasonable approach.


return self
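The TODO in `render_formats` and the follow-up refactor mentioned in the review both point toward writing formatted results once per column instead of once per cell. A minimal sketch of that idea, assuming a plain dict of lists as the frame and a hypothetical `set_results_by_column` helper; with a real `pyarrow.Table`, the per-column write would be a single call such as `Table.set_column`, which returns a new table.

```python
from collections import defaultdict
from typing import Any


def set_results_by_column(
    columns: dict[str, list[Any]],
    results: list[tuple[int, str, Any]],
) -> dict[str, list[Any]]:
    """Group per-cell (row, col, value) results by column, then write each
    affected column once. Returns a new mapping; the input is not mutated."""
    by_column: dict[str, dict[int, Any]] = defaultdict(dict)
    for row, col, value in results:
        by_column[col][row] = value

    updated = dict(columns)  # shallow copy: untouched columns are shared
    for col, row_values in by_column.items():
        new_values = list(columns[col])
        for row, value in row_values.items():
            new_values[row] = value
        # One write per column rather than one per cell.
        updated[col] = new_values
    return updated
```

Grouping first means the number of column replacements scales with the number of formatted columns, not with the number of formatted cells.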

@@ -323,7 +324,7 @@ def align_from_data(self, data: TblData):
# a Pandas DataFrame or a Polars DataFrame
col_classes = []
for col in get_column_names(data):
-dtype = data[col].dtype
+dtype = _get_column_dtype(data, col)

if dtype == "object":
# Check whether all values in 'object' columns are strings that
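The change in `align_from_data` swaps `data[col].dtype` for `_get_column_dtype(data, col)` because backends report column types differently: pandas and polars Series expose `.dtype`, while a column taken from a `pyarrow.Table` is a `ChunkedArray` that reports its Arrow type through `.type`. One way to write such a helper is with `functools.singledispatch`; the sketch below is not necessarily how `_get_column_dtype` is implemented, and `SeriesLike`, `ArrowColumnLike`, `ArrowTableLike` and `get_column_dtype` are hypothetical stand-ins so the example runs without pandas or pyarrow installed.

```python
from functools import singledispatch
from typing import Any


class SeriesLike:
    """Hypothetical pandas/polars-style column: type exposed as `.dtype`."""

    def __init__(self, dtype: str):
        self.dtype = dtype


class ArrowColumnLike:
    """Hypothetical pyarrow-style column: type exposed as `.type`."""

    def __init__(self, type_: str):
        self.type = type_


class ArrowTableLike:
    """Hypothetical stand-in for pyarrow.Table; indexing by name gives a column."""

    def __init__(self, columns: dict[str, ArrowColumnLike]):
        self._columns = columns

    def __getitem__(self, name: str) -> ArrowColumnLike:
        return self._columns[name]


@singledispatch
def get_column_dtype(data: Any, col: str) -> Any:
    # Default path: pandas- and polars-style columns expose `.dtype`.
    return data[col].dtype


@get_column_dtype.register
def _(data: ArrowTableLike, col: str) -> Any:
    # Arrow-style columns report their type through `.type` instead.
    return data[col].type
```

Dispatching on the frame type keeps `align_from_data` backend-agnostic: supporting another table library only means registering one more overload.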