Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor tbl data #9

Merged
merged 9 commits into from
Oct 12, 2023
Merged

Refactor tbl data #9

merged 9 commits into from
Oct 12, 2023

Conversation

machow
Copy link
Collaborator

@machow machow commented Oct 11, 2023

This PR works toward #7 by..

  • Copying in https://github.com/machow/databackend, to allow instance checks without importing pandas or polars
  • Removing the TblData class, in favor of generic functions (e.g. _tbl_data.n_rows)
  • Adding tests for the generic functions against pandas and polars

I noticed pandas DataFrames are created in some places (e.g. Spanners). It might be useful for us to keep the pandas DataFrame as frame used within some of these classes (at least for now?), just to make life simpler while getting all the different features out.

Happy to make any changes, discuss anything!

@machow
Copy link
Collaborator Author

machow commented Oct 12, 2023

I hope it's okay I lumped in a doc example fix, and added a test of fmt_number!

@machow machow marked this pull request as ready for review October 12, 2023 16:46
@machow
Copy link
Collaborator Author

machow commented Oct 12, 2023

@rich-iannone I think this is ready--happy to walk through while pairing tomorrow! I think it might be easier to merge this in (happy to add any tweaks, clarify anything), then implement / add tests for some of things in examples-qmd. I added tests of the dataframe generic functions, and we should know whether we need to tweak once we test some of the functions that call them

@@ -29,7 +29,7 @@ jobs:
if [ -f requirements-dev.txt ]; then pip install -r requirements-dev.txt; fi
- name: Install
run: |
make install
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what to add to requirements-dev.txt, so switched to installing the package with dev dependencies. Happy to tweak!

df = pd.DataFrame({"x": [1.234, 2.345], "y": [3.456, 4.567]})
gt = GT(df).fmt_number(columns="x", decimals=2)

# TODO: is 2.35 below the intended result?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure if gt deliberately rounds, so that 2.345 rounded is 2.35

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if it doesn't through the formatter we should deliberately pre-round before formatting (then we could expose the rounding method too).

@rich-iannone
Copy link
Member

This looks great! Just going to review, approve, and you can merge in whenever!

@rich-iannone rich-iannone self-requested a review October 12, 2023 16:52
Copy link
Member

@rich-iannone rich-iannone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

# TODO: I think that this is very inefficient with polars, so
# we could either accumulate results and set them per column, or
# could always use a pandas DataFrame inside Body?
_set_cell(self.body, row, col, result)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting individual cells is inefficient in polars (I think it sets a whole column copy), and it looks like they're not planning on changing that (since they're more analytic / column oriented).

Shouldn't be an issue for prototyping, but wanted to flag

@machow machow merged commit 4c8677d into main Oct 12, 2023
4 checks passed
@machow
Copy link
Collaborator Author

machow commented Oct 12, 2023

Ahhh yeeeeaaahhhhhh!!!

@rich-iannone rich-iannone deleted the refactor-tbl-data branch October 12, 2023 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants