HTML table for the view of Grouped Dataset/GatherBy Dataset in Jupyter Notebook #83

Kexi002 · 2022-09-07T08:28:01Z

Hi,

I found there is no HTML table display for the view Grouped Dataset/GatherBy Dataset, which is generated by groupby() or gatherby(). These views are still displayed in plain text, like:

So I think it will be a good idea to add an HTML show function for GroupBy and GatherBy. Since GroupBy and GatherBy do not belong to AbstractDataset, we cannot simply modify the show function of AbstractDataset, but its code could be reused.

It will be easy to fix this issue by passing the view(gds.parent, gds.perm, :) into an AbstractDataset show function's copy, whose summary message is modified. This is my improvement:

And it also works well with large Dataset:

(omitted rows)

The text was updated successfully, but these errors were encountered:

Kexi002 · 2022-09-07T08:29:14Z

I'm checking this solution for some details. In order to modify the displayed summary message, I need to add some new show functions. Should I place them in io.jl or groupby.jl/gatherby.jl?

sl-solution · 2022-09-07T09:51:54Z

It should be good to fix this issue.

I think, one way to fix this issue is to modify _show(io::IO, ::MIME"text/html", ds::AbstractDataset;...) in such away that it can accept a keyword argument which is the permutation of observations. By default, it can be 1:n (n is the number of rows), however, for GroupBy and GatherBy you can pass the observations permutation.

Kexi002 · 2022-09-07T10:47:40Z

Just note that for very long ds view(ds, perm, :) is very slow (it is PrettyTables.jl issue).

You mean Dataset with many rows? I made some test for large Dataset:

10,000,000 rows * 10 columns:

(omitted rows)

1,000,000 rows * 100 columns:

(omitted rows)

100,000 rows * 1,000 columns:

(omitted rows)

It seems like these large Dataset are displayed normally (By default displaing 30 rows). But once the entries of the Dataset exceeds 100,000,000, my program will crash. I'm not sure if my computer is running out of memory or it's a display issue.

I think, to fix this issue, you may modify _show(io::IO, ::MIME"text/html", ds::AbstractDataset;...) in such away that it can accept a keyword argument which is the permutation of observations. By default, it can be 1:n (n is the number of rows), however, for GroupBy and GatherBy you can pass the observations permutation.

Add a permutation argument might be a good idea, so we don't need to add additional show functions. I will try that.

My question is that I am worried that modifying the existing _show() function will affect many places. But if just copy a _show() function specifically for GroupBy, it would be less risky, but might add more code. How to make a trade-off?

Kexi002 · 2022-09-07T13:22:44Z

I just thought of another problem. If we pass a permutation argument, we can judge that this is a GroupBy/GatherBy, not an AbstractDataset, and print the correct summary message.

However, how can we distinguish GroupBy and GatherBy if we only pass a permutation argument? This would be a problem unless they all print the same summary message.

sl-solution · 2022-09-07T19:51:57Z

adding a keyword argument is one way, another one way can be modifying _show to work on permutations of observations. We have an internal function _get_perms that you can use for this purpose.

However, I think changing _show is better compared to adding a new similar function.

Kexi002 · 2022-09-13T08:20:26Z

Hi, I'm trying to add some functions for GroupBy/GatherBy, so the _show() function could be reused to output the HTML table for these two types.

I'm not sure how to implement eachcol() for GroupBy/GatherBy will be reasonable. The implementation of AbstractDataset eachcol() create a DatasetColumns for that data set, which just a structure contains the data set.

Since the permutation will affect the max widths of the first N rows of a column, I can't just use something like eachcol(ds.parent) to solve this problem. Could you give me any suggestions? @sl-solution

Kexi002 · 2022-09-14T07:04:47Z

Another question is, should we be able to access a GroupBy entry by gds[!, :x1] (exclamation mark)? Because actually GroupBy doesn't have a real original Dataset.

Should we be able to modify its parent Dataset by itself?

sl-solution · 2022-09-14T08:08:33Z

We should not define eachcol or getindex or setindex! for GroupBy/GatherBy. GroupBy and GatherBy are supposed to be consumed immediately and never being saved or modified, because, small changes on them invalidate their properties. So we like to define as less functions as possible for them.

sl-solution added the enhancement New feature or request label Sep 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HTML table for the view of Grouped Dataset/GatherBy Dataset in Jupyter Notebook #83

HTML table for the view of Grouped Dataset/GatherBy Dataset in Jupyter Notebook #83

Kexi002 commented Sep 7, 2022 •

edited

Loading

Kexi002 commented Sep 7, 2022 •

edited

Loading

sl-solution commented Sep 7, 2022 •

edited

Loading

Kexi002 commented Sep 7, 2022 •

edited

Loading

Kexi002 commented Sep 7, 2022 •

edited

Loading

sl-solution commented Sep 7, 2022

Kexi002 commented Sep 13, 2022 •

edited

Loading

Kexi002 commented Sep 14, 2022

sl-solution commented Sep 14, 2022

HTML table for the view of Grouped Dataset/GatherBy Dataset in Jupyter Notebook #83

HTML table for the view of Grouped Dataset/GatherBy Dataset in Jupyter Notebook #83

Comments

Kexi002 commented Sep 7, 2022 • edited Loading

Kexi002 commented Sep 7, 2022 • edited Loading

sl-solution commented Sep 7, 2022 • edited Loading

Kexi002 commented Sep 7, 2022 • edited Loading

Kexi002 commented Sep 7, 2022 • edited Loading

sl-solution commented Sep 7, 2022

Kexi002 commented Sep 13, 2022 • edited Loading

Kexi002 commented Sep 14, 2022

sl-solution commented Sep 14, 2022

Kexi002 commented Sep 7, 2022 •

edited

Loading

Kexi002 commented Sep 7, 2022 •

edited

Loading

sl-solution commented Sep 7, 2022 •

edited

Loading

Kexi002 commented Sep 7, 2022 •

edited

Loading

Kexi002 commented Sep 7, 2022 •

edited

Loading

Kexi002 commented Sep 13, 2022 •

edited

Loading