[experimental] Pivot table improvements #669

Merged
merged 5 commits into from
Jun 29, 2023

Conversation

JanMarvin
Owner

@JanMarvin JanMarvin commented Jun 27, 2023

  # install from this development branch
  remotes::install_github("JanMarvin/openxlsx2#669")
  
  library(openxlsx2)

  ## example code
  df <- data.frame(
    Plant = c("A", "C", "C", "B", "B", "C", "C", "C", "A", "C"),
    Location = c("E", "F", "E", "E", "F", "E", "E", "G", "E", "F"),
    Status = c("good", "good", "good", "good", "good", "good", "good", "good", "good", "bad"),
    Units = c(0.95, 0.95, 0.95, 0.95, 0.89, 0.89, 0.94, 0.94, 0.9, 0.9),
    stringsAsFactors = FALSE
  )

  ## Create the workbook and the pivot table
  wb <- wb_workbook()$
    add_worksheet("Data")$
    add_data(x = df, startCol = 1, startRow = 2)

  df <- wb_data(wb, 1, dims = "A2:D10")
  wb$
    add_pivot_table(df, dims = "A3", rows = "Plant",
                    filter = c("Location", "Status"), data = "Units")$
    add_pivot_table(df, dims = "A3", rows = "Plant",
                    filter = c("Location", "Status"), data = "Units",
                    param = list(numfmts = c(formatCode = "#,###0"), sort_row = "ascending"))$
    add_pivot_table(df, dims = "A3", rows = "Plant",
                    filter = c("Location", "Status"), data = "Units",
                    param = list(numfmts = c(numfmt = 10), sort_row = "descending"))

@JanMarvin
Owner Author

@lauraearle515 could you check whether this solves your issue? This adds a sort and a numfmt option. Most of the pivot arguments are not checked, because of the massive overhead this would create and because many cannot be tested from R alone.

@lauraearle515

@JanMarvin Thank you!!! This is amazing. Any chance there's a way to put the sort on the Sum of Units column instead of the Row Labels column?

@JanMarvin
Owner Author

According to a famous advertising slogan, nothing is impossible. But for now I don't want to research and implement it. It requires some kind of id field, and maybe that's unique or always the same ... if you're interested in this and want to research it a bit, please go ahead. You'd have to create a few pivot tables in Excel with various orders and read the auto sort... XML field from wb$pivotTables.

@JanMarvin
Owner Author

If you are going to check these out, I'm interested in the following. (This is using a local example)

library(openxlsx2)
wb <- wb_load("/tmp/test.xlsx")
wb$pivotTables %>% 
  xml_node("pivotTableDefinition", "pivotFields", "pivotField") %>% 
  as_xml()
#> <pivotField axis="axisRow" showAll="0" sortType="ascending">
#>  <items count="4">
#>   <item x="0" />
#>   <item x="2" />
#>   <item x="1" />
#>   <item t="default" />
#>  </items>
#>  <autoSortScope>
#>   <pivotArea dataOnly="0" outline="0" fieldPosition="0">
#>    <references count="1">
#>     <reference field="4294967294" count="1" selected="0">
#>      <x v="0" />
#>     </reference>
#>    </references>
#>   </pivotArea>
#>  </autoSortScope>
#>  [...]

I have no clue what this number here field="4294967294" is supposed to mean. The same is referenced in the ECMA office open documentation, but is it always this number? Is the number the same for columns? Maybe you can have a look and try to research a few cases where you tried various things. After all that is the really time intensive part: researching how something is supposed to work. And since I do not need this feature, I'm reluctant to invest my time 😄

@JanMarvin
Owner Author

I've pushed another commit, maybe you can test this commit with your data:

  ## sort by column and row
  df <- mtcars

  ## Create the workbook and the pivot table
  wb <- wb_workbook()$
    add_worksheet("Data")$
    add_data(x = df, startCol = 1, startRow = 2)

  df <- wb_data(wb)
  wb$add_pivot_table(df, dims = "A3", rows = "cyl", cols = "gear",
                     data = c("vs", "am"), 
                     # sort table: first rows ascending and second columns descending
                     param = list(sort_row = 1, sort_col = -2))

@JanMarvin JanMarvin changed the title from "Gh issue 667" to "[experimental] Pivot table improvements" Jun 29, 2023
@lauraearle515

@JanMarvin sort_row = -1 did exactly what I needed -- thank you!! Your comment about field="4294967294" goes a little over my head, but I definitely owe you my life after all this help so I'll see if I can't figure anything out 😊

@JanMarvin
Owner Author

Well, that is not even remotely true. All I do is develop a small software tool and try to be nice to strangers on the Internet. 😌

I was hoping that you could create a few examples in Excel and check them using the code snippet above, to see if the field="4294967294" condition is always true. I'm going to add another check and merge this into the main branch so that it makes the cut for the next release, but I do not have time to test this. Therefore, please be (extra) careful and check the files you create.
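A minimal sketch of how such a check could be scripted in base R, assuming the pivot table XML is available as character strings (as the `wb$pivotTables` snippet above suggests). The helper `extract_reference_fields()` is hypothetical and not part of openxlsx2; the XML fragment is copied from the output shown earlier in this thread.

```r
# XML fragment taken from the pivotField output shown earlier in this thread
pivot_field_xml <- paste0(
  '<autoSortScope><pivotArea dataOnly="0" outline="0" fieldPosition="0">',
  '<references count="1">',
  '<reference field="4294967294" count="1" selected="0"><x v="0" /></reference>',
  '</references></pivotArea></autoSortScope>'
)

# Hypothetical helper (not an openxlsx2 function): collect the `field`
# attribute of every <reference> element found in one or more XML strings.
extract_reference_fields <- function(xml) {
  hits <- unlist(regmatches(xml, gregexpr('<reference[^>]*field="[0-9]+"', xml)))
  as.numeric(sub('.*field="([0-9]+)"', "\\1", hits))
}

fields <- extract_reference_fields(pivot_field_xml)

# TRUE if every reference points at the value seen in this thread
all(fields == 4294967294)
```

With a loaded workbook, the same helper could presumably be applied to `wb$pivotTables` directly, under the assumption stated above that it holds plain XML strings.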

@JanMarvin JanMarvin merged commit 7a37393 into main Jun 29, 2023
@JanMarvin JanMarvin deleted the gh_issue_667 branch June 29, 2023 19:11
@lauraearle515

lauraearle515 commented Jun 29, 2023

@JanMarvin Every version I've tried so far has had that same number. This link says "Now that we picked the right pivot field we can start manipulate its sortation. The attribute sortType sets the sort direction. Its possible values are "ascending" and "descending", so we just going to set it to "descending". Next, the element AutoSortScope class handles the pivot table sorting scope. Then, the element PivotArea class defines what part of the pivot table to handle. Then, the element PivotAreaReferences class defines a set of referenced fields. This is where we are going to put a reference to the Revenue data field. The count attribute specifies how many references there are, so for us it is just "1". The element PivotAreaReference class defines the field reference and its field attribute is the index of that referenced field. However, if the referenced field is a data field, the attribute value must be set to -2. From the specifications in the MSDN page, you can read that the data type of the field attribute is an unsigned int. What is the conversion from int to unsigned int for a negative number? The calculation is ((2^32)-2) = 4294967296-2 = 4294967294. The value 4294967294 is what we put in the attribute to indicate the referenced field is a data field. The XML element FieldItem class defines an index of a field. The field index is set in the v attribute. We need to find the index of the data field among all the data fields. Since we have only one data field - Revenue - that index will simply be 0."
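The arithmetic in the quoted passage can be reproduced in R. This is only a sketch of the conversion it describes (a signed -2 read as an unsigned 32-bit integer); the helper name `to_uint32()` is made up for illustration and says nothing about how openxlsx2 writes the value.

```r
# Hypothetical helper: reinterpret a (possibly negative) whole number as an
# unsigned 32-bit integer by wrapping it modulo 2^32. R doubles represent
# these values exactly, so no integer overflow occurs.
to_uint32 <- function(x) {
  as.numeric(x) %% 2^32
}

data_field_marker <- to_uint32(-2)

# Print without scientific notation
sprintf("%.0f", data_field_marker)
#> [1] "4294967294"
```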

@JanMarvin
Owner Author

Thanks! Sorting a second column or row will likely not be possible, but that's some confirmation that the currently implemented approach should be fine. Still, it might require a guard so that it applies only to the first column or row.

@JanMarvin
Owner Author

Oh, well, maybe it works. Like I've said, I don't really have time to test this. I have created a pull request to clean a few things up (previously we assigned a sort order to every pivotField, even though only one field was used for sorting). Maybe it's just my paranoia, but this way it looks a little cleaner.
