
Excel precision #308

Open
JanMarvin opened this issue Aug 25, 2022 · 5 comments

@JanMarvin
Owner

See this SO

@JanMarvin JanMarvin mentioned this issue Sep 9, 2022
@JanMarvin
Owner Author

JanMarvin commented Sep 12, 2022

Well, this quickly became more complex than I'd hoped 😦

> library(openxlsx2)
> # this is what I want to get
> chr <- "433256.529740090366"
> chr
[1] "433256.529740090366"
>  # from this number
> num <- 433256.529740090366
> num
[1] 433256.5

What I had assumed

We write this tiny data frame, and the value is exactly this. We open the file in Excel. We save it. The value magically changes. Conclusion: Excel is to blame. Unfortunately, this is not entirely correct.

What actually happens

> wb <- wb_workbook()$add_worksheet()$add_data(x = num)
> wb$worksheets[[1]]$sheet_data$cc$v
[1] "433256.52974009"

This is a bit strange. Why does this happen? Because we push the number through as.character().

data[sel] <- lapply(data[sel], as.character)

This is completely unaffected by openxlsx_options(), and this is where the truncation happens: as.character() uses only 15 significant digits. So far, so bad. This raises the question: can it be circumvented? Unfortunately, the answer is not simple. We could use e.g. format(num, nsmall = 12) to get the exact string representation of the number we initially added. But, and this is the catch, it would affect every value we push through. Integer-valued cells, for instance, would be written as 1.000000..., and in a world of floating point, all the other nice floats would produce equally wonderful values.
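Since R, Excel, and most other tools store these values as IEEE 754 doubles, the trade-off is easy to demonstrate outside R as well. A quick Python sketch (purely illustrative, not openxlsx2 code):

```python
num = 433256.529740090366

# 15 significant digits -- what as.character() effectively produces -- lose information:
short = f"{num:.15g}"
print(short)                 # "433256.52974009"
print(float(short) == num)   # False: the round trip fails

# 17 significant digits always round-trip an IEEE 754 double:
exact = f"{num:.17g}"
print(float(exact) == num)   # True

# But a fixed number of decimals (the nsmall approach) pads every value:
print(f"{1:.12f}")           # "1.000000000000"
```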

What to do?

Hence, the least we should do is document this behaviour in the package. Unless whoever writes to Excel converts their values to strings with a specific precision, there is no way the full precision will survive.

Another possible way would be to provide an "nsmall" option and use lapply(..., format, nsmall) when writing. I've added this in an example file here: From_R.xlsx. After saving it in Excel on Mac, it becomes this From_Excel.xlsx: 433256.52974009037.

What happens on the Excel side of things?

Luckily, Excel doesn't do any better. If I simply copy & paste num into a cell in Excel, the value written is also slightly off. It returns:

"433256.52974009002"

So there is some truncation going on in Excel as well (as I stated in the initial SO). And I'm unable to recover the final 2:

> wb$worksheets[[1]]$sheet_data$cc
   r row_r c_r c_s c_t c_cm c_ph c_vm                   v               f f_t f_ref f_ca f_si is
1 A1     1   A                         433256.52974009002                                       
2 B1     1   B   1                    4.3325652974009E+16 A1*100000000000     

Now, when converting this value into a number, the following happens in R:

> options(scipen = 200)
> as.numeric("433256.52974009002") * 1e12
[1] 433256529740090048

Ergo, we are unable to recreate Excel's precision, and we should document this as well.
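A quick cross-check (Python here, but these are the same IEEE 754 doubles R uses) confirms that Excel's stored string no longer denotes the double we started from:

```python
num = 433256.529740090366             # the value we wrote from R
excel = float("433256.52974009002")   # the string Excel stored after copy & paste

# The two strings map to two different doubles, so the original
# value genuinely cannot be recovered from Excel's output.
print(excel == num)   # False
```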

@jmbarbone
Collaborator

You know it's complicated when there's an entire Wikipedia article on it: Numeric precision in Microsoft Excel.

For data going into Excel, can we just supply how R interprets the values and let Excel perform its own truncation? If we're writing these as strings, maybe leveraging that format() approach is better for preserving as much information as possible.

For data coming out of Excel, maybe we can take a cue from utils::read.table(), which leverages utils::type.convert(numerals = "no.loss") to read numbers in as strings. Users adjusting or computing on numeric data from Excel should be cautious about precision differences between software. However, an option to not touch the numbers at all may be good if what they're doing in {openxlsx2} doesn't require it.
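The point of numerals = "no.loss" can be illustrated with a small sketch (a Python stand-in; type.convert() itself is R): converting a long numeral to a double and printing it back can never reproduce all its digits, while the string keeps them all.

```python
s = "433256.529740090366"   # 18 significant digits in the file

# Converting to a double drops precision: the shortest round-trip
# representation of a double needs at most 17 significant digits,
# so the original 18-digit string cannot come back.
print(repr(float(s)) == s)   # False

# Keeping the cell as a string preserves every digit:
print(s)
```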

@JanMarvin
Owner Author

Well, reading is not affected by this. If all you want to do is add another sheet, there is no conversion to numeric. The conversion only occurs if we convert the cell character to numeric, so truncation would only happen if a cell is replaced with a numeric. I tend not to change reading from xlsx and conversion to an R data frame. The part where we write numbers can use format(); that should be simple to integrate. I can open a pull request soon™

@JanMarvin
Owner Author

format(..., nsmall = ..., drop0trailing = TRUE) might be what we want.
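For illustration, the combined effect of a fixed nsmall and drop0trailing = TRUE can be sketched like this (a hypothetical Python helper, not the openxlsx2 implementation):

```python
def to_chr(x, nsmall=12):
    # Fixed number of decimals (the nsmall part), then strip the
    # trailing zeros again (the drop0trailing = TRUE part).
    return f"{x:.{nsmall}f}".rstrip("0").rstrip(".")

print(to_chr(1.0))                  # "1" -- integers stay clean
print(to_chr(433256.529740090366))  # keeps enough decimals to round-trip
print(float(to_chr(433256.529740090366)) == 433256.529740090366)  # True
```

Because 12 decimals resolve far below the spacing of doubles at this magnitude, the formatted string still round-trips to the original double.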

@JanMarvin
Owner Author

I've pushed a pull request that uses format(), though I'm not convinced that it is the right solution. As can be seen, we blow up our floating-point data quite a bit, even with fewer digits. The underlying issue, of course, is that with floating points it's always the question how many digits are actually required and from which point on it's all just fictional.

After all, we need to make sure that the extra digits we write actually serve a benefit and that we are not just blowing up the file size to pat ourselves on the back for our great precision, while opening the door for other potential corner cases.

> options(digits = 22)
> #' Convert number to character
> #' @param x input integer or numeric
> #' @noRd
> as_character <- function(x) {
+   is_na <- is.na(x) & !is.nan(x)
+   z <- format(x, trim = TRUE, digits = options()$digits, format = "g")
+   if (any(is_na)) z[is_na] <- NA_character_
+   z
+ }
> # want a bunch of different digits?
> sprintf("%.20f", as.numeric(openxlsx2:::as_character(3123231.32455432123)))
[1] "3123231.32455432135611772537"
> openxlsx2:::as_character(3123231.32455432123)
[1] "3123231.324554320424795"
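The "fictional digits" point is easy to verify: beyond 17 significant digits, a longer string denotes the very same double (Python again, same IEEE 754 doubles as R):

```python
x = 3123231.32455432123

s17 = f"{x:.17g}"
s21 = f"{x:.21g}"

print(s17)
print(s21)                       # longer...
print(float(s21) == float(s17))  # True: the extra digits carry no information
print(float(s17) == x)           # True: 17 digits already round-trip
```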
