You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have written about this issue already in various places of the openxlsx issue tracker and probably in other places and this issue is just to raise awareness and not something to solve or something I'm going to work on.
The reason why formulas are not evaluated is, because there is nothing to evaluate them.
"But Excel shows me number if I open a file with formulas?" "Yes."
"And for some formulas wb_to_df() also shows values?" "Yes again."
Why is this the case?
The XML file structure of a formula written by us looks like this:
<c ...>
<f>SUM(A1:B1)</f>
</c>
Here we have a cell with a simple formula: SUM(A1:B1). If A1 is 1 and B1 is 1 the formula will evaluate 2. In your spreadsheet software you can see the value 2. If you save the file to disk and load it again, the XML structure of the cell has changed. Now it looks like this:
<c ...>
<f>SUM(A1:B1)</f>
<v>2</v>
</c>
It contains a value field now and this value is 2. Now if we print from openxlsx2 we also get a value 2. But only because this value is now written into the cell. It is somehow cached into the cell. And this might actually become an issue. Now image if you replace the cell A1 with a value of -1. Now your cell still prints 2 with wb_to_df(), but is not evaluated, because it should actually show a value of 0.
But why is the value not updated?
That is because, something - like your spreadsheet software - needs some way to evalute the formula in the cell with actual data in the spreadsheet. And this is somewhat simple with basic formulas A1 + A2 and SUM(A1:A2), but it will get tricky if you ever try to implement something like nested VLOOKUP() / CHOOSE() functions, functions where R and and your spreadsheet software produce different results and last but not least, anybody has to program all/many/most/at least a few of the functions and their quirks into something we can evaluate in R. And I do not want to do this and I do not want to be the one who does this and if you want me to be the one, lets talk about my salary ..., but good news!
You could be the one!
It is not really hard to begin with. First, most likely you do not want EVERY function (this could take a while, because there are a lot), but instead focus on only a few. Second, begin with something easy and not some overly complex 300 lines of nested INDEX() functions and maybe begin with something like this. The idea is get the sheet, reference and operators from the formula and evaluate everything in some R formula just like base R functions.
Draft code
Here I use spreadsheet formulas using + and SUM() and my code matches them with Rs + and sum(). Similar I could use MIN()/MAX() and other functions that have identical names. For AVERAGE() I might need something like average <- function(...) mean(...) and other more complex functions might require actual custom written functions:
library(openxlsx2)
dat<-data.frame(
num1=1,
num2=2,
fml1="A2 + B2",
fml2="SUM(A2, B2)",
fml3="'Sheet 1'!A2 + B2",
fml4="FST + B2"
)
class(dat$fml1) <-"formula"
class(dat$fml2) <-"formula"
class(dat$fml3) <-"formula"
class(dat$fml4) <-"formula"wb<- wb_workbook()$add_worksheet()$add_data(x=dat)
wb$add_named_region(dims="A2", name="FST")
# wb$open()dat<-data.frame(
x=0:10,
y=-5:5
)
dat_fml<-data.frame(
x="SUM(A2:A12)",
y="SUM(B2:B12)"
)
class(dat_fml$x) <-"formula"
class(dat_fml$y) <-"formula"wb$add_worksheet()$add_data(x=dat)$add_data(x=dat_fml, dims="A14")
wb_eval_excel_fml<-function(wb, sheet, dims) {
# example function that works with a tiny subset of formulas# if you use this function in production you are braver or# desparater than you look.wb<-wb$clone()
openxlsx2:::assert_workbook(wb)
sheetid<-wb$validate_sheet(sheet)
fmls<- as.character(wb$to_df(dims=dims, sheet=sheetid, show_formula=TRUE, col_names=FALSE))
message("Input formula is: ", fmls)
tkns<-tidyxl::xlex(fmls)
sel<-tkns$type=="ref"vars<-tkns$token[sel]
rnd<-openxlsx2:::random_string(n= length(vars), pattern="[a-z]")
sheets<-wb$get_sheet_names()[sheetid]
sheets<- rep(sheets, length(vars))
if (any(sel2<-tkns$type=="sheet")) {
shts<-tkns$token[sel2]
sel3<- which(tkns$type=="ref") %in% ( which(tkns$type=="sheet") +1)
sheets[sel3] <-stringi::stri_extract_first_regex(shts, "([^']+)")
# remove this from our formulatkns<- as.data.frame(tkns)
tkns<-tkns[!sel2, ]
sel<-tkns$type=="ref"
}
fml_env<- new.env()
for (iin seq_along(vars)) {
num<- wb_to_df(wb, dims=vars[i], sheet=sheets[i], col_names=FALSE)
assign(rnd[i], num, fml_env)
tkns$token[sel][i] <-rnd[i]
message("Var ", vars[i], " is: ", num)
}
lwr_fmls<- parse(text= tolower(paste0(tkns$token, collapse="")))
res<- as.numeric(eval(lwr_fmls, envir=fml_env))
message("Result: ", res)
sel<-wb$worksheets[[sheetid]]$sheet_data$cc$r==dimswb$worksheets[[sheetid]]$sheet_data$cc$v[sel] <-reswb
}
# wb$open()
wb_to_df(wb)
#> num1 num2 fml1 fml2 fml3 fml4#> 2 1 2 NA NA NA NAwb<- wb_eval_excel_fml(wb, sheet=1, dims="C2")
#> Input formula is: A2 + B2#> Var A2 is: 1#> Var B2 is: 2#> Result: 3wb<- wb_eval_excel_fml(wb, sheet=1, dims="D2")
#> Input formula is: SUM(A2, B2)#> Var A2 is: 1#> Var B2 is: 2#> Result: 3wb<- wb_eval_excel_fml(wb, sheet=1, dims="E2")
#> Input formula is: 'Sheet 1'!A2 + B2#> Var A2 is: 1#> Var B2 is: 2#> Result: 3# # one of many cases that does not work# wb <- wb_eval_excel_fml(wb, sheet = 1, dims = "F2")wb<- wb_eval_excel_fml(wb, sheet=2, dims="A15")
#> Input formula is: SUM(A2:A12)#> Var A2:A12 is: c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)#> Result: 55wb<- wb_eval_excel_fml(wb, sheet=2, dims="B15")
#> Input formula is: SUM(B2:B12)#> Var B2:B12 is: c(-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5)#> Result: 0
wb_to_df(wb, show_formula=TRUE)
#> num1 num2 fml1 fml2 fml3 fml4#> 2 1 2 A2 + B2 SUM(A2, B2) 'Sheet 1'!A2 + B2 FST + B2
wb_to_df(wb)
#> num1 num2 fml1 fml2 fml3 fml4#> 2 1 2 3 3 3 NA
wb_to_df(wb, sheet=2, show_formula=TRUE) |> tail()
#> x y#> 10 8 3#> 11 9 4#> 12 10 5#> 13 <NA> <NA>#> 14 x y#> 15 SUM(A2:A12) SUM(B2:B12)
wb_to_df(wb, sheet=2) |> tail()
#> x y#> 10 8 3#> 11 9 4#> 12 10 5#> 13 <NA> <NA>#> 14 x y#> 15 55 0
The text was updated successfully, but these errors were encountered:
I have written about this issue already in various places of the
openxlsx
issue tracker and probably in other places and this issue is just to raise awareness and not something to solve or something I'm going to work on.The reason why formulas are not evaluated is, because there is nothing to evaluate them.
wb_to_df()
also shows values?" "Yes again."Why is this the case?
The XML file structure of a formula written by us looks like this:
Here we have a cell with a simple formula:
SUM(A1:B1)
. If A1 is1
and B1 is1
the formula will evaluate 2. In your spreadsheet software you can see the value2
. If you save the file to disk and load it again, the XML structure of the cell has changed. Now it looks like this:It contains a value field now and this value is
2
. Now if we print fromopenxlsx2
we also get a value2
. But only because this value is now written into the cell. It is somehow cached into the cell. And this might actually become an issue. Now image if you replace the cell A1 with a value of-1
. Now your cell still prints2
withwb_to_df()
, but is not evaluated, because it should actually show a value of0
.But why is the value not updated?
That is because, something - like your spreadsheet software - needs some way to evalute the formula in the cell with actual data in the spreadsheet. And this is somewhat simple with basic formulas
A1 + A2
andSUM(A1:A2)
, but it will get tricky if you ever try to implement something like nestedVLOOKUP()
/CHOOSE()
functions, functions where R and and your spreadsheet software produce different results and last but not least, anybody has to program all/many/most/at least a few of the functions and their quirks into something we can evaluate in R. And I do not want to do this and I do not want to be the one who does this and if you want me to be the one, lets talk about my salary ..., but good news!You could be the one!
It is not really hard to begin with. First, most likely you do not want EVERY function (this could take a while, because there are a lot), but instead focus on only a few. Second, begin with something easy and not some overly complex 300 lines of nested INDEX() functions and maybe begin with something like this. The idea is get the sheet, reference and operators from the formula and evaluate everything in some R formula just like base R functions.
Draft code
Here I use spreadsheet formulas using
+
andSUM()
and my code matches them with Rs+
andsum()
. Similar I could useMIN()
/MAX()
and other functions that have identical names. ForAVERAGE()
I might need something likeaverage <- function(...) mean(...)
and other more complex functions might require actual custom written functions:The text was updated successfully, but these errors were encountered: