Pull table notes from tex glossary files and yaml #326

Merged
merged 30 commits into from
May 31, 2024

Conversation

@kylebaron kylebaron commented Feb 15, 2024

Summary

The PR adds functionality to create table notes from a tex glossary file.

An entry in a glossary file looks like this

\newacronym{label}{abbreviation}{definition}

The basic idea is to read and parse the glossary file to create a string of the form label: definition. Likely, multiple terms will be selected, in which case the string will take the form label1: definition1; label2: definition2.
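
To make the parsing step concrete, here is a minimal base R sketch of pulling the three brace groups out of a single \newacronym line. The helper name parse_newacronym() is hypothetical and this is not the code in R/glossary.R; it assumes one entry per line, at most one optional [key=value] argument, and no escaped braces inside the groups.

```r
# Hypothetical helper for illustration only; not the pmtables implementation.
# Assumes one \newacronym entry per line and no escaped braces (\{ or \}).
parse_newacronym <- function(line) {
  line <- trimws(line)
  if (!startsWith(line, "\\newacronym")) {
    return(NULL)
  }
  # Drop the command name and an optional [key=value] argument
  rest <- sub("^\\\\newacronym[[:space:]]*(\\[[^]]*\\])?[[:space:]]*", "", line)
  chars <- strsplit(rest, "")[[1]]
  depth <- 0L
  start <- 0L
  groups <- character(0)
  # Track brace depth so nested groups such as \ensuremath{F} stay intact
  for (i in seq_along(chars)) {
    if (chars[i] == "{") {
      if (depth == 0L) start <- i
      depth <- depth + 1L
    } else if (chars[i] == "}") {
      depth <- depth - 1L
      if (depth == 0L) {
        groups <- c(groups, substr(rest, start + 1L, i - 1L))
        if (length(groups) == 3L) break
      }
    }
  }
  if (length(groups) < 3L) {
    return(NULL)
  }
  list(label = groups[1], abbreviation = groups[2], definition = groups[3])
}

e <- parse_newacronym(
  "\\newacronym[sort=f]{F}{\\ensuremath{F}}{absolute bioavailability}"
)
cvp <- parse_newacronym(
  "\\newacronym{CVP}{CV\\%}{percent coefficient of variation}"
)
not_an_entry <- parse_newacronym("% a comment line")
```

Counting brace depth rather than matching with a single regular expression is what keeps a nested abbreviation such as \ensuremath{F} in one piece.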

The PR also implements the ability to read glossary information from a yaml-formatted file.

Objects

  • glossary - a list of glossary entries; the names of the list are the glossary labels
  • glossary_entry - a list containing the abbreviation and the definition

Functions

  • read_glossary() reads and parses the glossary file; returns a glossary object
  • glossary_notes() takes in the glossary file name or a glossary list as well as labels to select and returns a character vector that can be added to a table via st_notes()
  • st_notes_glo() takes a glossary list (from read_glossary()) and labels to select and adds the notes in a table pipeline
  • as_glossary() coerces a list to a glossary object
  • update_abbrev() updates the abbreviation for any entry (the label and the definition can't be changed)

Reprex

library(reprex)
library(pmtables)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Read in a glossary file

We can read from a .tex glossary file

glofile <- system.file("glo", "glossary.tex", package = "pmtables")

g <- read_glossary(glofile)

or a yaml file

gloyaml <- system.file("glo", "glossary.yaml", package = "pmtables")

y <- read_glossary(gloyaml)

The result is a glossary object

y
#> egfr : estimated glomerular filtration rate
#> bmi  : body mass index
#> wt   : weight
#> ht   : height
#> cmax : maximum concentration in the dosing inte...
#> cmin : minimum concentration in the dosing inte...
#> auc  : area under the concentration time curve

yaml format

The yaml file needs to be written so that it can be read by yaml_as_df():
the outer level holds the labels and the inner level holds the
abbreviation (abb) and definition (def) for each label

cat(readLines(gloyaml), sep = "\n")
#> egfr:
#>   def: estimated glomerular filtration rate
#>   abb: eGFR
#> bmi:
#>   def: body mass index
#>   abb: BMI
#> wt:
#>   def: weight
#>   abb: WT
#> ht:
#>   abb: HT
#>   def: height
#> cmax:
#>   def: maximum concentration in the dosing interval
#>   abb: Cmax
#> cmin:
#>   def: minimum concentration in the dosing interval
#>   abb: Cmin
#> auc:
#>   abb: AUC
#>   def: area under the concentration time curve

Create a glossary object

x <- as_glossary(c = "cat", d = "dog", s = "snake")
x
#> c : cat
#> d : dog
#> s : snake

By default, the abbreviation is taken to be the label

Update abbreviation

x <- update_abbrev(x, s = "SNAKE")
x$s
#> snake (SNAKE)
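
For readers who want to see the data structure behind the two steps above, here is a small base R sketch of a named list of entries where the abbreviation defaults to the label and can be replaced later. The names make_glossary() and set_abbrev() are hypothetical stand-ins, not as_glossary() or update_abbrev(), and the real objects also carry classes and print methods that are left out here.

```r
# Hypothetical stand-ins for illustration; the pmtables functions may differ.
make_glossary <- function(...) {
  defs <- list(...)
  labels <- names(defs)
  stopifnot(length(defs) > 0, !is.null(labels), all(nzchar(labels)))
  # Each entry starts with its label as the abbreviation
  Map(
    function(label, def) list(abbreviation = label, definition = def),
    labels, defs
  )
}

set_abbrev <- function(glossary, ...) {
  updates <- list(...)
  stopifnot(all(names(updates) %in% names(glossary)))
  # Only the abbreviation changes; label and definition are left alone
  for (label in names(updates)) {
    glossary[[label]]$abbreviation <- updates[[label]]
  }
  glossary
}

x <- make_glossary(c = "cat", d = "dog", s = "snake")
default_abbrev <- x$s$abbreviation
x <- set_abbrev(x, s = "SNAKE")
updated_abbrev <- x$s$abbreviation
```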

Work with glossary object

A lot of this is driven by the need to combine glossary objects or to look
into an object to see what is in there. I would rather have stuck with a
simpler data structure, but it needed to be more complex, so I added
this functionality.

Extract and print

g[1:10]
#> ADA    : anti-drug antibodies
#> AE     : adverse event
#> AIC    : Akaike information criterion
#> ALAG   : oral absorption lag time
#> ASCII  : American Standard Code for Information I...
#> AST    : aspartate transaminase
#> AUC    : area under the concentration-time curve
#> AUCss  : area under the concentration-time curve ...
#> AUCC   : cumulative area under the concentration-...
#> AUCC50 : area under the concentration-time curve ...

g$WT
#> subject weight (WT)

Head

head(g)
#> ADA   : anti-drug antibodies
#> AE    : adverse event
#> AIC   : Akaike information criterion
#> ALAG  : oral absorption lag time
#> ASCII : American Standard Code for Information I...
#> AST   : aspartate transaminase

Select

g2 <- select_glossary(g, AIC, AST, ADA)

Combine

g3 <- c(g2, y)

Coerce

data frame

as.data.frame(g3)
#>    label                                   definition abbreviation
#> 1    AIC                 Akaike information criterion          AIC
#> 2    AST                       aspartate transaminase          AST
#> 3    ADA                         anti-drug antibodies          ADA
#> 4   egfr         estimated glomerular filtration rate         eGFR
#> 5    bmi                              body mass index          BMI
#> 6     wt                                       weight           WT
#> 7     ht                                       height           HT
#> 8   cmax maximum concentration in the dosing interval         Cmax
#> 9   cmin minimum concentration in the dosing interval         Cmin
#> 10   auc      area under the concentration time curve          AUC

list

as.list(g3[1:2])
#> $AIC
#> $AIC$abbreviation
#> [1] "AIC"
#> 
#> $AIC$definition
#> [1] "Akaike information criterion"
#> 
#> 
#> $AST
#> $AST$abbreviation
#> [1] "AST"
#> 
#> $AST$definition
#> [1] "aspartate transaminase"

Create notes

With a subset (expected most of the time)

glossary_notes(g3, AIC, wt, auc) 
#> [1] "AIC: Akaike information criterion; WT: weight; AUC: area under the concentration time curve"

With all entries

glossary_notes(g3)
#> [1] "AIC: Akaike information criterion; AST: aspartate transaminase; ADA: anti-drug antibodies; eGFR: estimated glomerular filtration rate; BMI: body mass index; WT: weight; HT: height; Cmax: maximum concentration in the dosing interval; Cmin: minimum concentration in the dosing interval; AUC: area under the concentration time curve"
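
As a rough picture of how the note string is put together, the sketch below selects entries by label and pastes abbreviation and definition pairs into one string. build_notes() is a hypothetical helper operating on a plain named list, so the argument names and error handling are assumptions and may not match glossary_notes().

```r
# Hypothetical helper for illustration; glossary_notes() may differ.
build_notes <- function(glossary, labels = names(glossary),
                        sep = ": ", collapse = "; ") {
  unknown <- setdiff(labels, names(glossary))
  if (length(unknown) > 0) {
    stop("labels not found in glossary: ", paste(unknown, collapse = ", "))
  }
  entries <- glossary[labels]
  abb <- vapply(entries, function(e) e$abbreviation, character(1))
  def <- vapply(entries, function(e) e$definition, character(1))
  # One "abbreviation: definition" item per selected label
  paste(paste0(abb, sep, def), collapse = collapse)
}

glo <- list(
  AIC = list(abbreviation = "AIC", definition = "Akaike information criterion"),
  wt = list(abbreviation = "WT", definition = "weight"),
  ht = list(abbreviation = "HT", definition = "height")
)

subset_notes <- build_notes(glo, c("AIC", "wt"))
all_notes <- build_notes(glo)
```

Selection is by label, while the text that lands in the note is the abbreviation, which is why the label wt shows up as WT.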

In a pipeline

stdata() %>% 
  st_new() %>% 
  st_notes_glo(g3, AIC, wt, auc, width = 1) %>% 
  stable() %>% 
  st_as_image()

Alternatively

notes <- glossary_notes(g3, ht, wt, bmi)
stdata() %>% 
  st_new() %>% 
  st_notes(notes) %>% 
  st_panel("STUDY") %>% 
  stable() %>% 
  st_as_image()

Pass in names

labels <- c("AIC", "AST")
stdata() %>% 
  st_new() %>% 
  st_notes_glo(g3, labels = labels) %>% 
  stable() %>% 
  st_as_image()

Created on 2024-05-23 with reprex v2.0.2

@kylebaron kylebaron requested review from kyleam and KatherineKayMRG and removed request for kyleam and KatherineKayMRG February 15, 2024 19:28
@kylebaron kylebaron marked this pull request as draft February 15, 2024 19:28
@kylebaron kylebaron marked this pull request as ready for review February 15, 2024 19:44
@KatherineKayMRG
Collaborator

@kylebaron I have a couple of questions about what this is doing (and if it's what we want). Above you say:

An entry in a glossary file looks like this

\newacronym{label}{abbreviation}{definition}

The basic idea is to read and parse the glossary file to create a string of the form label: definition. Likely, multiple terms will be selected, in which case the string will take the form label1: definition1; label2: definition2.

So it's the information in the {label}{definition} that will go in the footer, right?

Can these functions handle cases where you might want the {abbreviation}{definition} combo? Maybe as an optional extra. For example, the CV% that you may have in the table usually uses CVP in the glossary:

\newacronym{CVP}{CV\%}{percent coefficient of variation}

I just had a similar case on a project where tables used FAPα but we couldn't use the Greek letter in the glossary label, so the label was FAPa and the abbreviation included the Greek letter.

It would be nice to be able to specify whether the label or the abbreviation gets used.

Different question, will glossary entries like this (below) mess with your current code?

\newacronym[sort=f]{F}{\ensuremath{F}}{absolute bioavailability}

@kylebaron
Contributor Author

Hey @KatherineKayMRG -

Good points. I think we want to refer to the label, but I agree it would be better to put the abbreviation in there. I can make that happen.

Different question, will glossary entries like this (below) mess with your current code?

\newacronym[sort=f]{F}{\ensuremath{F}}{absolute bioavailability}

I think the code handles this as it is.

@kylebaron kylebaron marked this pull request as draft February 15, 2024 21:41
@kylebaron
Contributor Author

I'm going to refactor this ... will be more complicated but I think there is a need.

@kylebaron kylebaron changed the title Pull table notes form tex glossary files Pull table notes from tex glossary files and yaml May 21, 2024
@kylebaron kylebaron marked this pull request as ready for review May 22, 2024 04:13
@kylebaron kylebaron requested a review from kyleam May 31, 2024 12:42
@kylebaron
Contributor Author

@kyleam - I think I addressed everything here, adding tests where I missed badly on some of the implementation. But let me know if I didn't get one of the comments right or if I just overlooked anything.

Collaborator

@KatherineKayMRG KatherineKayMRG left a comment

From a user perspective, this looks good to me and I'm looking forward to using it. No more questions from me. I'll leave the code review to Kyle M.

@kylebaron
Contributor Author

Thanks, @KatherineKayMRG. It sounds like @timwaterhouse is going to give this a spin on an upcoming project and we can tweak and adjust some things from there.

@KatherineKayMRG
Collaborator

@kylebaron - that project of @timwaterhouse's is the one I was mentioning on Slack. I've been reworking their reports to use a shared glossary file and I'm looking forward to trying out this functionality on that project.

Contributor

@kyleam kyleam left a comment

Thanks for the updates. I have one follow-up comment about a case where print.glossary still errors, but otherwise this looks ready to go.

(I see there are merge conflicts, but those are just in the generated coverage reports.)

return(invisible(NULL))
}
label <- names(x)
def <- map_chr(x, "definition")
Contributor

The case I mentioned here will still fail:

as_glossary(ss = "steady state")[2]
#> Error in `map_chr()` at pmtables/R/glossary.R:311:3:
#> ℹ In index: 1.
#> Caused by error:
#> ! Result must be length 1, not 0.
#> Run `rlang::last_trace()` to see where the error occurred

The length guard won't catch that because of this named-list behavior:

x <- list(foo = "foo")
x[2]
#> $<NA>
#> NULL

length(x[2])
#> [1] 1

How about this guard instead?

diff --git a/R/glossary.R b/R/glossary.R
index 76fc618..3016dc9 100644
--- a/R/glossary.R
+++ b/R/glossary.R
@@ -303,8 +303,8 @@ print.glossary_entry <- function(x, ...) {

 #' @export
 print.glossary <- function(x, ...) {
-  if(!length(x)) {
-    cat("No glossary entries found.")
+  if(is.null(unlist(x))) {
+    cat("No glossary entries found.\n")
     return(invisible(NULL))
   }
   label <- names(x)
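
The difference between the two guards can be checked in base R alone. The sketch below uses a plain named list in place of a glossary object, and the helper name is_empty_glossary() is made up for the example; it only mirrors the is.null(unlist(x)) condition from the diff.

```r
# Plain-list stand-in for a glossary with one entry (illustration only)
entry <- list(abbreviation = "ss", definition = "steady state")
glo <- list(ss = entry)

# Indexing past the end of a named list returns one NULL element named NA
bad <- glo[2]

# Hypothetical helper mirroring the proposed guard condition
is_empty_glossary <- function(x) is.null(unlist(x))

length_guard_fires <- !length(bad)             # FALSE: length(bad) is 1
unlist_guard_fires <- is_empty_glossary(bad)   # TRUE: nothing left after unlist
```

The unlist-based check also covers a truly zero-length list, since unlist(list()) is NULL as well, so it replaces the length check rather than supplementing it.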

Contributor Author

Thanks; that looks like it works. Added tests for zero-length and mis-indexed objects.

@kylebaron kylebaron merged commit 4cde219 into main May 31, 2024
4 checks passed
@kylebaron kylebaron deleted the glossary branch May 31, 2024 14:51