Pull table notes from tex glossary files and yaml #326

kylebaron · 2024-02-15T17:13:06Z

Summary

This PR adds functionality to create table notes from a tex glossary file.

An entry in a glossary file looks like this:

\newacronym{label}{abbreviation}{definition}

The basic idea is to read and parse the glossary file to create a string of the form label: definition. Usually multiple terms will be selected, in which case the string takes the form label1: definition1; label2: definition2.

The PR also implements the ability to read glossary information from a yaml-formatted file.
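To make the idea in the summary concrete, here is a minimal sketch in base R. It is not the parser in this PR: `parse_simple_glossary()` is a hypothetical helper that assumes one `\newacronym` entry per line, no optional arguments, and no nested braces.

```r
# Hypothetical helper, not the pmtables parser. Assumes simple single-line
# entries of the form \newacronym{label}{abbreviation}{definition}.
parse_simple_glossary <- function(lines) {
  pattern <- paste0(
    "^[[:space:]]*\\\\newacronym",
    "\\{([^{}]*)\\}\\{([^{}]*)\\}\\{([^{}]*)\\}[[:space:]]*$"
  )
  hits <- regmatches(lines, regexec(pattern, lines))
  # Non-matching lines (comments, blanks) yield character(0); drop them
  hits <- Filter(function(m) length(m) == 4, hits)
  data.frame(
    label        = vapply(hits, `[`, character(1), 2),
    abbreviation = vapply(hits, `[`, character(1), 3),
    definition   = vapply(hits, `[`, character(1), 4),
    stringsAsFactors = FALSE
  )
}

tex <- c(
  "\\newacronym{WT}{WT}{subject weight}",
  "% a comment line is skipped",
  "\\newacronym{AUC}{AUC}{area under the concentration-time curve}"
)

entries <- parse_simple_glossary(tex)

# Build the "label1: definition1; label2: definition2" string
notes <- paste0(entries$label, ": ", entries$definition, collapse = "; ")
notes
```

A regular expression like this breaks down as soon as an entry contains nested braces or an optional argument, so a real parser needs more care than this sketch shows.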

Objects

glossary - a list of glossary entries; the names of the list are the glossary labels
glossary_entry - a list containing the abbreviation and the definition

Functions

read_glossary() - reads and parses the glossary file; returns a glossary object
glossary_notes() - takes the glossary file name or a glossary list, plus the labels to select, and returns a character vector that can be added to a table via st_notes()
st_notes_glo() - takes a glossary list (from read_glossary()) and the labels to select, and adds the notes in a table pipeline
as_glossary() - coerces a list to a glossary object
update_abbrev() - updates the abbreviation for any entry (the label and the definition can't be changed)

Reprex

library(reprex)
library(pmtables)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

Read in a glossary file

We can read from a .tex glossary file

glofile <- system.file("glo", "glossary.tex", package = "pmtables")

g <- read_glossary(glofile)

or a yaml file

gloyaml <- system.file("glo", "glossary.yaml", package = "pmtables")

y <- read_glossary(gloyaml)

The result is a glossary object

y
#> egfr : estimated glomerular filtration rate
#> bmi  : body mass index
#> wt   : weight
#> ht   : height
#> cmax : maximum concentration in the dosing inte...
#> cmin : minimum concentration in the dosing inte...
#> auc  : area under the concentration time curve

yaml format

The yaml file needs to be written so that it can be read by yaml_as_df(); the
outer level holds the labels and the inner level holds the abbreviations (abb)
and definitions (def)

cat(readLines(gloyaml), sep = "\n")
#> egfr:
#>   def: estimated glomerular filtration rate
#>   abb: eGFR
#> bmi:
#>   def: body mass index
#>   abb: BMI
#> wt:
#>   def: weight
#>   abb: WT
#> ht:
#>   abb: HT
#>   def: height
#> cmax:
#>   def: maximum concentration in the dosing interval
#>   abb: Cmax
#> cmin:
#>   def: minimum concentration in the dosing interval
#>   abb: Cmin
#> auc:
#>   abb: AUC
#>   def: area under the concentration time curve

Create a glossary object

x <- as_glossary(c = "cat", d = "dog", s = "snake")
x
#> c : cat
#> d : dog
#> s : snake

By default, the abbreviation is taken to be the label
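The structure described under Objects, together with the default noted above, can be sketched with plain lists. `make_glossary()` is a hypothetical stand-in for illustration only; it does not set a class or reproduce what `as_glossary()` does internally.

```r
# Hypothetical constructor mirroring the described structure: a named list
# whose names are the labels and whose elements hold an abbreviation and a
# definition. Not pmtables code.
make_glossary <- function(...) {
  defs <- list(...)
  stopifnot(
    length(defs) > 0,
    !is.null(names(defs)),
    all(nzchar(names(defs)))
  )
  Map(
    function(label, definition) {
      # The abbreviation defaults to the label
      list(abbreviation = label, definition = definition)
    },
    names(defs), defs
  )
}

x <- make_glossary(c = "cat", d = "dog", s = "snake")
x$d$abbreviation

# Changing an abbreviation leaves the label and definition untouched
x$s$abbreviation <- "SNAKE"
str(x$s)
```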

Update abbreviation

x <- update_abbrev(x, s = "SNAKE")
x$s
#> snake (SNAKE)

Work with glossary object

Much of this is driven by the need to combine glossary objects
or to look inside an object to see what it contains. I would rather have stuck
with a simpler data structure, but it needed to be more complex, so I added
this functionality.

Extract and print

g[1:10]
#> ADA    : anti-drug antibodies
#> AE     : adverse event
#> AIC    : Akaike information criterion
#> ALAG   : oral absorption lag time
#> ASCII  : American Standard Code for Information I...
#> AST    : aspartate transaminase
#> AUC    : area under the concentration-time curve
#> AUCss  : area under the concentration-time curve ...
#> AUCC   : cumulative area under the concentration-...
#> AUCC50 : area under the concentration-time curve ...

g$WT
#> subject weight (WT)

Head

head(g)
#> ADA   : anti-drug antibodies
#> AE    : adverse event
#> AIC   : Akaike information criterion
#> ALAG  : oral absorption lag time
#> ASCII : American Standard Code for Information I...
#> AST   : aspartate transaminase

Select

g2 <- select_glossary(g, AIC, AST, ADA)

Combine

g3 <- c(g2, y)

Coerce

data frame

as.data.frame(g3)
#>    label                                   definition abbreviation
#> 1    AIC                 Akaike information criterion          AIC
#> 2    AST                       aspartate transaminase          AST
#> 3    ADA                         anti-drug antibodies          ADA
#> 4   egfr         estimated glomerular filtration rate         eGFR
#> 5    bmi                              body mass index          BMI
#> 6     wt                                       weight           WT
#> 7     ht                                       height           HT
#> 8   cmax maximum concentration in the dosing interval         Cmax
#> 9   cmin minimum concentration in the dosing interval         Cmin
#> 10   auc      area under the concentration time curve          AUC

list

as.list(g3[1:2])
#> $AIC
#> $AIC$abbreviation
#> [1] "AIC"
#> 
#> $AIC$definition
#> [1] "Akaike information criterion"
#> 
#> 
#> $AST
#> $AST$abbreviation
#> [1] "AST"
#> 
#> $AST$definition
#> [1] "aspartate transaminase"

Create notes

With a subset (expected most of the time)

glossary_notes(g3, AIC, wt, auc) 
#> [1] "AIC: Akaike information criterion; WT: weight; AUC: area under the concentration time curve"

With all entries

glossary_notes(g3)
#> [1] "AIC: Akaike information criterion; AST: aspartate transaminase; ADA: anti-drug antibodies; eGFR: estimated glomerular filtration rate; BMI: body mass index; WT: weight; HT: height; Cmax: maximum concentration in the dosing interval; Cmin: minimum concentration in the dosing interval; AUC: area under the concentration time curve"
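The outputs above show entries being selected by label while the note text uses the abbreviation. A minimal sketch of that formatting step follows, assuming the nested-list structure from the Objects section. `notes_from()` is a hypothetical helper, not the implementation of `glossary_notes()`.

```r
# Hypothetical helper: select entries by label, then format each one as
# "abbreviation: definition" and collapse with "; ".
notes_from <- function(glossary, labels = names(glossary)) {
  missing <- setdiff(labels, names(glossary))
  if (length(missing) > 0) {
    stop("labels not found: ", paste(missing, collapse = ", "))
  }
  picked <- glossary[labels]  # `[` with character names matches exactly
  abb <- vapply(picked, function(e) e$abbreviation, character(1))
  def <- vapply(picked, function(e) e$definition, character(1))
  paste0(abb, ": ", def, collapse = "; ")
}

glo <- list(
  AIC = list(abbreviation = "AIC", definition = "Akaike information criterion"),
  wt  = list(abbreviation = "WT", definition = "weight"),
  auc = list(abbreviation = "AUC", definition = "area under the concentration time curve")
)

some <- notes_from(glo, c("AIC", "wt"))
some
all_notes <- notes_from(glo)
```

Selecting with `[` and a character vector avoids the partial matching that `$` performs on lists, which matters when one label is a prefix of another.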

In a pipeline

stdata() %>% 
  st_new() %>% 
  st_notes_glo(g3, AIC, wt, auc, width = 1) %>% 
  stable() %>% 
  st_as_image()

Alternatively

notes <- glossary_notes(g3, ht, wt, bmi)
stdata() %>% 
  st_new() %>% 
  st_notes(notes) %>% 
  st_panel("STUDY") %>% 
  stable() %>% 
  st_as_image()

Pass in names

labels <- c("AIC", "AST")
stdata() %>% 
  st_new() %>% 
  st_notes_glo(g3, labels = labels) %>% 
  stable() %>% 
  st_as_image()

Created on 2024-05-23 with reprex v2.0.2

KatherineKayMRG · 2024-02-15T21:25:58Z

@kylebaron I have a couple of questions about what this is doing (and if it's what we want). Above you say:

An entry in a glossary file looks like this

\newacronym{label}{abbreviation}{definition}

The basic idea is to read and parse the glossary file to create a string of the form label: definition. Likely multiple terms will be selected in which the string will take the form label1: definition1; label2: definition2.

So it's the information in the {label}{definition} that will go in the footer right?

Can these functions handle cases where you might want the {abbreviation}{definition} combo? Maybe as an optional extra. For example, the CV% that you may have in the table usually uses CVP in the glossary:

\newacronym{CVP}{CV\%}{percent coefficient of variation}

I just had a similar case on a project where tables used FAPα but we couldn't use the greek letter in the glossary label, so the label was FAPa and the abbreviation included the greek letter

It would be nice to be able to specify whether label or abbreviation get used.

Different question, will glossary entries like this (below) mess with your current code?

\newacronym[sort=f]{F}{\ensuremath{F}}{absolute bioavailability}

kylebaron · 2024-02-15T21:31:09Z

Hey @KatherineKayMRG -

Good points. I think we want to refer to the label, but agree it would be better to put the abbreviation in there. I can make that happen.

Different question, will glossary entries like this (below) mess with your current code?

\newacronym[sort=f]{F}{\ensuremath{F}}{absolute bioavailability}

I think the code handles this as it is.
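For readers wondering why an entry like this is awkward to parse: the abbreviation contains its own braces and the entry starts with an optional `[...]` argument. One way to cope is to track brace depth instead of relying on a flat regular expression. The sketch below illustrates that technique with a hypothetical `brace_groups()` helper; it is not how the PR's parser is written, and it ignores escaped braces, comments, and unbalanced input.

```r
# Hypothetical helper: return the contents of each top-level {...} group
# in a single line of TeX by counting brace depth.
brace_groups <- function(line) {
  chars <- strsplit(line, "", fixed = TRUE)[[1]]
  groups <- character(0)
  depth <- 0
  start <- NA_integer_
  for (i in seq_along(chars)) {
    if (chars[i] == "{") {
      if (depth == 0) start <- i  # opening brace of a top-level group
      depth <- depth + 1
    } else if (chars[i] == "}") {
      depth <- depth - 1
      if (depth == 0) {
        groups <- c(groups, substr(line, start + 1, i - 1))
      }
    }
  }
  groups
}

line <- "\\newacronym[sort=f]{F}{\\ensuremath{F}}{absolute bioavailability}"
entry <- brace_groups(line)
entry
```

The optional `[sort=f]` argument is skipped simply because it is not inside braces, and the nested `{F}` stays part of the second group.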

kylebaron · 2024-02-15T22:33:27Z

I'm going to refactor this ... will be more complicated but I think there is a need.

R/glossary.R

tests/testthat/test-glossary.R

kylebaron · 2024-05-31T12:45:21Z

@kyleam - I think I addressed everything here, adding tests where I missed badly on some of the implementation. But let me know if I didn't get one of the comments right or if I just overlooked anything.

KatherineKayMRG

From a user perspective, this looks good to me and I'm looking forward to using it. No more questions from me. I'll leave the code review to Kyle M.

kylebaron · 2024-05-31T12:50:08Z

Thanks, @KatherineKayMRG. It sounds like @timwaterhouse is going to give this a spin on upcoming project and we can tweak and adjust some things from there.

KatherineKayMRG · 2024-05-31T12:59:19Z

@kylebaron - that project of @timwaterhouse's is the one I was mentioning on slack. I've been reworking their reports to use a shared glossary file and I'm looking forward to trying out this functionality on that project.

kyleam

Thanks for the updates. I have one follow-up comment about a case where print.glossary still errors, but otherwise this looks ready to go.

(I see there are merge conflicts, but those are just in the generated coverage reports.)

kyleam · 2024-05-31T13:33:35Z

R/glossary.R

+    return(invisible(NULL))
+  }
+  label <- names(x)
+  def <- map_chr(x, "definition")


The case I mentioned here will still fail:

as_glossary(ss = "steady state")[2]
#> Error in `map_chr()` at pmtables/R/glossary.R:311:3:
#> ℹ In index: 1.
#> Caused by error:
#> ! Result must be length 1, not 0.
#> Run `rlang::last_trace()` to see where the error occurred

The length guard won't catch that because of this named-list behavior:

x <- list(foo = "foo")
x[2]
#> $<NA>
#> NULL
length(x[2])
#> [1] 1

How about this guard instead?

diff --git a/R/glossary.R b/R/glossary.R
index 76fc618..3016dc9 100644
--- a/R/glossary.R
+++ b/R/glossary.R
@@ -303,8 +303,8 @@ print.glossary_entry <- function(x, ...) {
 #' @export
 print.glossary <- function(x, ...) {
-  if(!length(x)) {
-    cat("No glossary entries found.")
+  if(is.null(unlist(x))) {
+    cat("No glossary entries found.\n")
     return(invisible(NULL))
   }
   label <- names(x)

Thanks; that looks like it works. Added tests for zero-length and mis-indexed objects.
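The difference between the two guards discussed in this thread can be checked with plain lists, without pmtables. In the sketch below, `length_guard()` and `unlist_guard()` are hypothetical wrappers around the two conditions from the diff.

```r
# A one-entry list shaped like a glossary (plain list, no class)
x <- list(ss = list(abbreviation = "ss", definition = "steady state"))

out_of_range <- x[2]  # length-1 list holding NULL under an NA name
empty <- x[0]         # genuinely zero-length list

# Hypothetical wrappers around the two conditions in the diff
length_guard <- function(x) !length(x)
unlist_guard <- function(x) is.null(unlist(x))

length_guard(empty)         # TRUE: zero-length input is caught
length_guard(out_of_range)  # FALSE: the mis-indexed object slips through
unlist_guard(empty)         # TRUE
unlist_guard(out_of_range)  # TRUE: unlist() drops the NULL element
unlist_guard(x)             # FALSE: real entries are not flagged
```

Because `unlist()` discards `NULL` elements, the second guard covers both the zero-length and the out-of-range case with a single condition.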

kylebaron added 8 commits February 15, 2024 10:54

adding rd files for glossary feature

54c1ddf

clean up check

d67df6e

adding test glossary file

6651691

split up functionality a bit

8e8386a

glossary text parse

e8f8243

add test for st_notes_glo

04e344a

test parsing labels

2d23dfd

minor documentation fix

8b608c5

kylebaron requested review from kyleam and KatherineKayMRG and removed request for kyleam and KatherineKayMRG February 15, 2024 19:28

kylebaron marked this pull request as draft February 15, 2024 19:28

make_glossaries is generic

26578c9

kylebaron requested review from kyleam and KatherineKayMRG February 15, 2024 19:44

kylebaron marked this pull request as ready for review February 15, 2024 19:44

kylebaron marked this pull request as draft February 15, 2024 21:41

kylebaron added 7 commits February 15, 2024 17:44

move code to glossary.R file; refactor for more control

59dc8e1

continue refactor; tests

03a20ef

documentation; update_abbrev function

e1695ab

adding support for yaml glossary format

7c66d4e

more testing for glossary notes

0a9ad04

fill out glossary interaction

3289aaf

fix some check issues

9761d29

kylebaron changed the title ~~Pull table notes form tex glossary files~~ Pull table notes from tex glossary files and yaml May 21, 2024

adding Rd file

4681e4e

kylebaron added 3 commits May 21, 2024 09:58

refactor as_glossary

442b082

tweak documentation and tests

d33af79

tweak test message

7e0f697

kylebaron marked this pull request as ready for review May 22, 2024 04:13

update docs

f8cdbd7

kyleam requested changes May 31, 2024

kylebaron added 8 commits May 30, 2024 23:03

patch parser for spaces and comments

d9de721

add tests for spaces, escaped comment and braces in comment

37d2fb1

drop labels argument; users can use all_of() instead

55ec404

append new line

92530ee

drop $ method; allow for zero-lenghth object in print method

14aa478

update methods for [ and c

e55e792

fix glossary_notes.list arguments; add tests

0a4a223

check coverage

89bde6a

kylebaron requested a review from kyleam May 31, 2024 12:42

KatherineKayMRG approved these changes May 31, 2024

kyleam approved these changes May 31, 2024

Merge branch 'main' into glossary

2521b79

kylebaron merged commit 4cde219 into main May 31, 2024
4 checks passed

kylebaron deleted the glossary branch May 31, 2024 14:51

This was referenced May 31, 2024

Push last changes from previous PR for glossary functionality #336

Merged

Release/0.7.0 #337

Merged

kyleam mentioned this pull request Jun 12, 2024

Glossary returning unanticipated definitions, if leading characters match #338

Open
