
Metric weights 293 #304

Open · wants to merge 9 commits into master from metric_weights_293
Conversation

emilliman5 (Collaborator) commented Jul 18, 2023

First draft for fixing metric weights errors.

The problem:

If a metric was NA, it was essentially imputed to 0 unless you explicitly set its weight to 0. The proposed solution: set a metric's weight to 0 if its score is NA.
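A minimal sketch of the proposed behavior (a hypothetical helper, not the actual riskmetric internals): zero the weight of any NA metric, renormalize the remaining weights, and summarize.

```r
# Hypothetical helper illustrating the proposed fix: an NA metric's weight
# is set to 0 and the remaining weights are renormalized, so the NA is
# excluded rather than silently imputed to 0.
weighted_pkg_score <- function(scores, weights) {
  weights[is.na(scores)] <- 0           # NA metric contributes nothing
  weights <- weights / sum(weights)     # renormalize so weights sum to 1
  sum(scores * weights, na.rm = TRUE)   # drop the NA terms from the sum
}

scores  <- c(has_vignettes = 1, covr_coverage = NA, downloads_1yr = 0.5)
weights <- c(has_vignettes = 1, covr_coverage = 1, downloads_1yr = 1)
weighted_pkg_score(scores, weights)  # 0.75: the NA metric is excluded
```

With this scheme, leaving `covr_coverage = 1` in the weights gives the same result as explicitly setting it to 0, which is the equivalence the fix is after.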

Some caveats:

  1. The solution requires that we compute weights and scores per package, because a collection of packages could have different patterns of NAs, e.g. when scoring packages from different sources.
  2. What if a user explicitly sets a weight but the metric is missing? Currently, standardize_weights silently resets the weight to 0. This should, at the least, issue a warning.
  3. If a user sets weights for only a subset of metrics, should pkg_score: i) error, ii) warn, or iii) compute a summarized score only for the metrics with a weight?
    i. Very conservative; will break pipelines if/when metrics are added.
    ii. The middle ground?
    iii. Most backward compatible, but users could unknowingly omit metrics upon upgrading riskmetric.
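Caveat 2 could be addressed along these lines (a sketch with a hypothetical function name, not the current `standardize_weights`): warn instead of silently zeroing a user-supplied weight whose metric is absent from the assessment.

```r
# Hypothetical replacement for the silent reset in standardize_weights:
# warn when a user-supplied weight names a metric that is not present,
# then zero it and renormalize.
standardize_weights_sketch <- function(weights, metric_names) {
  missing <- setdiff(names(weights), metric_names)
  if (length(missing) > 0) {
    warning("Weights provided for metrics not in the assessment: ",
            paste(missing, collapse = ", "), "; resetting them to 0.")
    weights[missing] <- 0
  }
  weights / sum(weights)
}

# A weight for an absent metric now triggers a warning instead of
# disappearing silently.
standardize_weights_sketch(c(covr_coverage = 1, has_news = 1), "has_news")
```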

This is still a work in progress, but I think it is baked enough to discuss.

emilliman5 (Collaborator, Author) commented:

#293

codecov-commenter commented Jul 18, 2023

Codecov Report

Merging #304 (1738599) into master (31854ae) will increase coverage by 0.71%.
The diff coverage is 100.00%.

❗ Current head 1738599 differs from pull request most recent head f27fac9. Consider uploading reports for the commit f27fac9 to get more accurate results

@@            Coverage Diff             @@
##           master     #304      +/-   ##
==========================================
+ Coverage   61.85%   62.57%   +0.71%     
==========================================
  Files          66       66              
  Lines        1025     1034       +9     
==========================================
+ Hits          634      647      +13     
+ Misses        391      387       -4     
Files Changed          Coverage            Δ
R/pkg_ref_cache.R      9.09%  <ø>          (ø)
R/pkg_ref_class.R      88.80% <ø>          (ø)
R/pkg_score.R          59.09% <100.00%>    (+5.24%) ⬆️
R/summarize_scores.R   92.59% <100.00%>    (+18.67%) ⬆️


AARON-CLARK (Contributor) commented:

I can confirm the current solution is working:

library(dplyr)
devtools::load_all(".")
packageVersion("riskmetric")

# > [1] ‘0.2.2’

assessed <- "dplyr" %>%
  pkg_ref(source = "pkg_cran_remote", repos = c("https://cran.rstudio.com")) %>%
  as_tibble() %>%
  pkg_assess()

initial_scoring <- assessed %>% pkg_score()
initial_scoring$pkg_score %>% round(2)

# > [1] 0.11
# > attr(,"label")
# > [1] "Summarized risk score from 0 (low) to 1 (high)."

@emilliman5 emilliman5 marked this pull request as ready for review September 6, 2023 20:25
borgmaan (Collaborator) left a comment:


Hey -- I'm seeing strange behavior when I run the example given in #293. I checked on two different systems with clean installs of this branch to make sure I was not tricking myself.

The first issue is that the updates now cause the pkg_score for that example to be negative (instead of between 0 and 1):

# install branch under review
devtools::install_github("pharmaR/riskmetric@metric_weights_293")

# original code from 293 
library(riskmetric)
library(tibble)

dplyr_ref <-
  pkg_ref("dplyr",
    source = "pkg_cran_remote",
    repos = c("https://cran.rstudio.com")
  ) |>
  as_tibble()

dplyr_assess <-
  pkg_assess(dplyr_ref) |>
  as_tibble()
dplyr_assess$covr_coverage # Is NA

weights1 <-
  c(
    has_vignettes = 1, has_news = 1, news_current = 1, has_bug_reports_url = 1,
    has_website = 1, has_maintainer = 1, has_source_control = 1, export_help = 1,
    bugs_status = 1, license = 1, covr_coverage = 0, downloads_1yr = 1
  )
weights2 <-
  c(
    has_vignettes = 1, has_news = 1, news_current = 1, has_bug_reports_url = 1,
    has_website = 1, has_maintainer = 1, has_source_control = 1, export_help = 1,
    bugs_status = 1, license = 1, covr_coverage = 1, downloads_1yr = 1
  )

dplyr_score1 <- pkg_score(dplyr_assess, weights = weights1)
dplyr_score1$pkg_score

dplyr_score2 <- pkg_score(dplyr_assess, weights = weights2)
dplyr_score2$pkg_score 

# these now match but give negative numbers?
# [1] -0.249766
# attr(,"label")
# [1] "Summarized risk score from 0 (low) to 1 (high)."
# 
# 
# [1] -0.249766
# attr(,"label")
# [1] "Summarized risk score from 0 (low) to 1 (high)."

The second thought/issue is that changes are likely needed in summarize_scores.list to bring it into alignment with whatever is tweaked above. Currently, these scores respond as expected to the different weighting schemes, but they do not match the values produced by summarize_scores.data.frame. Here's an example extending the code above to show the discrepancy:

# check to show behavior of summarize_scores.list
dplyr_ref2 <- pkg_ref("dplyr",
  source = "pkg_cran_remote",
  repos = c("https://cran.rstudio.com")
)

dplyr_assess2 <- pkg_assess(dplyr_ref2)

# these now match each other but do not match the tibble-based scores
dplyr_score3 <- pkg_score(dplyr_assess2, weights = weights1)
dplyr_score3$pkg_score

dplyr_score4 <- pkg_score(dplyr_assess2, weights = weights2)
dplyr_score4$pkg_score 

# > dplyr_score3$pkg_score
# [1] 0.2456549
# attr(,"label")
# [1] "Summarized risk score from 0 (low) to 1 (high)."
# 
# [1] 0.2456549
# attr(,"label")
# [1] "Summarized risk score from 0 (low) to 1 (high)."


Successfully merging this pull request may close these issues.

pkg_score returns different value if metric is NA and weights are set explicitly
4 participants