Could the intervals be extended to month and/or month-year? #14

Lextuga007 · 2021-09-17T12:53:15Z

I want to give patientcounter a try with smoking prevalence data by team or ward and I have information over many years so the best way to 'count' the open people in a team or ward are by referrals by month-year. Patientcounter only goes to day - is that right?

johnmackintosh · 2021-10-01T14:54:49Z

Hi @Lextuga007 - I've only just seen this, not sure why I wasn't notified before.

as far as I know, if it works with cut, it should work - this is the guidance for cut.POSIXct:

I'd be happy to take a look if you have some trial data you could share (offline)?

will-ball · 2022-12-14T09:17:25Z

Hey @johnmackintosh did you guys end up finding out if this worked? I'm potentially going to be doing a count of folks added before but not removed from a register on a specific date over multiple years. It appears that specifying "year" would be fine - how would I go about setting the day & month to check at?

johnmackintosh · 2022-12-14T11:21:58Z

@will-ball I never got round to looking into this in detail. In reference to @Lextuga007's comment, the package doesn't necessarily only go to day level, but it does expect date-time, rather than dates.
It was created because of the need for hourly or even finer-grained counts.

If you use the individual level, the function returns a row per individual per interval, including the original start and end datetimes, plus the interval's base date and hour - which you can use to filter results to a specific date and time.

Alternatively, maybe you could use data.table's rolling joins?

https://www.gormanalysis.com/blog/r-data-table-rolling-joins/

https://r-norberg.blogspot.com/2016/06/understanding-datatable-rolling-joins.html
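
For illustration, a minimal sketch of how a rolling join could answer "who is registered on a given date". The toy table, the column name `join_date` and the cutoff are all made up for this example, and it assumes a person's spells never overlap. For each id it picks the latest spell that started on or before the cutoff, then checks that the spell had not ended by then.

```r
library(data.table)

# Hypothetical toy spells: one row per registration spell; id 1 has two spells
spells <- data.table(
  id      = c(1L, 1L, 2L, 3L),
  added   = as.Date(c("2019-01-10", "2019-07-01", "2019-03-01", "2019-08-15")),
  removed = as.Date(c("2019-02-10", "2019-09-01", "2019-04-01", "2019-09-15"))
)

# One row per id for the cutoff date we want to check
cutoffs <- data.table(id = unique(spells$id), cutoff = as.Date("2019-07-31"))

# Separate join column so that `added` and `cutoff` both survive the join
spells[, join_date := added]
cutoffs[, join_date := cutoff]

# roll = TRUE: for each id, take the latest spell starting on or before the cutoff
latest <- spells[cutoffs, on = .(id, join_date), roll = TRUE]

# Registered = a spell was found and it had not ended by the cutoff
registered <- latest[!is.na(added) & removed > cutoff]
registered[, .N, by = cutoff]
```

Whether the removal day itself should count as registered is a choice; here it does not.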

If you have some fake data to play around with, would be happy to take a look at all the options

will-ball · 2022-12-14T12:33:09Z

Thanks for getting back to me @johnmackintosh

I've not encountered rolling joins before so will take a look, thanks for flagging. I've got a toy dataset to illustrate:

# Simple Example
library(tidyverse)
library(lubridate)
library(truncnorm)

n_people <- 1000

start_date <- as_date("2012-01-01")
end_date <- as_date("2021-12-31")

set.seed(20221214)

data <- as_tibble(
  list(
    id = sample(1:n_people, replace = TRUE),
    added = start_date + sample.int(end_date - start_date, n_people))) %>% 
  mutate(
    removed = added + rtruncnorm(n_people, mean = 30, sd = 15, a = 1, b = 1000),
    days = added %--% removed %/% days(1))

From data which essentially looks like this, I'd like to count how many people are 'registered' on the 31st July each year. I don't think it should complicate anything but the same person can be added/removed multiple times. I will have a play myself but if you get bored and want to take a look let me know.

johnmackintosh · 2022-12-14T12:58:45Z

see if this gives you what you need @will-ball ?

library(tidyverse)
library(lubridate)
library(truncnorm)

library(patientcounter)

n_people <- 1000

start_date <- as_date("2012-01-01")
end_date <- as_date("2021-12-31")

set.seed(20221214)

data <- as_tibble(
  list(
    id = sample(1:n_people, replace = TRUE),
    added = start_date + sample.int(end_date - start_date, n_people))) %>% 
  mutate(
    removed = added + rtruncnorm(n_people, mean = 30, sd = 15, a = 1, b = 1000),
    days = added %--% removed %/% days(1))


data2 <- data %>% 
  mutate(added  = as.POSIXct(added), 
         removed = as.POSIXct(removed))

results <- interval_census(data2, 
                           identifier = 'id', 
                           admit = "added", 
                           discharge = "removed", 
                           time_unit = '1 day', 
                           results = 'patient')

results[lubridate::month(base_date)== 7 & lubridate::day(base_date) == 31] %>% 
  arrange(.,id, added)

johnmackintosh · 2022-12-14T13:13:14Z

results[lubridate::month(base_date)== 7 & lubridate::day(base_date) == 31,.N, .(base_date)]

will give you tallies for each cutoff date
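
As a cross-check on tallies like these, the same question can be asked directly of the spells without expanding them to one row per day. This is only a sketch on made-up data, and it assumes a spell counts as open from its start date up to, but not including, its removal date, which may not match the boundary rule interval_census applies.

```r
# Hypothetical toy spells; id 1 is added and removed twice
spells <- data.frame(
  id      = c(1L, 1L, 2L, 3L),
  added   = as.Date(c("2018-07-01", "2019-07-20", "2019-07-01", "2018-06-01")),
  removed = as.Date(c("2018-08-15", "2019-08-20", "2019-09-01", "2019-12-31"))
)

# One cutoff per year: 31 July
cutoffs <- as.Date(paste0(2018:2019, "-07-31"))

# For each cutoff, count the spells that started on or before it
# and had not yet ended
tallies <- data.frame(
  cutoff = cutoffs,
  n = vapply(
    cutoffs,
    function(d) sum(spells$added <= d & spells$removed > d),
    integer(1)
  )
)
tallies
```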

will-ball · 2022-12-14T13:58:23Z

That works perfectly thanks 😄

johnmackintosh · 2022-12-14T14:39:21Z

Nice one @will-ball
Not sure I've been any use to @Lextuga007 yet so will leave this open for now

Lextuga007 · 2023-08-25T07:54:03Z

Yes, it does look like "year" is supported, as the time_unit parameter feeds into {lubridate} functions. However, when I run a smaller example for years there is a strange thing when an end date is already "floored":

library(dplyr)
library(patientcounter)

df <- tibble::tribble(
  ~id,  ~start_date,    ~end_date, ~smoking_status,
   5L, "2024-08-01", NA, "smoker",
   1L, "2019-01-01", "2020-01-01",        "smoker",
   2L, "2019-01-02", "2020-01-02",    "non-smoker",
   3L, "2019-01-03", "2022-01-01",        "smoker",
   4L, "2019-01-04", NA,    "non-smoker"
  ) |> 
  mutate(start_date = as.POSIXct(start_date),
         end_date = as.POSIXct(end_date))
  
results <- interval_census(df, 
                           identifier = 'id', 
                           admit = "start_date", 
                           discharge = "end_date", 
                           time_unit = 'year', 
                           results = 'patient') |> 
  arrange(id)

id 1 should get 2019 and 2020, but because its end date is 1 January 2020, the 2020 row doesn't show. I'm guessing, but is this something related to the date-times, with the time tipping it back to 2019-12-31? The same happens with id 3, which should get 2019, 2020, 2021 and 2022, but 2022 is dropped.

johnmackintosh · 2023-08-26T17:46:27Z

Hmm, I wonder if that is timezone related.
I haven't tried your code yet, but I've encountered issues with the changeover from BST/ GMT if UTC has not been explicitly declared.
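
A small sketch of what declaring UTC explicitly looks like, and of how a midnight timestamp can land on the previous day once it is viewed in another time zone. It only illustrates the general effect and does not show that this is what happens in the example above (in January, Europe/London is on GMT, which matches UTC).

```r
# Without tz, the string is interpreted in the session time zone
local_time <- as.POSIXct("2020-01-01")

# With tz = "UTC" the result no longer depends on the session
utc_time <- as.POSIXct("2020-01-01", tz = "UTC")
attr(utc_time, "tzone")

# Midnight on a summer date in Europe/London (BST) is 23:00 the day before in UTC
bst <- as.POSIXct("2020-07-01", tz = "Europe/London")
bst_in_utc <- format(bst, "%Y-%m-%d %H:%M", tz = "UTC")
bst_in_utc
```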

I don't have much bandwidth to look into this at present.

Another possible influencing factor is my use of "within" as the method used with foverlaps.
I was thinking about making that a parameter in the main function so that folk can use whatever method suits them best.
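
To show why the overlap method could matter, here is a sketch comparing type = "within" and type = "any" in data.table::foverlaps on a spell that ends exactly on a year boundary. It is not the package's internal code: the toy tables and the way the year intervals are built (1 January to the following 1 January) are assumptions made for the example.

```r
library(data.table)

# Hypothetical spell mirroring id 1: it ends exactly on 1 January 2020
spells <- data.table(
  id    = 1L,
  start = as.POSIXct("2019-01-01", tz = "UTC"),
  end   = as.POSIXct("2020-01-01", tz = "UTC")
)

# Assumed calendar-year intervals
years <- data.table(
  start = as.POSIXct(c("2019-01-01", "2020-01-01"), tz = "UTC"),
  end   = as.POSIXct(c("2020-01-01", "2021-01-01"), tz = "UTC"),
  year  = c(2019L, 2020L)
)

# foverlaps needs the second table keyed on its interval columns
setkey(spells, start, end)

# "within": the year interval must lie entirely inside the spell
within_hits <- foverlaps(years, spells, type = "within", nomatch = 0L)

# "any": touching at a single point is enough
any_hits <- foverlaps(years, spells, type = "any", nomatch = 0L)

within_hits$year
any_hits$year
```

With "within", the 2020 interval is not contained in the spell, so only 2019 is matched; with "any", the shared boundary on 1 January 2020 is enough for 2020 to be matched as well.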

Will try and get that sorted soon.

Lextuga007 · 2023-09-16T09:15:14Z

Tom Jemmett https://github.com/tomjemmett wrote this code, which I've adapted for the data I used. It's made me realise that what I need to count is not really a census: for something like prevalence I don't want to subtract people who leave.

df |> 
  tidyr::pivot_longer(-c(id, smoking_status), 
                      values_to = "date") |>
  dplyr::mutate(n = ifelse(name == "start_date", 1, -1)) |>
  tidyr::replace_na(list(date = lubridate::today())) |> 
  dplyr::mutate(date = lubridate::floor_date(date, "year")) |> 
  dplyr::arrange(date, smoking_status) |>
  dplyr::mutate(c = cumsum(n),
                .by = smoking_status) |> 
  dplyr::select(-name, -id, -n) |>  
  dplyr::slice_tail(n = 1, by = c(date, smoking_status)) |> 
  tidyr::complete(date = seq(min(date), max(date), by = "year")) |> 
  tidyr::fill(c(c, smoking_status)) |>
  tidyr::replace_na(list(c = 0))

I think for prevalence I'd need to drop the step that generates a -1 for an exit.
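
A rough sketch of that idea: a running total of people who have entered, by year and smoking status, with exits ignored altogether. The data and the column names `entered` and `running_total` are made up for illustration, and whether this is the right definition of prevalence is a separate question.

```r
library(dplyr)

# Hypothetical toy data; only start dates are needed when exits are ignored
df <- tibble::tribble(
  ~id, ~start_date,  ~smoking_status,
  1L,  "2019-01-01", "smoker",
  2L,  "2019-01-02", "non-smoker",
  3L,  "2020-06-03", "smoker",
  4L,  "2022-01-04", "non-smoker"
) |>
  mutate(start_date = as.Date(start_date))

cumulative <- df |>
  mutate(year = as.integer(format(start_date, "%Y"))) |>
  # Number of people entering per status and year
  count(smoking_status, year, name = "entered") |>
  # Add the years with no entrants so the running total has no gaps
  tidyr::complete(smoking_status, year = min(year):max(year),
                  fill = list(entered = 0L)) |>
  arrange(smoking_status, year) |>
  # Entries only, no -1 for exits
  mutate(running_total = cumsum(entered), .by = smoking_status)

cumulative
```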
