July 31, 2019

Tidy Eval

Tidy Eval in R

Tidy Eval is a conceptual framework for doing metaprogramming in R.

K… but what’s metaprogramming?

Metaprogramming is treating code as data – data that can be acted on by other code.

When we treat code as data, it can be read, analyzed, edited, and written by other code.

Writing the “other code” that handles code-as-data is metaprogramming.
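As a minimal illustration in base R (no Tidy Eval yet), `quote()` captures code without running it. The result can then be inspected, edited, and evaluated like any other data:

```r
# Capture the code `1 + 2` as data instead of running it
e <- quote(1 + 2)
class(e)        # "call": an unevaluated function call
as.list(e)      # its pieces: the function `+`, then the arguments 1 and 2

# Edit the code: replace the function `+` with `*`
e[[1]] <- as.name("*")
e               # now the call 1 * 2

# Finally, evaluate the edited code
eval(e)         # 2
```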

Cool, huh?! 😎 

Tidy Eval in R

R’s LISP-y heritage means metaprogramming is built in.

Unfortunately, metaprogramming in base R is a little clunky.

Tidy Eval is a conceptual framework that makes metaprogramming in R more consistent and accessible.

The R package that makes Tidy Eval possible is rlang.

For a deep dive into Tidy Eval, check out the “Metaprogramming” section in Advanced R (2nd edition) by Hadley Wickham.

Tidy Eval for Iterating Fields

Why would metaprogramming be useful for dealing with UDS 3 data?

Form A3 has 140 sibling data fields:

  • SIB 1 MOB, SIB 1 YOB, SIB 1 AGD, SIB 1 NEU, SIB 1 PDX, SIB 1 MOE, SIB 1 AGO
  • SIB 2 MOB, SIB 2 YOB, SIB 2 AGD, SIB 2 NEU, SIB 2 PDX, SIB 2 MOE, SIB 2 AGO
  • SIB 3 MOB, SIB 3 YOB, SIB 3 AGD, SIB 3 NEU, SIB 3 PDX, SIB 3 MOE, SIB 3 AGO
  • …
  • SIB 20 MOB, SIB 20 YOB, SIB 20 AGD, SIB 20 NEU, SIB 20 PDX, SIB 20 MOE, SIB 20 AGO

  20 possible siblings \(\times\) 7 data points = 140 sibling fields

Setup

UDS Dataset df_uds with Form A1 + A3 Fields

head(df_uds) %>% 
  pretty_print_scroll
PACKET FORMVER PTID VISITMO VISITDAY VISITYR VISITNUM BIRTHMO BIRTHYR SEX RACE EDUC MARISTAT HANDED SIB1MOB SIB1YOB SIB1AGD SIB1NEU SIB1PDX SIB1MOE SIB1AGO SIB2MOB SIB2YOB SIB2AGD SIB2NEU SIB2PDX SIB2MOE SIB2AGO SIB3MOB SIB3YOB SIB3AGD SIB3NEU SIB3PDX SIB3MOE SIB3AGO SIB4MOB SIB4YOB SIB4AGD SIB4NEU SIB4PDX SIB4MOE SIB4AGO SIB5MOB SIB5YOB SIB5AGD SIB5NEU SIB5PDX SIB5MOE SIB5AGO SIB6MOB SIB6YOB SIB6AGD SIB6NEU SIB6PDX SIB6MOE SIB6AGO SIB7MOB SIB7YOB SIB7AGD SIB7NEU SIB7PDX SIB7MOE SIB7AGO SIB8MOB SIB8YOB SIB8AGD SIB8NEU SIB8PDX SIB8MOE SIB8AGO SIB9MOB SIB9YOB SIB9AGD SIB9NEU SIB9PDX SIB9MOE SIB9AGO SIB10MOB SIB10YOB SIB10AGD SIB10NEU SIB10PDX SIB10MOE SIB10AGO SIB11MOB SIB11YOB SIB11AGD SIB11NEU SIB11PDX SIB11MOE SIB11AGO SIB12MOB SIB12YOB SIB12AGD SIB12NEU SIB12PDX SIB12MOE SIB12AGO SIB13MOB SIB13YOB SIB13AGD SIB13NEU SIB13PDX SIB13MOE SIB13AGO SIB14MOB SIB14YOB SIB14AGD SIB14NEU SIB14PDX SIB14MOE SIB14AGO SIB15MOB SIB15YOB SIB15AGD SIB15NEU SIB15PDX SIB15MOE SIB15AGO SIB16MOB SIB16YOB SIB16AGD SIB16NEU SIB16PDX SIB16MOE SIB16AGO SIB17MOB SIB17YOB SIB17AGD SIB17NEU SIB17PDX SIB17MOE SIB17AGO SIB18MOB SIB18YOB SIB18AGD SIB18NEU SIB18PDX SIB18MOE SIB18AGO SIB19MOB SIB19YOB SIB19AGD SIB19NEU SIB19PDX SIB19MOE SIB19AGO SIB20MOB SIB20YOB SIB20AGD SIB20NEU SIB20PDX SIB20MOE SIB20AGO
I 3 PT0178 7 10 2018 001 8 1950 2 2 13 4 1 12 1945 53 8 3 1951 51 8
I 3 PT0184 12 30 2017 001 7 1953 2 2 12 3 2 8 1949 66 8 9 1951 8 2 1960 8
I 3 PT0190 12 16 2017 001 8 1953 1 1 14 1 2 2 1941 62 9 1 1943 8 2 1944 8 4 1945 66 9
I 3 PT0206 12 19 2017 001 3 1951 2 2 16 3 2 9 1958 53 8
I 3 PT0207 12 19 2017 001 7 1949 2 2 18 2 2 4 1941 65 8
I 3 PT0213 12 26 2017 001 5 1944 2 2 18 3 2 10 1942 65 8 12 1943 73 8 4 1949 40 8

UDS Dataset df_uds with Form A1 + A3 Fields

## Observations: 50
## Variables: 154
## $ PACKET   <chr> "I", "I", "I", "I", "I", "I", "I", "I", "I", "I", "I", …
## $ FORMVER  <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ PTID     <chr> "PT0178", "PT0184", "PT0190", "PT0206", "PT0207", "PT02…
## $ VISITMO  <dbl> 7, 12, 12, 12, 12, 12, 9, 10, 10, 11, 2, 12, 12, 9, 12,…
## $ VISITDAY <int> 10, 30, 16, 19, 19, 26, 25, 15, 16, 20, 4, 2, 2, 13, 31…
## $ VISITYR  <int> 2018, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2…
## $ VISITNUM <chr> "001", "001", "001", "001", "001", "001", "001", "001",…
## $ BIRTHMO  <dbl> 8, 7, 8, 3, 7, 5, 12, 7, 10, 9, 1, 8, 11, 6, 11, 1, 10,…
## $ BIRTHYR  <dbl> 1950, 1953, 1953, 1951, 1949, 1944, 1953, 1938, 1950, 1…
## $ SEX      <int> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ RACE     <int> 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2…
## $ EDUC     <int> 13, 12, 14, 16, 18, 18, 16, 16, 14, 16, 18, 12, 12, 14,…
## $ MARISTAT <int> 4, 3, 1, 3, 2, 3, 5, 2, 2, 1, 3, 1, 2, 3, 2, 2, 3, 1, 3…
## $ HANDED   <int> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2…
## $ SIB1MOB  <int> 12, 8, 2, 9, 4, 10, 9, 2, NA, 1, 12, 8, 8, 10, 8, 8, 10…
## $ SIB1YOB  <int> 1945, 1949, 1941, 1958, 1941, 1942, 1945, 1929, 1922, 1…
## $ SIB1AGD  <int> 53, 66, 62, 53, 65, 65, 71, 72, 72, 80, 63, 64, 64, 67,…
## $ SIB1NEU  <int> 8, 8, 9, 8, 8, 8, 8, 8, 8, 1, 5, 1, 1, 1, 8, 8, 5, 8, 8…
## $ SIB1PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, 999, 210, 50, 50, 5…
## $ SIB1MOE  <int> NA, NA, NA, NA, NA, NA, 7, NA, NA, 7, 7, 1, 1, 7, NA, N…
## $ SIB1AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, 73, 29, 55, 55, 55,…
## $ SIB2MOB  <int> 3, 9, 1, NA, NA, 12, 2, NA, NA, NA, 12, 1, 10, 12, NA, …
## $ SIB2YOB  <int> 1951, 1951, 1943, NA, NA, 1943, 1947, NA, 1940, 1930, 1…
## $ SIB2AGD  <int> 51, NA, NA, NA, NA, 73, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB2NEU  <int> 8, 8, 8, NA, NA, 8, 8, NA, 8, 1, 5, 8, 8, 8, 8, 8, NA, …
## $ SIB2PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, 999, 210, NA, NA, N…
## $ SIB2MOE  <int> NA, NA, NA, NA, NA, NA, 7, NA, NA, 7, 7, NA, NA, NA, NA…
## $ SIB2AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16, NA, NA, NA,…
## $ SIB3MOB  <int> NA, 2, 2, NA, NA, 4, 12, NA, NA, NA, NA, 12, 12, 1, NA,…
## $ SIB3YOB  <int> NA, 1960, 1944, NA, NA, 1949, 1951, NA, 1950, NA, NA, 1…
## $ SIB3AGD  <int> NA, NA, NA, NA, NA, 40, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB3NEU  <int> NA, 8, 8, NA, NA, 8, 8, NA, 8, NA, NA, 4, 4, 8, 8, 8, N…
## $ SIB3PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 240, 240, N…
## $ SIB3MOE  <int> NA, NA, NA, NA, NA, NA, 7, NA, NA, NA, NA, 7, 7, NA, NA…
## $ SIB3AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 74, 74, NA,…
## $ SIB4MOB  <int> NA, NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, 3…
## $ SIB4YOB  <int> NA, NA, 1945, NA, NA, NA, NA, NA, 1952, NA, NA, NA, NA,…
## $ SIB4AGD  <int> NA, NA, 66, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB4NEU  <int> NA, NA, 9, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, 8, 8,…
## $ SIB4PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB4MOE  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB4AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB5MOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, 10, …
## $ SIB5YOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, 1935, NA, NA, NA, NA, 1…
## $ SIB5AGD  <int> NA, NA, NA, NA, NA, NA, NA, NA, 83, NA, NA, NA, NA, NA,…
## $ SIB5NEU  <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, 8, N…
## $ SIB5PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB5MOE  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB5AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6MOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6YOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, 1942, NA, NA, NA, NA, N…
## $ SIB6AGD  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6NEU  <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB6PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6MOE  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7MOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7YOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, 1945, NA, NA, NA, NA, N…
## $ SIB7AGD  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7NEU  <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB7PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7MOE  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8MOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8YOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, 1926, NA, NA, NA, NA, N…
## $ SIB8AGD  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8NEU  <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB8PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8MOE  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9MOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9YOB  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9AGD  <int> NA, NA, NA, NA, NA, NA, NA, NA, 99, NA, NA, NA, NA, NA,…
## $ SIB9NEU  <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB9PDX  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9MOE  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9AGO  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB10PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB11PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB12PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Caveats

Pipe Operator

The code makes extensive use of the pipe operator: %>% from magrittr (pipeR provides a similar operator, %>>%).

Basics

  • f(x) is equivalent to x %>% f
  • f(x, y) is equivalent to x %>% f(y)
  • h(g(f(x))) is equivalent to x %>% f %>% g %>% h

Dot Placeholder

  • g(x, f(y)) is equivalent to y %>% f %>% g(x, .)
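A quick way to check these equivalences is to run both forms side by side. The functions below (sqrt, round, sum, as.character, paste) are arbitrary stand-ins for f, g, and h:

```r
library(magrittr)  # provides the %>% pipe

x <- c(1, 4, 9)

# f(x) is equivalent to x %>% f
a1 <- sqrt(x)
a2 <- x %>% sqrt

# f(x, y) is equivalent to x %>% f(y)
b1 <- round(pi, 2)
b2 <- pi %>% round(2)

# h(g(f(x))) is equivalent to x %>% f %>% g %>% h
c1 <- as.character(sum(sqrt(x)))
c2 <- x %>% sqrt %>% sum %>% as.character

# g(x, f(y)) is equivalent to y %>% f %>% g(x, .)
y  <- 1:3
d1 <- paste("total:", sum(y))
d2 <- y %>% sum %>% paste("total:", .)   # `.` marks where the piped value goes
```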

0 to 60    🏎 💨 

Goals:

  1. Outline basic ideas and glimpse some building blocks

  2. Demonstrate how building blocks can be used with UDS data


… we do get into the weeds

Expressions

Expressions as Arguments

To treat code as data, we need some way to capture the code expressions that are passed to a function before they’re evaluated within the function.

Those familiar with the R tidyverse have seen this in action with dplyr.

library(dplyr)
df_uds_abrv %>% 
  filter(EDUC >= 20)
## # A tibble: 2 x 8
##   PTID   BIRTHMO BIRTHYR   SEX  RACE  EDUC MARISTAT HANDED
##   <chr>    <dbl>   <dbl> <int> <int> <int>    <int>  <int>
## 1 PT0494       1    1936     1     1    20        1      2
## 2 PT0542       3    1939     1     1    20        1      1

Notice the filter condition EDUC >= 20 isn’t a string. It’s an expression.

Expressions vs. Strings

EDUC >= 20 is an expression.

"EDUC >= 20" is a string.
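The difference is easy to see with rlang: `expr()` captures an expression without evaluating it, while a string is just characters until it is parsed. Here `toy` is a hypothetical three-row data frame used only for illustration:

```r
library(rlang)

educ_expr <- expr(EDUC >= 20)   # an expression: an unevaluated call to `>=`
educ_str  <- "EDUC >= 20"       # a string: just characters

is_call(educ_expr)              # TRUE
is.character(educ_str)          # TRUE

# Parsing the string turns it into the very same expression
identical(parse_expr(educ_str), educ_expr)   # TRUE

# An expression can be evaluated against a data frame's columns
toy <- data.frame(EDUC = c(12, 20, 16))      # hypothetical toy data
eval_tidy(educ_expr, data = toy)             # FALSE TRUE FALSE
```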

How does dplyr do this? How does it capture the expression?

Expression Capture

Suppose we’d like to build our own function that helps us summarize a data frame (like df_uds) with some descriptive statistics.

Here’s a simple example with dplyr::summarize to build on:

df_uds %>%
  summarize(mean(EDUC))
## # A tibble: 1 x 1
##   `mean(EDUC)`
##          <dbl>
## 1         15.8

Expression Capture

Instead of finding the mean of all participants, what if we want to group the participants by sex, SEX?

df_uds %>% 
  group_by(SEX) %>% 
  summarize(mean(EDUC))
## # A tibble: 2 x 2
##     SEX `mean(EDUC)`
##   <int>        <dbl>
## 1     1         16.7
## 2     2         15.4

Expression Capture

Of course, we could group by other fields like RACE.

df_uds %>% 
  group_by(RACE) %>% 
  summarize(mean(EDUC))
## # A tibble: 2 x 2
##    RACE `mean(EDUC)`
##   <int>        <dbl>
## 1     1         15.9
## 2     2         15.7

Expression Capture

Or MARISTAT.

df_uds %>% 
  group_by(MARISTAT) %>% 
  summarize(mean(EDUC))
## # A tibble: 5 x 2
##   MARISTAT `mean(EDUC)`
##      <int>        <dbl>
## 1        1         16.3
## 2        2         15.5
## 3        3         15.3
## 4        4         13  
## 5        5         16

Expression Capture

Can we create a custom function that allows us to pass whatever grouping variable we want (SEX, RACE, MARISTAT)?

mean_EDUC_group_by <- function(df, group_var) {
  df %>% 
    group_by(group_var) %>% 
    summarize(mean(EDUC))
}

Call the function using SEX as a grouping variable.

df_uds %>% 
  mean_EDUC_group_by(SEX)
## Error: Column `group_var` is unknown

Expression Capture

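Within the function, group_by looks for a column literally named group_var, which doesn’t exist in df_uds; it never sees the SEX the user passed in. We could hard-code the grouping column within the function to make it work, but that defeats the purpose.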

mean_EDUC_group_by <- function(df, group_var) {
  df %>% 
    group_by(SEX) %>% 
    summarize(mean(EDUC))
}
df_uds %>% 
  mean_EDUC_group_by(RACE) # Doesn't matter what user passes!
## # A tibble: 2 x 2
##     SEX `mean(EDUC)`
##   <int>        <dbl>
## 1     1         16.7
## 2     2         15.4

Expression Capture

How do we capture the expression SEX passed to mean_EDUC_group_by by the user?

We need Tidy Eval, specifically the enquo function from the rlang package. enquo captures – or “quotes” – the expression passed by the user, bundling it with the environment it came from into an object called a quosure.

library(rlang)
mean_EDUC_group_by <- function(df, group_var) {
  group_var_quo <- enquo(group_var)
  
  df %>% 
    group_by(group_var_quo) %>%
    summarize(mean(EDUC))
}
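To see what enquo() hands back, here is a tiny sketch using a hypothetical helper, capture(), that does nothing but quote its argument. SEX doesn’t need to exist anywhere: it is captured, not evaluated.

```r
library(rlang)

# Hypothetical helper: quotes whatever the caller passes as `x`
capture <- function(x) {
  enquo(x)
}

q <- capture(SEX)   # no object or column named SEX is needed here
is_quosure(q)       # TRUE: an expression plus the environment it came from
quo_get_expr(q)     # the symbol SEX
as_label(q)         # "SEX"
```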

Expression Capture

Let’s give the updated mean_EDUC_group_by function a whirl.

df_uds %>% 
  mean_EDUC_group_by(SEX)
## Error: Column `group_var_quo` is unknown

There’s still an error. Why?

Expression Capture

We captured – or “quoted” – the expression SEX with group_var_quo.

mean_EDUC_group_by <- function(df, group_var) {
  group_var_quo <- enquo(group_var)
  df %>% 
    group_by(group_var_quo) %>% 
    summarize(mean(EDUC))
}

But for group_by to evaluate group_var_quo as the expression SEX, we must “unquote” group_var_quo with the !! (“bang-bang”) operator.

mean_EDUC_group_by <- function(df, group_var) {
  group_var_quo <- enquo(group_var)
  df %>% 
    group_by(!!group_var_quo) %>% 
    summarize(mean(EDUC))
}

Expression Capture

Let’s try again with SEX.

df_uds %>% 
  mean_EDUC_group_by(SEX)
## # A tibble: 2 x 2
##     SEX `mean(EDUC)`
##   <int>        <dbl>
## 1     1         16.7
## 2     2         15.4

Expression Capture

And again with RACE.

df_uds %>% 
  mean_EDUC_group_by(RACE)
## # A tibble: 2 x 2
##    RACE `mean(EDUC)`
##   <int>        <dbl>
## 1     1         15.9
## 2     2         15.7
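Since df_uds isn’t available outside this talk, the sketch below reproduces the same quote-then-unquote pattern on a small hypothetical data frame, toy_uds, so it can be run end to end:

```r
library(dplyr)
library(rlang)

# Hypothetical toy data standing in for df_uds
toy_uds <- tibble(
  SEX  = c(1, 1, 2, 2),
  RACE = c(1, 2, 1, 2),
  EDUC = c(16, 18, 12, 14)
)

mean_EDUC_group_by <- function(df, group_var) {
  group_var_quo <- enquo(group_var)   # quote the user's expression
  df %>%
    group_by(!!group_var_quo) %>%     # unquote it where dplyr expects a column
    summarize(mean(EDUC))
}

by_sex  <- toy_uds %>% mean_EDUC_group_by(SEX)    # SEX 1: 17, SEX 2: 13
by_race <- toy_uds %>% mean_EDUC_group_by(RACE)   # RACE 1: 14, RACE 2: 16
```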

Expanding on Expression Capture

Can we generalize mean_EDUC_group_by a bit?

What if we want the mean of something other than EDUC?

Well, we can use the same principles we just applied to the group_var argument.

group_by_mean <- function(df, group_var, mean_var) {
  group_var_quo <- enquo(group_var)
  mean_var_quo <- enquo(mean_var)
  
  df %>% 
    group_by(!!group_var_quo) %>% 
    summarize(mean(!!mean_var_quo))
}

Expanding on Expression Capture

Let’s try it out.

df_uds %>% 
  group_by_mean(group_var = SEX, mean_var = EDUC)
## # A tibble: 2 x 2
##     SEX `mean(EDUC)`
##   <int>        <dbl>
## 1     1         16.7
## 2     2         15.4

But `mean(EDUC)` as a summary table label is ugly.

Can we improve on this?

Expanding on Expression Capture

group_by_mean <- function(df, group_var, mean_var) {
  group_var_quo  <- enquo(group_var)
  mean_var_quo   <- enquo(mean_var)
  
  mean_var_str   <- paste0("mean_", quo_name(mean_var_quo))
  
  df %>% 
    group_by(!!group_var_quo) %>% 
    summarize(!!mean_var_str = mean(!!mean_var_quo))
}
## Error: <text>:9:30: unexpected '='
## 8:     group_by(!!group_var_quo) %>% 
## 9:     summarize(!!mean_var_str =
##                                 ^

Expanding on Expression Capture

group_by_mean <- function(df, group_var, mean_var) {
  group_var_quo  <- enquo(group_var)
  mean_var_quo   <- enquo(mean_var)
  
  mean_var_str   <- paste0("mean_", quo_name(mean_var_quo))
  
  df %>% 
    group_by(!!group_var_quo) %>% 
    summarize(!!mean_var_str := mean(!!mean_var_quo))
}

When the left-hand side (the new column name) is unquoted with !!, R’s parser won’t accept =, so we use :=, a special assignment operator that Tidy Eval functions understand.
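A minimal sketch of := on its own, with a hypothetical one-column tibble: the column name is built as a string, unquoted with !!, and assigned with := (writing = there would be the syntax error shown above).

```r
library(dplyr)
library(rlang)

toy <- tibble(EDUC = c(12, 16, 20))    # hypothetical toy data

new_name <- paste0("mean_", "EDUC")    # "mean_EDUC"

res <- toy %>%
  summarize(!!new_name := mean(EDUC))  # LHS is unquoted, so `:=` is required

names(res)      # "mean_EDUC"
res$mean_EDUC   # 16
```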

Expanding on Expression Capture

df_uds %>% group_by_mean(group_var = SEX, mean_var = EDUC)
## # A tibble: 2 x 2
##     SEX mean_EDUC
##   <int>     <dbl>
## 1     1      16.7
## 2     2      15.4
df_uds %>% group_by_mean(group_var = RACE, mean_var = SIB1AGD)
## # A tibble: 2 x 2
##    RACE mean_SIB1AGD
##   <int>        <dbl>
## 1     1         61.8
## 2     2         58.5

Expression Capture Summary

We can pass expressions to functions.

We can capture the expressions in order to manipulate them.

We can use functions/operators from the rlang package such as enquo, !!, quo_name, and := to quote (capture), manipulate, and unquote expressions passed to a function.

Applying Expression Capture to UDS 3 Data

Tidy Eval for Iterating Fields

If we want to do any data validation or simple analysis of all 140 sibling fields, the code becomes repetitive, error-prone, and hard to maintain.

The basic components of all 140 sibling fields can be expressed simply with code:

sib_base <- "SIB"
sib_nums <- 1:20
sib_data <- c("MOB", "YOB", "AGD", "NEU", "PDX", "MOE", "AGO")

Tidy Eval for Iterating Fields

The components can then be easily combined:

library(tidyr) # for `crossing` and `unite`
sib_fields <- 
  crossing(sib_base, sib_nums, sib_data) %>% # "SIB", 1:20, c("MOB", "YOB", ...)
  arrange(sib_nums) %>%                      # to force expected ordering
  unite(sib_fields, sep = "") %>% 
  pull(sib_fields)
sib_fields                                   # print the 140 field names
##   [1] "SIB1AGD"  "SIB1AGO"  "SIB1MOB"  "SIB1MOE"  "SIB1NEU"  "SIB1PDX" 
##   [7] "SIB1YOB"  "SIB2AGD"  "SIB2AGO"  "SIB2MOB"  "SIB2MOE"  "SIB2NEU" 
##  [13] "SIB2PDX"  "SIB2YOB"  "SIB3AGD"  "SIB3AGO"  "SIB3MOB"  "SIB3MOE" 
##  [19] "SIB3NEU"  "SIB3PDX"  "SIB3YOB"  "SIB4AGD"  "SIB4AGO"  "SIB4MOB" 
##  [25] "SIB4MOE"  "SIB4NEU"  "SIB4PDX"  "SIB4YOB"  "SIB5AGD"  "SIB5AGO" 
##  [31] "SIB5MOB"  "SIB5MOE"  "SIB5NEU"  "SIB5PDX"  "SIB5YOB"  "SIB6AGD" 
##  [37] "SIB6AGO"  "SIB6MOB"  "SIB6MOE"  "SIB6NEU"  "SIB6PDX"  "SIB6YOB" 
##  [43] "SIB7AGD"  "SIB7AGO"  "SIB7MOB"  "SIB7MOE"  "SIB7NEU"  "SIB7PDX" 
##  [49] "SIB7YOB"  "SIB8AGD"  "SIB8AGO"  "SIB8MOB"  "SIB8MOE"  "SIB8NEU" 
##  [55] "SIB8PDX"  "SIB8YOB"  "SIB9AGD"  "SIB9AGO"  "SIB9MOB"  "SIB9MOE" 
##  [61] "SIB9NEU"  "SIB9PDX"  "SIB9YOB"  "SIB10AGD" "SIB10AGO" "SIB10MOB"
##  [67] "SIB10MOE" "SIB10NEU" "SIB10PDX" "SIB10YOB" "SIB11AGD" "SIB11AGO"
##  [73] "SIB11MOB" "SIB11MOE" "SIB11NEU" "SIB11PDX" "SIB11YOB" "SIB12AGD"
##  [79] "SIB12AGO" "SIB12MOB" "SIB12MOE" "SIB12NEU" "SIB12PDX" "SIB12YOB"
##  [85] "SIB13AGD" "SIB13AGO" "SIB13MOB" "SIB13MOE" "SIB13NEU" "SIB13PDX"
##  [91] "SIB13YOB" "SIB14AGD" "SIB14AGO" "SIB14MOB" "SIB14MOE" "SIB14NEU"
##  [97] "SIB14PDX" "SIB14YOB" "SIB15AGD" "SIB15AGO" "SIB15MOB" "SIB15MOE"
## [103] "SIB15NEU" "SIB15PDX" "SIB15YOB" "SIB16AGD" "SIB16AGO" "SIB16MOB"
## [109] "SIB16MOE" "SIB16NEU" "SIB16PDX" "SIB16YOB" "SIB17AGD" "SIB17AGO"
## [115] "SIB17MOB" "SIB17MOE" "SIB17NEU" "SIB17PDX" "SIB17YOB" "SIB18AGD"
## [121] "SIB18AGO" "SIB18MOB" "SIB18MOE" "SIB18NEU" "SIB18PDX" "SIB18YOB"
## [127] "SIB19AGD" "SIB19AGO" "SIB19MOB" "SIB19MOE" "SIB19NEU" "SIB19PDX"
## [133] "SIB19YOB" "SIB20AGD" "SIB20AGO" "SIB20MOB" "SIB20MOE" "SIB20NEU"
## [139] "SIB20PDX" "SIB20YOB"
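Note that crossing() sorts each component, so the data points come out alphabetically (AGD, AGO, MOB, ...) rather than in Form A3’s order. If form order matters, a base R sketch with rep() and paste0() produces the same 140 names in form order (the variable name sib_fields_form_order is just for illustration):

```r
sib_base <- "SIB"
sib_nums <- 1:20
sib_data <- c("MOB", "YOB", "AGD", "NEU", "PDX", "MOE", "AGO")

# Repeat each sibling number once per data point (1, 1, ..., 2, 2, ...),
# let paste0() recycle sib_base and sib_data, and glue the pieces together
sib_fields_form_order <- paste0(sib_base,
                                rep(sib_nums, each = length(sib_data)),
                                sib_data)

length(sib_fields_form_order)    # 140
head(sib_fields_form_order, 3)   # "SIB1MOB" "SIB1YOB" "SIB1AGD"
tail(sib_fields_form_order, 1)   # "SIB20AGO"
```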

Tidy Eval for Iterating Fields

Suppose we want to see some basic descriptive statistics (e.g., min, max, median, mean) for all the sibling fields in order to (1) make sure there aren’t any unexpected values and (2) get a sense of the distributions of the continuous variables (AGD, AGO).

Effectively, we’d like to do 20 siblings \(\times\) 7 data points \(\times\) 4 statistics = 560 calculations.

Tidy Eval for Iterating Fields

We could start here…

df_uds %>% 
  summarize(
    # SIB1MOB
    SIB1MOB_min  = min(SIB1MOB, na.rm = TRUE),
    SIB1MOB_max  = max(SIB1MOB, na.rm = TRUE),
    SIB1MOB_med  = median(SIB1MOB, na.rm = TRUE),
    SIB1MOB_mean = mean(SIB1MOB, na.rm = TRUE),
    # SIB1YOB
    SIB1YOB_min  = min(SIB1YOB, na.rm = TRUE),
    SIB1YOB_max  = max(SIB1YOB, na.rm = TRUE),
    SIB1YOB_med  = median(SIB1YOB, na.rm = TRUE),
    SIB1YOB_mean = mean(SIB1YOB, na.rm = TRUE),
    # ...
    # ... at least 548 lines of copy-pasta'ed code goes here
    # ...
    # SIB20AGO
    SIB20AGO_min  = min(SIB20AGO, na.rm = TRUE),
    SIB20AGO_max  = max(SIB20AGO, na.rm = TRUE),
    SIB20AGO_med  = median(SIB20AGO, na.rm = TRUE),
    SIB20AGO_mean = mean(SIB20AGO, na.rm = TRUE)
  )

Tidy Eval for Iterating Fields

Instead, let’s write a function that effectively writes the code for us.

Input:

  1. A data frame of UDS Form A3 data

  2. The functions we’d like to apply

  3. Components of the sibling fields we want to apply functions to

Output:

  1. A data frame with our results

Tidy Eval for Iterating Fields

The tidyr package has a handy helper for creating all combinations of elements: crossing.

library(tidyr)
crossing(chars = c("a", "b", "c"), nums = 1:2)
## # A tibble: 6 x 2
##   chars  nums
##   <chr> <int>
## 1 a         1
## 2 a         2
## 3 b         1
## 4 b         2
## 5 c         1
## 6 c         2

Tidy Eval for Iterating Fields

To paste the combinations together into a single vector, use tidyr::unite and dplyr::pull.

crossing(chars = c("a", "b", "c"), nums = 1:2) %>% 
  unite(chars_nums, sep = "") %>% 
  pull(chars_nums)
## [1] "a1" "a2" "b1" "b2" "c1" "c2"

Tidy Eval for Iterating Fields

Let’s start writing our function.

my_summary_fxn <- function(df, funcs, ...) {
  # sib_base <- "SIB"
  # sib_nums <- 1:2
  # sib_data <- c("MOB", "YOB")
  # Combine user-passed field components
  crossing(...)
}

Tidy Eval for Iterating Fields

sib_base <- "SIB"
sib_nums <- 1:2
sib_data <- c("MOB", "YOB")

my_summary_fxn(df = df_uds, 
               funcs = list(min, max),
               sib_base, sib_nums, sib_data)
## # A tibble: 4 x 3
##   sib_base sib_nums sib_data
##   <chr>       <int> <chr>   
## 1 SIB             1 MOB     
## 2 SIB             1 YOB     
## 3 SIB             2 MOB     
## 4 SIB             2 YOB

Tidy Eval for Iterating Fields

my_summary_fxn <- function(df, funcs, ...) {
  # sib_base <- "SIB"
  # sib_nums <- 1:2
  # sib_data <- c("MOB", "YOB")
  # Combine user-passed field components into symbols
  fields_syms <-
    crossing(...) %>% 
    unite(fields, sep = "") %>% 
    pull(fields) %>% 
    syms 
  
  fields_syms
}

Tidy Eval for Iterating Fields

my_summary_fxn(df = df_uds, 
               funcs = list(min, max),
               sib_base, sib_nums, sib_data)
## [[1]]
## SIB1MOB
## 
## [[2]]
## SIB1YOB
## 
## [[3]]
## SIB2MOB
## 
## [[4]]
## SIB2YOB
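syms() (and its singular form, sym()) comes from rlang. A small sketch with a hypothetical two-row data frame shows that each result is a symbol which, when evaluated in a data mask, refers to the column of that name:

```r
library(rlang)

field_syms <- syms(c("SIB1MOB", "SIB1YOB"))   # a list of two symbols

is_symbol(field_syms[[1]])                    # TRUE
field_syms[[2]]                               # SIB1YOB (a symbol, not a string)

toy <- data.frame(SIB1MOB = c(12, 3),         # hypothetical toy data
                  SIB1YOB = c(1945, 1951))
eval_tidy(field_syms[[2]], data = toy)        # 1945 1951
```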

Tidy Eval for Iterating Fields

my_summary_fxn <- function(df, funcs, ...) {
  # Combine user-passed field components into symbols
  fields_syms <- crossing(...) %>% 
    unite(fields, sep = "") %>% pull(fields) %>% syms
  
  # Capture/quote functions as expressions
  funcs_exprs <- enexpr(funcs)[-1] # list, min, max => min, max
  
  as.list(funcs_exprs)
}

Tidy Eval for Iterating Fields

my_summary_fxn(df = df_uds, 
               funcs = list(min, max), 
               sib_base, sib_nums, sib_data)
## [[1]]
## min
## 
## [[2]]
## max

Tidy Eval for Iterating Fields

library(purrr) # for `map_dfc`

my_summary_fxn <- function(df, funcs, ...) {
  # Combine user-passed field components into symbols
  fields_syms <- crossing(...) %>% 
    unite(fields, sep = "") %>% pull(fields) %>% syms
  
  # Capture/quote functions as expressions
  funcs_exprs <- enexpr(funcs)[-1] # list, min, max => min, max
  
  # Map over function expressions and field symbols, evaluating with `summarize`
  map_dfc(funcs_exprs, # min, max
          function(func_expr) {
            as_string(func_expr)
          }) 
}

Tidy Eval for Iterating Fields

my_summary_fxn(df = df_uds, 
               funcs = list(min, max),
               sib_base, sib_nums, sib_data) %>% 
  pretty_print
V1 V2
min max
my_summary_fxn(df = df_uds, 
               funcs = list(min, max, median, mean),
               sib_base, sib_nums, sib_data) %>% 
  pretty_print
V1 V2 V3 V4
min max median mean

Tidy Eval for Iterating Fields

my_summary_fxn <- function(df, funcs, ...) {
  # Combine user-passed field components into symbols
  fields_syms <- crossing(...) %>% 
    unite(fields, sep = "") %>% pull(fields) %>% syms
  
  # Capture/quote functions as expressions
  funcs_exprs <- enexpr(funcs)[-1] # list, min, max => min, max
  
  # Map over function expressions and field symbols, evaluating with `summarize`
  map_dfc(funcs_exprs, # min, max
          function(func_expr) {
            map_dfc(fields_syms, # SIB1MOB, SIB1YOB, ...
                    function(field_sym) {
                      #      min, ...        SIB1MOB, ...
                      paste0(func_expr, "_", field_sym)
                    })
          })
}

Tidy Eval for Iterating Fields

my_summary_fxn(df = df_uds, 
               funcs = list(min, max),
               sib_base, sib_nums, sib_data) %>% 
  pretty_print_scroll
V1 V2 V3 V4 V11 V21 V31 V41
min_SIB1MOB min_SIB1YOB min_SIB2MOB min_SIB2YOB max_SIB1MOB max_SIB1YOB max_SIB2MOB max_SIB2YOB
my_summary_fxn(df = df_uds, 
               funcs = list(min, max, median, mean),
               sib_base, sib_nums, sib_data) %>% 
  pretty_print_scroll
V1 V2 V3 V4 V11 V21 V31 V41 V12 V22 V32 V42 V13 V23 V33 V43
min_SIB1MOB min_SIB1YOB min_SIB2MOB min_SIB2YOB max_SIB1MOB max_SIB1YOB max_SIB2MOB max_SIB2YOB median_SIB1MOB median_SIB1YOB median_SIB2MOB median_SIB2YOB mean_SIB1MOB mean_SIB1YOB mean_SIB2MOB mean_SIB2YOB

Tidy Eval for Iterating Fields

my_summary_fxn <- function(df, funcs, ...) {
  # Combine user-passed field components into symbols
  fields_syms <- crossing(...) %>% 
    unite(fields, sep = "") %>% pull(fields) %>% syms
  
  # Capture/quote functions as expressions
  funcs_exprs <- enexpr(funcs)[-1] # list, min, max => min, max
  
  # Map over function expressions and field symbols, evaluating with `summarize`
  map_dfc(funcs_exprs, # min, max
          function(func_expr) {
            map_dfc(fields_syms, # SIB1MOB, SIB1YOB...
                    function(field_sym) {
                      df %>% 
                        summarize(
                          # min_SIB1MOB = min(SIB1MOB, na.rm = TRUE)
                          #        min, ...        SIB1MOB, ...
                          !!paste0(func_expr, "_", field_sym) :=
                            #  min, ...     SIB1MOB, ...
                            (!!func_expr)(!!field_sym,   na.rm = TRUE))
                    })
          })
}
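The sketch below shows a single iteration of the summarize() step with the nested maps removed. The toy data frame and its values are made up for illustration. func_expr and field_sym are set by hand to the symbols max and SIB1MOB. On the left of :=, the !! injects the new column name as a string. On the right, (!!func_expr)(!!field_sym, ...) injects both the function and the column into the call.

```r
library(dplyr)
library(rlang)

# Made-up toy data standing in for df_uds
df <- data.frame(SIB1MOB = c(1, 12, NA),
                 SIB1YOB = c(1920, 1962, 1940))

func_expr <- sym("max")      # like one element of funcs_exprs
field_sym <- sym("SIB1MOB")  # like one element of fields_syms

out <- df %>%
  summarize(
    # builds the name "max_SIB1MOB" and evaluates max(SIB1MOB, na.rm = TRUE)
    !!paste0(as_string(func_expr), "_", as_string(field_sym)) :=
      (!!func_expr)(!!field_sym, na.rm = TRUE)
  )
out
# one row, one column: max_SIB1MOB = 12
```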

Tidy Eval for Iterating Fields

my_summary_fxn(df = df_uds, 
               funcs = list(min, max),
               sib_base, sib_nums, sib_data) %>% 
  pretty_print
min_SIB1MOB min_SIB1YOB min_SIB2MOB min_SIB2YOB max_SIB1MOB max_SIB1YOB max_SIB2MOB max_SIB2YOB
1 1920 1 1923 12 1962 12 1958

Tidy Eval for Iterating Fields

sib_nums <- 1:3
sib_data <- c("MOB", "YOB", "AGD", "AGO")

my_summary_fxn(df = df_uds, 
               funcs = list(min, max, median, mean),
               sib_base, sib_nums, sib_data) %>% 
  pretty_print_scroll
min_SIB1AGD min_SIB1AGO min_SIB1MOB min_SIB1YOB min_SIB2AGD min_SIB2AGO min_SIB2MOB min_SIB2YOB min_SIB3AGD min_SIB3AGO min_SIB3MOB min_SIB3YOB max_SIB1AGD max_SIB1AGO max_SIB1MOB max_SIB1YOB max_SIB2AGD max_SIB2AGO max_SIB2MOB max_SIB2YOB max_SIB3AGD max_SIB3AGO max_SIB3MOB max_SIB3YOB median_SIB1AGD median_SIB1AGO median_SIB1MOB median_SIB1YOB median_SIB2AGD median_SIB2AGO median_SIB2MOB median_SIB2YOB median_SIB3AGD median_SIB3AGO median_SIB3MOB median_SIB3YOB mean_SIB1AGD mean_SIB1AGO mean_SIB1MOB mean_SIB1YOB mean_SIB2AGD mean_SIB2AGO mean_SIB2MOB mean_SIB2YOB mean_SIB3AGD mean_SIB3AGO mean_SIB3MOB mean_SIB3YOB
0 29 1 1920 0 16 1 1923 0 74 1 1925 94 73 12 1962 83 59 12 1958 86 75 12 1960 64 59 8 1940.5 68.5 37.5 5.5 1942.5 57 74 8 1945 60.06 59.33333 6.704546 1940.333 57.44444 37.5 6.307692 1941 44.77778 74.33333 7.31579 1944.36
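The output columns run AGD, AGO, MOB, YOB instead of following the order given in sib_data. This happens because tidyr::crossing() de-duplicates and sorts each input before expanding, with the first input varying slowest. The sketch below reproduces only the name-building step. Because sib_base is defined earlier in the talk, this sketch assumes its value is "SIB". The inputs are also named here so the intermediate columns are easier to read.

```r
library(dplyr)
library(tidyr)
library(rlang)

sib_base <- "SIB"  # assumed value; defined earlier in the talk
sib_nums <- 1:3
sib_data <- c("MOB", "YOB", "AGD", "AGO")

# crossing() sorts each input, then expands with the first column varying slowest
field_names <- crossing(base = sib_base, num = sib_nums, data = sib_data) %>%
  unite(fields, sep = "") %>%  # no columns listed, so all columns are united
  pull(fields)

head(field_names, 4)
# "SIB1AGD" "SIB1AGO" "SIB1MOB" "SIB1YOB"

fields_syms <- syms(field_names)  # list of symbols, ready for !!
```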

UDS 3 Data in REDCap

UDS 3 REDCap Data

Why might metaprogramming be useful for managing UDS 3 data in REDCap?

A small, fake dataset built with the REDCap Collaborative UDS 3.0 data dictionary from the KU ADC.

ptid packet visitmo visitday visityr sex fu_sex tele_sex
PT0001 I 1 1 2015 2 NA NA
PT0002 I 2 2 2015 2 NA NA
PT0002 F 2 2 2016 NA 2 NA
PT0003 I 3 3 2015 1 NA NA
PT0003 F 3 3 2016 NA 1 NA
PT0003 T 3 3 2017 NA NA 1

UDS 3 REDCap Data

Why might metaprogramming be useful for managing UDS 3 data in REDCap?

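20 possible siblings (sib1 through sib20)

7 data points per sibling (mob, yob, agd, neu, pdx, moe, ago)

3 versions of each field: initial visit (no prefix), follow-up (fu_ prefix), and telephone (tele_ prefix)

20 \(\times\) 7 \(\times\) 3 = 420 fields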
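You can check the count of 420 by generating every field name. This sketch uses base R's expand.grid() and follows the lowercase REDCap naming pattern shown in the tables: an empty prefix for the initial form, and fu_ and tele_ for the other two forms.

```r
prefixes <- c("", "fu_", "tele_")
sib_nums <- 1:20
sib_data <- c("mob", "yob", "agd", "neu", "pdx", "moe", "ago")

# expand.grid() varies its first argument fastest
grid <- expand.grid(data = sib_data, num = sib_nums, prefix = prefixes,
                    stringsAsFactors = FALSE)
field_names <- paste0(grid$prefix, "sib", grid$num, grid$data)

length(field_names)   # 420
head(field_names, 3)  # "sib1mob" "sib1yob" "sib1agd"
```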

UDS 3 REDCap Data

ptid packet visitmo visitday visityr sex fu_sex tele_sex
PT0001 I 1 1 2015 2 NA NA
PT0002 I 2 2 2015 2 NA NA
PT0002 F 2 2 2016 NA 2 NA
PT0003 I 3 3 2015 1 NA NA
PT0003 F 3 3 2016 NA 1 NA
PT0003 T 3 3 2017 NA NA 1
ptid packet sib1mob sib1yob sib1agd fu_sib1mob fu_sib1yob fu_sib1agd tele_sib1mob tele_sib1yob tele_sib1agd
PT0001 I 1 1941 NA NA NA NA NA NA NA
PT0002 I 2 1942 NA NA NA NA NA NA NA
PT0002 F NA NA NA 2 1942 NA NA NA NA
PT0003 I 3 1943 72 NA NA NA NA NA NA
PT0003 F NA NA NA 3 1943 72 NA NA NA
PT0003 T NA NA NA NA NA NA 3 1943 72

Tidy Eval for Reducing Sparsity

Use dplyr::coalesce, which takes the first non-missing value at each position across its arguments.

df_rc_small %>% 
  mutate(sex = coalesce(sex, fu_sex, tele_sex))
ptid packet visitmo visitday visityr sex fu_sex tele_sex
PT0001 I 1 1 2015 2 NA NA
PT0002 I 2 2 2015 2 NA NA
PT0002 F 2 2 2016 2 2 NA
PT0003 I 3 3 2015 1 NA NA
PT0003 F 3 3 2016 1 1 NA
PT0003 T 3 3 2017 1 NA 1
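coalesce() works element by element. At each row, it keeps the first argument that isn't NA. The vectors below are a hand-made reconstruction of the sex, fu_sex, and tele_sex columns from df_rc_small, with NA wherever a form wasn't used.

```r
library(dplyr)

# Reconstructed from the df_rc_small table (rows in the same order)
sex      <- c(2, 2, NA, 1, NA, NA)
fu_sex   <- c(NA, NA, 2, NA, 1, NA)
tele_sex <- c(NA, NA, NA, NA, NA, 1)

# Row by row: take sex, else fu_sex, else tele_sex
combined <- coalesce(sex, fu_sex, tele_sex)
combined  # 2 2 2 1 1 1
```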

Tidy Eval for Reducing Sparsity

Remove redundant fields with dplyr::select.

df_rc_small %>%
  mutate(sex = coalesce(sex, fu_sex, tele_sex)) %>% 
  select(-fu_sex, -tele_sex)
ptid packet visitmo visitday visityr sex
PT0001 I 1 1 2015 2
PT0002 I 2 2 2015 2
PT0002 F 2 2 2016 2
PT0003 I 3 3 2015 1
PT0003 F 3 3 2016 1
PT0003 T 3 3 2017 1

Tidy Eval for Reducing Sparsity

That's easy enough for a single field that appears on the initial, follow-up, and telephone forms.

What about our big REDCap dataset from Form A3?

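ptid packet sib1mob sib1yob sib1agd fu_sib1mob fu_sib1yob fu_sib1agd tele_sib1mob tele_sib1yob tele_sib1agd
PT0001 I 1 1941 NA NA NA NA NA NA NA
PT0002 I 2 1942 NA NA NA NA NA NA NA
PT0002 F NA NA NA 2 1942 NA NA NA NA
PT0003 I 3 1943 72 NA NA NA NA NA NA
PT0003 F NA NA NA 3 1943 72 NA NA NA
PT0003 T NA NA NA NA NA NA 3 1943 72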

Tidy Eval for Reducing Sparsity

Brute force approach.

df_rc_big %>% 
  mutate(sib1mob = coalesce(sib1mob, fu_sib1mob, tele_sib1mob),
         sib1yob = coalesce(sib1yob, fu_sib1yob, tele_sib1yob),
         sib1agd = coalesce(sib1agd, fu_sib1agd, tele_sib1agd)) %>% 
  select(-fu_sib1mob, -tele_sib1mob, -fu_sib1yob, -tele_sib1yob,
         -fu_sib1agd, -tele_sib1agd)
ptid packet sib1mob sib1yob sib1agd
PT0001 I 1 1941 NA
PT0002 I 2 1942 NA
PT0002 F 2 1942 NA
PT0003 I 3 1943 72
PT0003 F 3 1943 72
PT0003 T 3 1943 72

Tidy Eval for Reducing Sparsity

Brute force becomes daunting once we include the 19 other possible siblings (sib2 through sib20) and the 4 other data points (neu, pdx, moe, ago): that means 20 \(\times\) 7 = 140 coalesce calls, plus 280 fu_ and tele_ fields to drop.

Tidy Eval to the rescue!

Tidy Eval for Reducing Sparsity

Input

df_rc_big

the vector of irrelevant fields (fields with no fu_ or tele_ counterparts), c("ptid", "packet")

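ptid packet sib1mob sib1yob sib1agd fu_sib1mob fu_sib1yob fu_sib1agd tele_sib1mob tele_sib1yob tele_sib1agd
PT0001 I 1 1941 NA NA NA NA NA NA NA
PT0002 I 2 1942 NA NA NA NA NA NA NA
PT0002 F NA NA NA 2 1942 NA NA NA NA
PT0003 I 3 1943 72 NA NA NA NA NA NA
PT0003 F NA NA NA 3 1943 72 NA NA NA
PT0003 T NA NA NA NA NA NA 3 1943 72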

Tidy Eval for Reducing Sparsity

Output

ptid packet sib1mob sib1yob sib1agd
PT0001 I 1 1941 NA
PT0002 I 2 1942 NA
PT0002 F 2 1942 NA
PT0003 I 3 1943 72
PT0003 F 3 1943 72
PT0003 T 3 1943 72

Tidy Eval for Reducing Sparsity

coalesce_all_ift_fields <- function(df, irrel_fields) {
  # Get initial visit field names, follow-up names, and telephone names
  ift_fields <- setdiff(names(df), irrel_fields)
  
  # Reduce initial, follow-up, telephone visit fields to initial fields only
  i_fields <- reduce_ift_fieldnames(ift_fields)
  
  # Convert initial field strings to symbol expressions
  i_fields_syms <- syms(i_fields)
  
  # Map over initial visit field symbols, coalescing IFT fields
  # Each iteration returns one coalesced field; map_dfc column-binds them
  map_dfc(i_fields_syms,
          function(i_field_sym) {
            f_field_sym <- sym(paste0("fu_", i_field_sym))
            t_field_sym <- sym(paste0("tele_", i_field_sym))
            
            df %>% 
              select(!!i_field_sym, !!f_field_sym, !!t_field_sym) %>% 
              mutate(!!i_field_sym := 
                       coalesce(!!i_field_sym, !!f_field_sym, !!t_field_sym)) %>% 
              select(-!!f_field_sym, -!!t_field_sym)
          }) %>% 
    # attach `irrel_fields` to the front of the returned data frame
    # (df[irrel_fields] stays a data frame even with a single field)
    bind_cols(df[irrel_fields], .) 
}
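To see what one iteration of the map builds, you can quote the equivalent call with rlang::expr() instead of running it. This sketch hard-codes sib1mob as the example field. It only constructs and prints the expression and does not touch any data.

```r
library(rlang)

i_field_sym <- sym("sib1mob")
f_field_sym <- sym(paste0("fu_", as_string(i_field_sym)))
t_field_sym <- sym(paste0("tele_", as_string(i_field_sym)))

# The same injection used inside mutate(), but quoted rather than evaluated
coalesce_call <- expr(coalesce(!!i_field_sym, !!f_field_sym, !!t_field_sym))
expr_text(coalesce_call)
# "coalesce(sib1mob, fu_sib1mob, tele_sib1mob)"
```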

Tidy Eval for Reducing Sparsity

irrel_fields <- c("ptid", "packet")

df_rc_big %>% 
  coalesce_all_ift_fields(irrel_fields) %>% 
  pretty_print
ptid packet sib1mob sib1yob sib1agd
PT0001 I 1 1941 NA
PT0002 I 2 1942 NA
PT0002 F 2 1942 NA
PT0003 I 3 1943 72
PT0003 F 3 1943 72
PT0003 T 3 1943 72
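For comparison, here is a hypothetical and more compact sketch of the same idea. This is not the approach coalesce_all_ift_fields takes. It builds one quoted coalesce() call per initial-visit field with rlang::expr(), then splices all of them into a single mutate() with !!!. The toy tibble is made up to mimic the shape of df_rc_big, and i_fields is written out by hand rather than derived.

```r
library(dplyr)
library(rlang)

# Made-up toy data in the shape of df_rc_big (NA where a form wasn't used)
df <- tibble(
  ptid         = c("PT0001", "PT0002", "PT0002"),
  packet       = c("I", "I", "F"),
  sib1mob      = c(1, 2, NA),
  sib1yob      = c(1941, 1942, NA),
  fu_sib1mob   = c(NA, NA, 2),
  fu_sib1yob   = c(NA, NA, 1942),
  tele_sib1mob = NA_real_,
  tele_sib1yob = NA_real_
)

i_fields <- c("sib1mob", "sib1yob")

# One quoted coalesce() call per initial-visit field, named by that field
coalesce_calls <- lapply(i_fields, function(f) {
  expr(coalesce(!!sym(f),
                !!sym(paste0("fu_", f)),
                !!sym(paste0("tele_", f))))
})
names(coalesce_calls) <- i_fields

# Splice all calls into one mutate(), then drop the follow-up/telephone fields
out <- df %>%
  mutate(!!!coalesce_calls) %>%
  select(-starts_with("fu_"), -starts_with("tele_"))
```

Splicing keeps all the work in a single data frame, so there's no column binding step. The trade-off is that each generated call is harder to inspect in isolation.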

Tidy Eval for Reducing Sparsity

ptid packet sib1mob sib1yob sib1agd fu_sib1mob fu_sib1yob fu_sib1agd tele_sib1mob tele_sib1yob tele_sib1agd
PT0001 I 1 1941 NA NA NA NA NA NA NA
PT0002 I 2 1942 NA NA NA NA NA NA NA
PT0002 F NA NA NA 2 1942 NA NA NA NA
PT0003 I 3 1943 72 NA NA NA NA NA NA
PT0003 F NA NA NA 3 1943 72 NA NA NA
PT0003 T NA NA NA NA NA NA 3 1943 72
ptid packet sib1mob sib1yob sib1agd
PT0001 I 1 1941 NA
PT0002 I 2 1942 NA
PT0002 F 2 1942 NA
PT0003 I 3 1943 72
PT0003 F 3 1943 72
PT0003 T 3 1943 72

Tidy Eval for Reducing Sparsity

Let’s test this new function on the small REDCap dataset.

Input

df_rc_small

the vector of irrelevant fields (fields with no fu_ or tele_ counterparts), c("ptid", "packet", "visitmo", "visitday", "visityr")

ptid packet visitmo visitday visityr sex fu_sex tele_sex
PT0001 I 1 1 2015 2 NA NA
PT0002 I 2 2 2015 2 NA NA
PT0002 F 2 2 2016 NA 2 NA
PT0003 I 3 3 2015 1 NA NA
PT0003 F 3 3 2016 NA 1 NA
PT0003 T 3 3 2017 NA NA 1

Tidy Eval for Reducing Sparsity

And…

irrel_fields <- c("ptid", "packet", "visitmo", "visitday", "visityr")

df_rc_small %>% 
  coalesce_all_ift_fields(irrel_fields) %>% 
  pretty_print
ptid packet visitmo visitday visityr sex
PT0001 I 1 1 2015 2
PT0002 I 2 2 2015 2
PT0002 F 2 2 2016 2
PT0003 I 3 3 2015 1
PT0003 F 3 3 2016 1
PT0003 T 3 3 2017 1

Tidy Eval for Reducing Sparsity

ptid packet visitmo visitday visityr sex fu_sex tele_sex
PT0001 I 1 1 2015 2 NA NA
PT0002 I 2 2 2015 2 NA NA
PT0002 F 2 2 2016 NA 2 NA
PT0003 I 3 3 2015 1 NA NA
PT0003 F 3 3 2016 NA 1 NA
PT0003 T 3 3 2017 NA NA 1


ptid packet visitmo visitday visityr sex
PT0001 I 1 1 2015 2
PT0002 I 2 2 2015 2
PT0002 F 2 2 2016 2
PT0003 I 3 3 2015 1
PT0003 F 3 3 2016 1
PT0003 T 3 3 2017 1

Resources for Learning More Tidy Eval

Thanks & Acknowledgements

Coordination / Planning

  • Elizabeth Robichaud @ NACC, Delilah Cook @ Wake Forest

  • Mark Espeland & ADRC Data Core Steering Committee

Resource Sharing

  • Suzanne Hunt & University of Kansas ADC

  • NACC

Feedback / Support

  • Hiroko Dodge, Jon Reader, & Michigan ADRC Data Mgmt. and Statistical Core

Tidy Eval for Reducing Sparsity

For reference…

Given the field names

    sib1mob, sib1yob, sib1agd,

    fu_sib1mob, fu_sib1yob, fu_sib1agd,

    tele_sib1mob, tele_sib1yob, tele_sib1agd

reduce_ift_fieldnames returns

    sib1mob, sib1yob, sib1agd
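The talk doesn't show the body of reduce_ift_fieldnames. One hypothetical way to get the behavior described above is to strip a leading fu_ or tele_ prefix with a regular expression, then keep the unique results. The author's actual implementation may differ.

```r
# Hypothetical implementation, written to match the described behavior
reduce_ift_fieldnames <- function(fields) {
  # "fu_sib1mob" -> "sib1mob", "tele_sib1mob" -> "sib1mob", "sib1mob" unchanged
  unique(sub("^(fu_|tele_)", "", fields))
}

ift_fields <- c("sib1mob", "sib1yob", "sib1agd",
                "fu_sib1mob", "fu_sib1yob", "fu_sib1agd",
                "tele_sib1mob", "tele_sib1yob", "tele_sib1agd")

reduce_ift_fieldnames(ift_fields)
# "sib1mob" "sib1yob" "sib1agd"
```

unique() keeps first occurrences in order, so the initial-visit names come back in the same order as the input.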