July 31, 2019
Tidy Eval is a conceptual framework for doing metaprogramming in R.
K… but what’s metaprogramming?
Metaprogramming is treating code as data – data that can be acted on by other code.
When we treat code as data, it can be read, analyzed, edited, and written by other code.
Writing the “other code” that handles code-as-data is metaprogramming.
Cool, huh?! 😎
R’s LISP-y heritage means metaprogramming is built in.
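Because metaprogramming is built into R, base R alone can capture a piece of code, inspect it, edit it, and evaluate it. Here is a minimal sketch. The EDUC values are made up for illustration.

```r
# Capture code without running it: quote() returns an unevaluated call
code <- quote(mean(EDUC))
class(code)  # "call"

# A call is data: element 1 is the function name, the rest are its arguments
as.list(code)

# Edit the code: swap mean() for median()
code[[1]] <- as.name("median")
deparse(code)  # "median(EDUC)"

# Evaluate the edited code, looking up EDUC in a list of made-up values
eval(code, list(EDUC = c(12, 16, 20)))  # 16
```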
Unfortunately, metaprogramming in base R is a little clunky.
Tidy Eval is a conceptual framework that makes metaprogramming in R more consistent and accessible.
The R package that makes Tidy Eval possible is rlang.
For a deep dive into Tidy Eval, check out the “Metaprogramming” section in Advanced R (2nd edition) by Hadley Wickham.
Why would metaprogramming be useful for dealing with UDS 3 data?
Form A3 has 140 sibling data fields:
20 possible siblings \(\times\) 7 data points = 140 sibling fields
df_uds with Form A1 + A3 Fields
head(df_uds) %>% pretty_print_scroll
PACKET | FORMVER | PTID | VISITMO | VISITDAY | VISITYR | VISITNUM | BIRTHMO | BIRTHYR | SEX | RACE | EDUC | MARISTAT | HANDED | SIB1MOB | SIB1YOB | SIB1AGD | SIB1NEU | SIB1PDX | SIB1MOE | SIB1AGO | SIB2MOB | SIB2YOB | SIB2AGD | SIB2NEU | SIB2PDX | SIB2MOE | SIB2AGO | SIB3MOB | SIB3YOB | SIB3AGD | SIB3NEU | SIB3PDX | SIB3MOE | SIB3AGO | SIB4MOB | SIB4YOB | SIB4AGD | SIB4NEU | SIB4PDX | SIB4MOE | SIB4AGO | SIB5MOB | SIB5YOB | SIB5AGD | SIB5NEU | SIB5PDX | SIB5MOE | SIB5AGO | SIB6MOB | SIB6YOB | SIB6AGD | SIB6NEU | SIB6PDX | SIB6MOE | SIB6AGO | SIB7MOB | SIB7YOB | SIB7AGD | SIB7NEU | SIB7PDX | SIB7MOE | SIB7AGO | SIB8MOB | SIB8YOB | SIB8AGD | SIB8NEU | SIB8PDX | SIB8MOE | SIB8AGO | SIB9MOB | SIB9YOB | SIB9AGD | SIB9NEU | SIB9PDX | SIB9MOE | SIB9AGO | SIB10MOB | SIB10YOB | SIB10AGD | SIB10NEU | SIB10PDX | SIB10MOE | SIB10AGO | SIB11MOB | SIB11YOB | SIB11AGD | SIB11NEU | SIB11PDX | SIB11MOE | SIB11AGO | SIB12MOB | SIB12YOB | SIB12AGD | SIB12NEU | SIB12PDX | SIB12MOE | SIB12AGO | SIB13MOB | SIB13YOB | SIB13AGD | SIB13NEU | SIB13PDX | SIB13MOE | SIB13AGO | SIB14MOB | SIB14YOB | SIB14AGD | SIB14NEU | SIB14PDX | SIB14MOE | SIB14AGO | SIB15MOB | SIB15YOB | SIB15AGD | SIB15NEU | SIB15PDX | SIB15MOE | SIB15AGO | SIB16MOB | SIB16YOB | SIB16AGD | SIB16NEU | SIB16PDX | SIB16MOE | SIB16AGO | SIB17MOB | SIB17YOB | SIB17AGD | SIB17NEU | SIB17PDX | SIB17MOE | SIB17AGO | SIB18MOB | SIB18YOB | SIB18AGD | SIB18NEU | SIB18PDX | SIB18MOE | SIB18AGO | SIB19MOB | SIB19YOB | SIB19AGD | SIB19NEU | SIB19PDX | SIB19MOE | SIB19AGO | SIB20MOB | SIB20YOB | SIB20AGD | SIB20NEU | SIB20PDX | SIB20MOE | SIB20AGO |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
I | 3 | PT0178 | 7 | 10 | 2018 | 001 | 8 | 1950 | 2 | 2 | 13 | 4 | 1 | 12 | 1945 | 53 | 8 |  |  |  | 3 | 1951 | 51 | 8 |
I | 3 | PT0184 | 12 | 30 | 2017 | 001 | 7 | 1953 | 2 | 2 | 12 | 3 | 2 | 8 | 1949 | 66 | 8 |  |  |  | 9 | 1951 |  | 8 |  |  |  | 2 | 1960 |  | 8 |
I | 3 | PT0190 | 12 | 16 | 2017 | 001 | 8 | 1953 | 1 | 1 | 14 | 1 | 2 | 2 | 1941 | 62 | 9 |  |  |  | 1 | 1943 |  | 8 |  |  |  | 2 | 1944 |  | 8 |  |  |  | 4 | 1945 | 66 | 9 |
I | 3 | PT0206 | 12 | 19 | 2017 | 001 | 3 | 1951 | 2 | 2 | 16 | 3 | 2 | 9 | 1958 | 53 | 8 |
I | 3 | PT0207 | 12 | 19 | 2017 | 001 | 7 | 1949 | 2 | 2 | 18 | 2 | 2 | 4 | 1941 | 65 | 8 |
I | 3 | PT0213 | 12 | 26 | 2017 | 001 | 5 | 1944 | 2 | 2 | 18 | 3 | 2 | 10 | 1942 | 65 | 8 |  |  |  | 12 | 1943 | 73 | 8 |  |  |  | 4 | 1949 | 40 | 8 |
df_uds with Form A1 + A3 Fields
## Observations: 50
## Variables: 154
## $ PACKET <chr> "I", "I", "I", "I", "I", "I", "I", "I", "I", "I", "I", …
## $ FORMVER <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
## $ PTID <chr> "PT0178", "PT0184", "PT0190", "PT0206", "PT0207", "PT02…
## $ VISITMO <dbl> 7, 12, 12, 12, 12, 12, 9, 10, 10, 11, 2, 12, 12, 9, 12,…
## $ VISITDAY <int> 10, 30, 16, 19, 19, 26, 25, 15, 16, 20, 4, 2, 2, 13, 31…
## $ VISITYR <int> 2018, 2017, 2017, 2017, 2017, 2017, 2017, 2018, 2018, 2…
## $ VISITNUM <chr> "001", "001", "001", "001", "001", "001", "001", "001",…
## $ BIRTHMO <dbl> 8, 7, 8, 3, 7, 5, 12, 7, 10, 9, 1, 8, 11, 6, 11, 1, 10,…
## $ BIRTHYR <dbl> 1950, 1953, 1953, 1951, 1949, 1944, 1953, 1938, 1950, 1…
## $ SEX <int> 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ RACE <int> 2, 2, 1, 2, 2, 2, 2, 1, 2, 1, 2, 1, 1, 1, 2, 2, 2, 2, 2…
## $ EDUC <int> 13, 12, 14, 16, 18, 18, 16, 16, 14, 16, 18, 12, 12, 14,…
## $ MARISTAT <int> 4, 3, 1, 3, 2, 3, 5, 2, 2, 1, 3, 1, 2, 3, 2, 2, 3, 1, 3…
## $ HANDED <int> 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2…
## $ SIB1MOB <int> 12, 8, 2, 9, 4, 10, 9, 2, NA, 1, 12, 8, 8, 10, 8, 8, 10…
## $ SIB1YOB <int> 1945, 1949, 1941, 1958, 1941, 1942, 1945, 1929, 1922, 1…
## $ SIB1AGD <int> 53, 66, 62, 53, 65, 65, 71, 72, 72, 80, 63, 64, 64, 67,…
## $ SIB1NEU <int> 8, 8, 9, 8, 8, 8, 8, 8, 8, 1, 5, 1, 1, 1, 8, 8, 5, 8, 8…
## $ SIB1PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, 999, 210, 50, 50, 5…
## $ SIB1MOE <int> NA, NA, NA, NA, NA, NA, 7, NA, NA, 7, 7, 1, 1, 7, NA, N…
## $ SIB1AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, 73, 29, 55, 55, 55,…
## $ SIB2MOB <int> 3, 9, 1, NA, NA, 12, 2, NA, NA, NA, 12, 1, 10, 12, NA, …
## $ SIB2YOB <int> 1951, 1951, 1943, NA, NA, 1943, 1947, NA, 1940, 1930, 1…
## $ SIB2AGD <int> 51, NA, NA, NA, NA, 73, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB2NEU <int> 8, 8, 8, NA, NA, 8, 8, NA, 8, 1, 5, 8, 8, 8, 8, 8, NA, …
## $ SIB2PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, 999, 210, NA, NA, N…
## $ SIB2MOE <int> NA, NA, NA, NA, NA, NA, 7, NA, NA, 7, 7, NA, NA, NA, NA…
## $ SIB2AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 16, NA, NA, NA,…
## $ SIB3MOB <int> NA, 2, 2, NA, NA, 4, 12, NA, NA, NA, NA, 12, 12, 1, NA,…
## $ SIB3YOB <int> NA, 1960, 1944, NA, NA, 1949, 1951, NA, 1950, NA, NA, 1…
## $ SIB3AGD <int> NA, NA, NA, NA, NA, 40, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB3NEU <int> NA, 8, 8, NA, NA, 8, 8, NA, 8, NA, NA, 4, 4, 8, 8, 8, N…
## $ SIB3PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 240, 240, N…
## $ SIB3MOE <int> NA, NA, NA, NA, NA, NA, 7, NA, NA, NA, NA, 7, 7, NA, NA…
## $ SIB3AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 74, 74, NA,…
## $ SIB4MOB <int> NA, NA, 4, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 2, 3…
## $ SIB4YOB <int> NA, NA, 1945, NA, NA, NA, NA, NA, 1952, NA, NA, NA, NA,…
## $ SIB4AGD <int> NA, NA, 66, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB4NEU <int> NA, NA, 9, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, 8, 8,…
## $ SIB4PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB4MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB4AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB5MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, 10, …
## $ SIB5YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, 1935, NA, NA, NA, NA, 1…
## $ SIB5AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, 83, NA, NA, NA, NA, NA,…
## $ SIB5NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, 8, N…
## $ SIB5PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB5MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB5AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, 1942, NA, NA, NA, NA, N…
## $ SIB6AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB6PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB6AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, 1945, NA, NA, NA, NA, N…
## $ SIB7AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB7PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB7AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, 1926, NA, NA, NA, NA, N…
## $ SIB8AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB8PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB8AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, 99, NA, NA, NA, NA, NA,…
## $ SIB9NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB9PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB9AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB10PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB10AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB11PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB11AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, 8, NA, NA, NA, NA, NA, …
## $ SIB12PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB12AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB13AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB14AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB15AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB16AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB17AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB18AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB19AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20MOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20YOB <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20AGD <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20NEU <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20PDX <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20MOE <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ SIB20AGO <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
The code that follows makes extensive use of the pipe operator %>% from magrittr (pipeR offers a similar %>>%):
f(x) is equivalent to x %>% f
f(x, y) is equivalent to x %>% f(y)
h(g(f(x))) is equivalent to x %>% f %>% g %>% h
g(x, f(y)) is equivalent to y %>% f %>% g(x, .)
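You can check these equivalences directly. This sketch assumes the magrittr package is installed. The inputs are arbitrary, and each line should print TRUE.

```r
library(magrittr)

x <- c(1, 4, 9)

# f(x) is x %>% f
identical(sqrt(x), x %>% sqrt)
# f(x, y) is x %>% f(y)
identical(round(x / 3, 2), (x / 3) %>% round(2))
# h(g(f(x))) is x %>% f %>% g %>% h
identical(max(cumsum(sqrt(x))), x %>% sqrt %>% cumsum %>% max)
# g(x, f(y)) is y %>% f %>% g(x, .): the dot marks where the piped value goes
identical(paste("a", toupper("b")), "b" %>% toupper %>% paste("a", .))
```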
Goals:
Outline basic ideas and glimpse some building blocks
Demonstrate how building blocks can be used with UDS data
… we do get into the weeds
To treat code as data, we need some way to capture the code expressions that are passed to a function before they’re evaluated within the function.
Those familiar with the R tidyverse have seen this in action with dplyr.
library(dplyr)
df_uds_abrv %>% filter(EDUC >= 20)
## # A tibble: 2 x 8
## PTID BIRTHMO BIRTHYR SEX RACE EDUC MARISTAT HANDED
## <chr> <dbl> <dbl> <int> <int> <int> <int> <int>
## 1 PT0494 1 1936 1 1 20 1 2
## 2 PT0542 3 1939 1 1 20 1 1
Notice that the filter condition EDUC >= 20 isn’t a string. It’s an expression.
EDUC >= 20 is an expression.
"EDUC >= 20" is a string.
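Base R makes the difference easy to see. quote() produces an expression (a call object), while a string stays plain characters until it is parsed. This is a small sketch: str2lang() requires R 3.6.0 or later, and the EDUC values are made up.

```r
e <- quote(EDUC >= 20)   # an expression
s <- "EDUC >= 20"        # a string

class(e)  # "call"
class(s)  # "character"

# Parsing the string turns it into the same expression
identical(e, str2lang(s))  # TRUE

# An expression can be evaluated against data that supplies EDUC
eval(e, list(EDUC = c(12, 20, 22)))  # FALSE TRUE TRUE
```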
How does dplyr do this? How does it capture the expression?
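dplyr uses rlang for this, but you can sketch the core idea with base R's substitute(). Inside a function, substitute() returns the expression the caller supplied for an argument rather than its value. my_filter() below is a hypothetical toy for illustration; it is not how dplyr::filter is implemented.

```r
# Return the caller's expression for `cond` as text, without evaluating it
show_expr <- function(cond) {
  deparse(substitute(cond))
}
show_expr(EDUC >= 20)  # "EDUC >= 20" (no EDUC object needs to exist)

# Toy filter: capture `cond`, then evaluate it with the data frame's columns in scope
my_filter <- function(df, cond) {
  keep <- eval(substitute(cond), df, parent.frame())
  df[which(keep), , drop = FALSE]  # which() drops NA, as dplyr::filter does
}

toy <- data.frame(PTID = c("A", "B", "C"), EDUC = c(12, 20, 22))
my_filter(toy, EDUC >= 20)  # rows for B and C
```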
Suppose we’d like to build our own function that helps us summarize a data frame (like df_uds) with some descriptive statistics.
Here’s a simple example with dplyr::summarize to build on:
df_uds %>% summarize(mean(EDUC))
## # A tibble: 1 x 1
## `mean(EDUC)`
## <dbl>
## 1 15.8
Instead of finding the mean of all participants, what if we want to group the participants by sex, SEX?
df_uds %>% group_by(SEX) %>% summarize(mean(EDUC))
## # A tibble: 2 x 2
## SEX `mean(EDUC)`
## <int> <dbl>
## 1 1 16.7
## 2 2 15.4
Of course, we could group by other fields like RACE.
df_uds %>% group_by(RACE) %>% summarize(mean(EDUC))
## # A tibble: 2 x 2
## RACE `mean(EDUC)`
## <int> <dbl>
## 1 1 15.9
## 2 2 15.7
Or MARISTAT.
df_uds %>% group_by(MARISTAT) %>% summarize(mean(EDUC))
## # A tibble: 5 x 2
## MARISTAT `mean(EDUC)`
## <int> <dbl>
## 1 1 16.3
## 2 2 15.5
## 3 3 15.3
## 4 4 13
## 5 5 16
Can we create a custom function that allows us to pass whatever grouping variable we want (SEX, RACE, MARISTAT)?
mean_EDUC_group_by <- function(df, group_var) {
  df %>%
    group_by(group_var) %>%
    summarize(mean(EDUC))
}
Call the function using SEX as a grouping variable.
df_uds %>% mean_EDUC_group_by(SEX)
## Error: Column `group_var` is unknown
Within the function, group_by() looks for a column literally named group_var, which doesn't exist in df_uds. We’d have to hard-code the column name inside the function for it to work, but that defeats the purpose.
mean_EDUC_group_by <- function(df, group_var) {
  df %>%
    group_by(SEX) %>%
    summarize(mean(EDUC))
}

df_uds %>% mean_EDUC_group_by(RACE) # Doesn't matter what user passes!
## # A tibble: 2 x 2
## SEX `mean(EDUC)`
## <int> <dbl>
## 1 1 16.7
## 2 2 15.4
How do we capture the expression SEX passed to mean_EDUC_group_by by the user?
We need to use Tidy Eval, specifically the enquo function from the rlang package. enquo captures – or “quotes” – the expression passed by the user.
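Here is a minimal sketch of what enquo() returns. capture_arg() is a hypothetical helper, and as_label() comes from current rlang versions. The result is a quosure: the captured expression bundled with the environment it was written in.

```r
library(rlang)

capture_arg <- function(x) enquo(x)

q <- capture_arg(SEX)   # SEX is captured, not evaluated
is_quosure(q)           # TRUE
quo_get_expr(q)         # the symbol SEX
as_label(q)             # "SEX"
```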
library(rlang)

mean_EDUC_group_by <- function(df, group_var) {
  group_var_quo <- enquo(group_var)
  df %>%
    group_by(group_var_quo) %>%
    summarize(mean(EDUC))
}
Let’s give the updated mean_EDUC_group_by function a whirl.
df_uds %>% mean_EDUC_group_by(SEX)
## Error: Column `group_var_quo` is unknown
There’s still an error. Why?
We captured – or “quoted” – the expression SEX with group_var_quo.
mean_EDUC_group_by <- function(df, group_var) {
  group_var_quo <- enquo(group_var)
  df %>%
    group_by(group_var_quo) %>%
    summarize(mean(EDUC))
}
But for group_by to evaluate group_var_quo as the expression SEX, we need to “unquote” group_var_quo with the !! (“bang-bang”) operator.
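rlang::expr() makes the effect of !! visible. Without !!, the name group_var goes into the expression literally. That is the same reason group_by() above went looking for a column with the variable's own name. With !!, the variable's value goes into the expression instead. This sketch builds the symbol with sym().

```r
library(rlang)

group_var <- sym("SEX")

expr(group_by(df, group_var))    # group_by(df, group_var): the name itself
expr(group_by(df, !!group_var))  # group_by(df, SEX): the value is unquoted
```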
mean_EDUC_group_by <- function(df, group_var) {
  group_var_quo <- enquo(group_var)
  df %>%
    group_by(!!group_var_quo) %>%
    summarize(mean(EDUC))
}
Let’s try again with SEX.
df_uds %>% mean_EDUC_group_by(SEX)
## # A tibble: 2 x 2
## SEX `mean(EDUC)`
## <int> <dbl>
## 1 1 16.7
## 2 2 15.4
And again with RACE.
df_uds %>% mean_EDUC_group_by(RACE)
## # A tibble: 2 x 2
## RACE `mean(EDUC)`
## <int> <dbl>
## 1 1 15.9
## 2 2 15.7
Can we generalize mean_EDUC_group_by a bit?
What if we want the mean of something other than EDUC?
Well, we can apply the same principles we just used for the group_var argument.
group_by_mean <- function(df, group_var, mean_var) {
  group_var_quo <- enquo(group_var)
  mean_var_quo <- enquo(mean_var)
  df %>%
    group_by(!!group_var_quo) %>%
    summarize(mean(!!mean_var_quo))
}
Let’s try it out.
df_uds %>% group_by_mean(group_var = SEX, mean_var = EDUC)
## # A tibble: 2 x 2
## SEX `mean(EDUC)`
## <int> <dbl>
## 1 1 16.7
## 2 2 15.4
But `mean(EDUC)` as a summary table label is ugly.
Can we improve on this?
group_by_mean <- function(df, group_var, mean_var) {
  group_var_quo <- enquo(group_var)
  mean_var_quo <- enquo(mean_var)

  mean_var_str <- paste0("mean_", quo_name(mean_var_quo))

  df %>%
    group_by(!!group_var_quo) %>%
    summarize(!!mean_var_str = mean(!!mean_var_quo))
}
## Error: <text>:9:30: unexpected '='
## 8:     group_by(!!group_var_quo) %>%
## 9:     summarize(!!mean_var_str =
##                                 ^
group_by_mean <- function(df, group_var, mean_var) {
  group_var_quo <- enquo(group_var)
  mean_var_quo <- enquo(mean_var)

  mean_var_str <- paste0("mean_", quo_name(mean_var_quo))

  df %>%
    group_by(!!group_var_quo) %>%
    summarize(!!mean_var_str := mean(!!mean_var_quo))
}
When the left-hand side (LHS) of an argument is an unquoted expression, we need :=, a special assignment operator, in place of =.
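The reason lies in R's parser. Inside a function call, the left side of = must be a plain name, so !!mean_var_str = ... is a syntax error before rlang ever sees it. The parser treats := as an ordinary operator that base R never defines, which leaves rlang free to interpret it. This base R sketch shows the difference (str2lang() requires R 3.6.0 or later).

```r
# `!!name = value` is not valid argument syntax, so parsing fails
bad <- tryCatch(
  str2lang("summarize(!!mean_var_str = mean(x))"),
  error = function(e) NULL
)
is.null(bad)  # TRUE

# `!!name := value` parses fine, into a call to the `:=` operator
good <- str2lang("summarize(!!mean_var_str := mean(x))")
good[[2]][[1]]  # `:=`
```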
df_uds %>% group_by_mean(group_var = SEX, mean_var = EDUC)
## # A tibble: 2 x 2
## SEX mean_EDUC
## <int> <dbl>
## 1 1 16.7
## 2 2 15.4
df_uds %>% group_by_mean(group_var = RACE, mean_var = SIB1AGD)
## # A tibble: 2 x 2
## RACE mean_SIB1AGD
## <int> <dbl>
## 1 1 61.8
## 2 2 58.5
We can pass expressions to functions.
We can capture the expressions in order to manipulate them.
We can use functions/operators from the rlang package, such as enquo, !!, quo_name, and :=, to quote (capture), manipulate, and unquote expressions passed to a function.
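Since rlang 0.4.0, the "curly-curly" operator {{ }} can replace the enquo() + !! pair with a single step. Below is a sketch of group_by_mean in that style, run on a made-up toy data frame. group_by_mean_cc is a hypothetical name. The sketch uses as_label() in place of quo_name(), because newer rlang documentation points to as_label()/as_name() instead.

```r
library(dplyr)
library(rlang)

group_by_mean_cc <- function(df, group_var, mean_var) {
  # Build the output column name from the captured argument, e.g. "mean_EDUC"
  mean_var_str <- paste0("mean_", as_label(enquo(mean_var)))

  df %>%
    group_by({{ group_var }}) %>%                        # {{ }}: capture and unquote in one step
    summarize(!!mean_var_str := mean({{ mean_var }}))
}

toy <- data.frame(SEX = c(1, 1, 2), EDUC = c(12, 16, 20))
res <- group_by_mean_cc(toy, SEX, EDUC)
res  # SEX 1 and 2, with mean_EDUC 14 and 20
```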
Back to our motivating question: why would metaprogramming be useful for dealing with UDS 3 data?
Recall that Form A3 has 140 sibling data fields:
20 possible siblings \(\times\) 7 data points = 140 sibling fields
If we want to do any data validation or simple analysis of all those fields, the code becomes repetitive, error-prone, and hard to maintain.
The basic components of all 140 sibling fields can be expressed simply with code:
sib_base <- "SIB"
sib_nums <- 1:20
sib_data <- c("MOB", "YOB", "AGD", "NEU", "PDX", "MOE", "AGO")
The components can then be easily combined:
library(dplyr) # for `arrange` and `pull`
library(tidyr) # for `crossing` and `unite`

sib_fields <-
  crossing(sib_base, sib_nums, sib_data) %>% # "SIB", 1:20, c("MOB", "YOB", ...)
  arrange(sib_nums) %>%                      # to force expected ordering
  unite(sib_fields, sep = "") %>%
  pull(sib_fields)

sib_fields
## [1] "SIB1AGD" "SIB1AGO" "SIB1MOB" "SIB1MOE" "SIB1NEU" "SIB1PDX"
## [7] "SIB1YOB" "SIB2AGD" "SIB2AGO" "SIB2MOB" "SIB2MOE" "SIB2NEU"
## [13] "SIB2PDX" "SIB2YOB" "SIB3AGD" "SIB3AGO" "SIB3MOB" "SIB3MOE"
## [19] "SIB3NEU" "SIB3PDX" "SIB3YOB" "SIB4AGD" "SIB4AGO" "SIB4MOB"
## [25] "SIB4MOE" "SIB4NEU" "SIB4PDX" "SIB4YOB" "SIB5AGD" "SIB5AGO"
## [31] "SIB5MOB" "SIB5MOE" "SIB5NEU" "SIB5PDX" "SIB5YOB" "SIB6AGD"
## [37] "SIB6AGO" "SIB6MOB" "SIB6MOE" "SIB6NEU" "SIB6PDX" "SIB6YOB"
## [43] "SIB7AGD" "SIB7AGO" "SIB7MOB" "SIB7MOE" "SIB7NEU" "SIB7PDX"
## [49] "SIB7YOB" "SIB8AGD" "SIB8AGO" "SIB8MOB" "SIB8MOE" "SIB8NEU"
## [55] "SIB8PDX" "SIB8YOB" "SIB9AGD" "SIB9AGO" "SIB9MOB" "SIB9MOE"
## [61] "SIB9NEU" "SIB9PDX" "SIB9YOB" "SIB10AGD" "SIB10AGO" "SIB10MOB"
## [67] "SIB10MOE" "SIB10NEU" "SIB10PDX" "SIB10YOB" "SIB11AGD" "SIB11AGO"
## [73] "SIB11MOB" "SIB11MOE" "SIB11NEU" "SIB11PDX" "SIB11YOB" "SIB12AGD"
## [79] "SIB12AGO" "SIB12MOB" "SIB12MOE" "SIB12NEU" "SIB12PDX" "SIB12YOB"
## [85] "SIB13AGD" "SIB13AGO" "SIB13MOB" "SIB13MOE" "SIB13NEU" "SIB13PDX"
## [91] "SIB13YOB" "SIB14AGD" "SIB14AGO" "SIB14MOB" "SIB14MOE" "SIB14NEU"
## [97] "SIB14PDX" "SIB14YOB" "SIB15AGD" "SIB15AGO" "SIB15MOB" "SIB15MOE"
## [103] "SIB15NEU" "SIB15PDX" "SIB15YOB" "SIB16AGD" "SIB16AGO" "SIB16MOB"
## [109] "SIB16MOE" "SIB16NEU" "SIB16PDX" "SIB16YOB" "SIB17AGD" "SIB17AGO"
## [115] "SIB17MOB" "SIB17MOE" "SIB17NEU" "SIB17PDX" "SIB17YOB" "SIB18AGD"
## [121] "SIB18AGO" "SIB18MOB" "SIB18MOE" "SIB18NEU" "SIB18PDX" "SIB18YOB"
## [127] "SIB19AGD" "SIB19AGO" "SIB19MOB" "SIB19MOE" "SIB19NEU" "SIB19PDX"
## [133] "SIB19YOB" "SIB20AGD" "SIB20AGO" "SIB20MOB" "SIB20MOE" "SIB20NEU"
## [139] "SIB20PDX" "SIB20YOB"
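For comparison, here is a base R sketch that builds the same 140 names with expand.grid(). expand.grid() varies its first argument fastest, so the data points stay in form order (MOB, YOB, AGD, ...) rather than the alphabetical order that crossing() produces above.

```r
sib_nums <- 1:20
sib_data <- c("MOB", "YOB", "AGD", "NEU", "PDX", "MOE", "AGO")

# One row per (data point, sibling number) pair; data points vary fastest
grid <- expand.grid(data = sib_data, num = sib_nums, stringsAsFactors = FALSE)
sib_fields_base <- paste0("SIB", grid$num, grid$data)

length(sib_fields_base)   # 140
head(sib_fields_base, 8)  # "SIB1MOB" "SIB1YOB" ... "SIB1AGO" "SIB2MOB"
```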
Suppose we want to see some basic descriptives of all the sibling fields (e.g., min, max, median, mean) in order to (1) ensure there aren’t any unexpected values and (2) get a sense of the distributions of the continuous variables (AGD, AGO).
Effectively, we’d like to do 20 siblings \(\times\) 7 data points \(\times\) 4 statistics = 560 calculations.
We could start here…
df_uds %>%
  summarize(
    # SIB1MOB
    SIB1MOB_min = min(SIB1MOB, na.rm = TRUE),
    SIB1MOB_max = min(SIB1MOB, na.rm = TRUE),     # oops: min(), not max()
    SIB1MOB_med = median(SIB1MOB, na.rm = TRUE),
    SIB1MOB_mean = mean(SIB1MOB, na.rm = TRUE),
    # SIB1YOB
    SIB1YOB_min = min(SIB1YOB, na.rm = TRUE),
    SIB1YOB_max = max(SIB1YOB, na.rm = TRUE),
    SIB1MOB_med = median(SIB1YOB, na.rm = TRUE),  # oops: should be SIB1YOB_med
    SIB1MOB_mean = mean(SIB1YOB, na.rm = TRUE),   # oops: should be SIB1YOB_mean
    # ...
    # ... at least 548 lines of copy-pasta'ed code goes here
    # ...
    # SIB20AGO
    SIB20AGO_min = min(SIB20AGO, na.rm = TRUE),
    SIB20AGO_max = max(SIB20AGO, na.rm = TRUE),
    SIB20AGO_med = median(SIB20AGO, na.rm = TRUE),
    SIB20AGO_mean = mean(SIB20AGO, na.rm = TRUE)
  )
Instead, let’s write a function that effectively writes the code for us.
Input:
A data frame of UDS Form A3 data
The functions we’d like to apply
Components of the sibling fields we want to apply functions to
Output:
A one-row data frame with one column per function/field combination (e.g., min_SIB1MOB)
A nice helper function from the tidyr package for creating all combinations of elements: crossing.
library(tidyr)
crossing(chars = c("a", "b", "c"), nums = 1:2)
## # A tibble: 6 x 2
## chars nums
## <chr> <int>
## 1 a 1
## 2 a 2
## 3 b 1
## 4 b 2
## 5 c 1
## 6 c 2
To paste the combinations together into a single vector, use tidyr::unite and dplyr::pull.
crossing(chars = c("a", "b", "c"), nums = 1:2) %>%
  unite(chars_nums, sep = "") %>%
  pull(chars_nums)
## [1] "a1" "a2" "b1" "b2" "c1" "c2"
Let’s start writing our function.
my_summary_fxn <- function(df, funcs, ...) {
  # sib_base <- "SIB"
  # sib_nums <- 1:2
  # sib_data <- c("MOB", "YOB")

  # Combine user-passed field components
  crossing(...)
}
sib_base <- "SIB"
sib_nums <- 1:2
sib_data <- c("MOB", "YOB")

my_summary_fxn(df = df_uds, funcs = list(min, max),
               sib_base, sib_nums, sib_data)
## # A tibble: 4 x 3
## sib_base sib_nums sib_data
## <chr> <int> <chr>
## 1 SIB 1 MOB
## 2 SIB 1 YOB
## 3 SIB 2 MOB
## 4 SIB 2 YOB
my_summary_fxn <- function(df, funcs, ...) {
  # sib_base <- "SIB"
  # sib_nums <- 1:2
  # sib_data <- c("MOB", "YOB")

  # Combine user-passed field components into symbols
  fields_syms <-
    crossing(...) %>%
    unite(fields, sep = "") %>%
    pull(fields) %>%
    syms

  fields_syms
}
my_summary_fxn(df = df_uds, funcs = list(min, max), sib_base, sib_nums, sib_data)
## [[1]]
## SIB1MOB
##
## [[2]]
## SIB1YOB
##
## [[3]]
## SIB2MOB
##
## [[4]]
## SIB2YOB
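syms() turns a character vector into a list of symbols. You can then insert those symbols into an expression one at a time with !!, or splice them all in at once with !!!. Here is a small sketch.

```r
library(rlang)

field_syms <- syms(c("SIB1MOB", "SIB1YOB"))

is_symbol(field_syms[[1]])                  # TRUE
expr(min(!!field_syms[[1]], na.rm = TRUE))  # min(SIB1MOB, na.rm = TRUE)
expr(select(df, !!!field_syms))             # select(df, SIB1MOB, SIB1YOB)
```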
my_summary_fxn <- function(df, funcs, ...) {
  # Combine user-passed field components into symbols
  fields_syms <-
    crossing(...) %>%
    unite(fields, sep = "") %>%
    pull(fields) %>%
    syms

  # Capture/quote functions as expressions
  funcs_exprs <- enexpr(funcs)[-1] # list, min, max => min, max

  as.list(funcs_exprs)
}
my_summary_fxn(df = df_uds, funcs = list(min, max), sib_base, sib_nums, sib_data)
## [[1]]
## min
##
## [[2]]
## max
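Here is what enexpr(funcs)[-1] does. enexpr() captures the call list(min, max) without evaluating it. [-1] then drops the call's first element (the function name list), which leaves the arguments min and max. This sketch uses a hypothetical capture_funcs() helper.

```r
library(rlang)

capture_funcs <- function(funcs) enexpr(funcs)

e <- capture_funcs(list(min, max))
e                            # list(min, max): a call, not a list of functions
func_syms <- as.list(e[-1])  # the symbols min and max
vapply(func_syms, as_string, character(1))  # "min" "max"
```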
library(purrr) # for `map_dfc`

my_summary_fxn <- function(df, funcs, ...) {
  # Combine user-passed field components into symbols
  fields_syms <-
    crossing(...) %>%
    unite(fields, sep = "") %>%
    pull(fields) %>%
    syms

  # Capture/quote functions as expressions
  funcs_exprs <- enexpr(funcs)[-1] # list, min, max => min, max

  # Map over function expressions and field symbols, evaluating with `summarize`
  map_dfc(funcs_exprs, # min, max
          function(func_expr) {
            as_string(func_expr)
          })
}
my_summary_fxn(df = df_uds, funcs = list(min, max), sib_base, sib_nums, sib_data) %>% pretty_print
V1 | V2 |
---|---|
min | max |
my_summary_fxn(df = df_uds, funcs = list(min, max, median, mean), sib_base, sib_nums, sib_data) %>% pretty_print
V1 | V2 | V3 | V4 |
---|---|---|---|
min | max | median | mean |
my_summary_fxn <- function(df, funcs, ...) {
  # Combine user-passed field components into symbols
  fields_syms <-
    crossing(...) %>%
    unite(fields, sep = "") %>%
    pull(fields) %>%
    syms

  # Capture/quote functions as expressions
  funcs_exprs <- enexpr(funcs)[-1] # list, min, max => min, max

  # Map over function expressions and field symbols, evaluating with `summarize`
  map_dfc(funcs_exprs, # min, max
          function(func_expr) {
            map_dfc(fields_syms, # SIB1MOB, SIB1YOB, ...
                    function(field_sym) {
                      # min, ... SIB1MOB, ...
                      paste0(func_expr, "_", field_sym)
                    })
          })
}
my_summary_fxn(df = df_uds, funcs = list(min, max), sib_base, sib_nums, sib_data) %>% pretty_print_scroll
V1 | V2 | V3 | V4 | V11 | V21 | V31 | V41 |
---|---|---|---|---|---|---|---|
min_SIB1MOB | min_SIB1YOB | min_SIB2MOB | min_SIB2YOB | max_SIB1MOB | max_SIB1YOB | max_SIB2MOB | max_SIB2YOB |
my_summary_fxn(df = df_uds, funcs = list(min, max, median, mean), sib_base, sib_nums, sib_data) %>% pretty_print_scroll
V1 | V2 | V3 | V4 | V11 | V21 | V31 | V41 | V12 | V22 | V32 | V42 | V13 | V23 | V33 | V43 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
min_SIB1MOB | min_SIB1YOB | min_SIB2MOB | min_SIB2YOB | max_SIB1MOB | max_SIB1YOB | max_SIB2MOB | max_SIB2YOB | median_SIB1MOB | median_SIB1YOB | median_SIB2MOB | median_SIB2YOB | mean_SIB1MOB | mean_SIB1YOB | mean_SIB2MOB | mean_SIB2YOB |
my_summary_fxn <- function(df, funcs, ...) {
  # Combine user-passed field components into symbols
  fields_syms <-
    crossing(...) %>%
    unite(fields, sep = "") %>%
    pull(fields) %>%
    syms

  # Capture/quote functions as expressions
  funcs_exprs <- enexpr(funcs)[-1] # list, min, max => min, max

  # Map over function expressions and field symbols, evaluating with `summarize`
  map_dfc(funcs_exprs, # min, max
          function(func_expr) {
            map_dfc(fields_syms, # SIB1MOB, SIB1YOB...
                    function(field_sym) {
                      df %>%
                        summarize(
                          # min_SIB1MOB = min(SIB1MOB, na.rm = TRUE)
                          # min, ... SIB1MOB, ...
                          !!paste0(func_expr, "_", field_sym) :=
                            # min, ... SIB1MOB, ...
                            (!!func_expr)(!!field_sym, na.rm = TRUE))
                    })
          })
}
my_summary_fxn(df = df_uds, funcs = list(min, max), sib_base, sib_nums, sib_data) %>% pretty_print
min_SIB1MOB | min_SIB1YOB | min_SIB2MOB | min_SIB2YOB | max_SIB1MOB | max_SIB1YOB | max_SIB2MOB | max_SIB2YOB |
---|---|---|---|---|---|---|---|
1 | 1920 | 1 | 1923 | 12 | 1962 | 12 | 1958 |
sib_nums <- 1:3
sib_data <- c("MOB", "YOB", "AGD", "AGO")

my_summary_fxn(df = df_uds, funcs = list(min, max, median, mean),
               sib_base, sib_nums, sib_data) %>%
  pretty_print_scroll
min_SIB1AGD | min_SIB1AGO | min_SIB1MOB | min_SIB1YOB | min_SIB2AGD | min_SIB2AGO | min_SIB2MOB | min_SIB2YOB | min_SIB3AGD | min_SIB3AGO | min_SIB3MOB | min_SIB3YOB | max_SIB1AGD | max_SIB1AGO | max_SIB1MOB | max_SIB1YOB | max_SIB2AGD | max_SIB2AGO | max_SIB2MOB | max_SIB2YOB | max_SIB3AGD | max_SIB3AGO | max_SIB3MOB | max_SIB3YOB | median_SIB1AGD | median_SIB1AGO | median_SIB1MOB | median_SIB1YOB | median_SIB2AGD | median_SIB2AGO | median_SIB2MOB | median_SIB2YOB | median_SIB3AGD | median_SIB3AGO | median_SIB3MOB | median_SIB3YOB | mean_SIB1AGD | mean_SIB1AGO | mean_SIB1MOB | mean_SIB1YOB | mean_SIB2AGD | mean_SIB2AGO | mean_SIB2MOB | mean_SIB2YOB | mean_SIB3AGD | mean_SIB3AGO | mean_SIB3MOB | mean_SIB3YOB |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 29 | 1 | 1920 | 0 | 16 | 1 | 1923 | 0 | 74 | 1 | 1925 | 94 | 73 | 12 | 1962 | 83 | 59 | 12 | 1958 | 86 | 75 | 12 | 1960 | 64 | 59 | 8 | 1940.5 | 68.5 | 37.5 | 5.5 | 1942.5 | 57 | 74 | 8 | 1945 | 60.06 | 59.33333 | 6.704546 | 1940.333 | 57.44444 | 37.5 | 6.307692 | 1941 | 44.77778 | 74.33333 | 7.31579 | 1944.36 |
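The column order above (`AGD`, `AGO`, `MOB`, `YOB`) comes from the field-name step. The sketch below isolates that step. It assumes `sib_base` is `"SIB"`, as the output column names suggest; its definition is not shown in this section. `crossing()` sorts and de-duplicates each input before building every combination, which explains the alphabetical order.

```r
library(dplyr)
library(tidyr)

# Assumption: sib_base is "SIB", inferred from the output column names
sib_base <- "SIB"
sib_nums <- 1:3
sib_data <- c("MOB", "YOB", "AGD", "AGO")

# crossing() sorts each input, so AGD and AGO come before MOB and YOB
field_names <- crossing(sib_base, sib_nums, sib_data) %>%
  unite(fields, sep = "") %>%   # no columns named, so all are pasted together
  pull(fields)

field_names
```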
Why might metaprogramming be useful for managing UDS 3 data in REDCap?
A small fake dataset built with the REDCap Collaborative UDS 3.0 data dictionary from the KU ADC.
ptid | packet | visitmo | visitday | visityr | sex | fu_sex | tele_sex |
---|---|---|---|---|---|---|---|
PT0001 | I | 1 | 1 | 2015 | 2 |  |  |
PT0002 | I | 2 | 2 | 2015 | 2 |  |  |
PT0002 | F | 2 | 2 | 2016 |  | 2 |  |
PT0003 | I | 3 | 3 | 2015 | 1 |  |  |
PT0003 | F | 3 | 3 | 2016 |  | 1 |  |
PT0003 | T | 3 | 3 | 2017 |  |  | 1 |
- 20 possible siblings (`sib1`–`sib20`)
- 7 data points (`mob`, `yob`, `agd`, `neu`, `pdx`, `moe`, `ago`)
- 3 forms (initial, follow-up `fu_`, telephone `tele_`)
- 20 \(\times\) 7 \(\times\) 3 = 420 fields
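The arithmetic above can be checked by building every field name from its three components. This is a base-R sketch; the vector names (`sibs`, `data_points`, `prefixes`) are illustrative, not taken from the REDCap data dictionary.

```r
# Rebuild the 420 Form A3 sibling field names from their components
sibs        <- paste0("sib", 1:20)                                 # 20 siblings
data_points <- c("mob", "yob", "agd", "neu", "pdx", "moe", "ago")  # 7 data points
prefixes    <- c("", "fu_", "tele_")                               # 3 forms

# outer() pastes every sibling with every data point: 20 x 7 = 140 names
i_fields <- as.vector(outer(sibs, data_points, paste0))

# ...then every form prefix with each of those: 3 x 140 = 420 names
ift_fields <- as.vector(outer(prefixes, i_fields, paste0))

length(i_fields)     # 140
length(ift_fields)   # 420
head(ift_fields, 3)  # "sib1mob" "fu_sib1mob" "tele_sib1mob"
```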
ptid | packet | sib1mob | sib1yob | sib1agd | fu_sib1mob | fu_sib1yob | fu_sib1agd | tele_sib1mob | tele_sib1yob | tele_sib1agd |
---|---|---|---|---|---|---|---|---|---|---|
PT0001 | I | 1 | 1941 |  |  |  |  |  |  |  |
PT0002 | I | 2 | 1942 |  |  |  |  |  |  |  |
PT0002 | F |  |  |  | 2 | 1942 |  |  |  |  |
PT0003 | I | 3 | 1943 | 72 |  |  |  |  |  |  |
PT0003 | F |  |  |  | 3 | 1943 | 72 |  |  |  |
PT0003 | T |  |  |  |  |  |  | 3 | 1943 | 72 |
Use `dplyr::coalesce()`, which returns the first non-missing value across its arguments.
df_rc_small %>% mutate(sex = coalesce(sex, fu_sex, tele_sex))
ptid | packet | visitmo | visitday | visityr | sex | fu_sex | tele_sex |
---|---|---|---|---|---|---|---|
PT0001 | I | 1 | 1 | 2015 | 2 |  |  |
PT0002 | I | 2 | 2 | 2015 | 2 |  |  |
PT0002 | F | 2 | 2 | 2016 | 2 | 2 |  |
PT0003 | I | 3 | 3 | 2015 | 1 |  |  |
PT0003 | F | 3 | 3 | 2016 | 1 | 1 |  |
PT0003 | T | 3 | 3 | 2017 | 1 |  | 1 |
Remove the redundant fields with `dplyr::select()`.
df_rc_small %>% mutate(sex = coalesce(sex, fu_sex, tele_sex)) %>% select(-fu_sex, -tele_sex)
ptid | packet | visitmo | visitday | visityr | sex |
---|---|---|---|---|---|
PT0001 | I | 1 | 1 | 2015 | 2 |
PT0002 | I | 2 | 2 | 2015 | 2 |
PT0002 | F | 2 | 2 | 2016 | 2 |
PT0003 | I | 3 | 3 | 2015 | 1 |
PT0003 | F | 3 | 3 | 2016 | 1 |
PT0003 | T | 3 | 3 | 2017 | 1 |
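Here is a minimal sketch of the same two steps on hypothetical rows. The fourth row, where both `sex` and `fu_sex` are filled and disagree, is made up to show that `coalesce()` works left to right, so argument order decides which copy wins.

```r
library(dplyr)

# Made-up rows: each visit type fills a different copy of the field;
# the last row has two conflicting copies on purpose
visits <- tibble(
  packet   = c("I", "F", "T", "F"),
  sex      = c(2, NA, NA, 1),
  fu_sex   = c(NA, 2, NA, 2),
  tele_sex = c(NA, NA, 1, NA)
)

# coalesce() keeps the first non-missing value, scanning sex, fu_sex, tele_sex
res <- visits %>%
  mutate(sex = coalesce(sex, fu_sex, tele_sex)) %>%
  select(-fu_sex, -tele_sex)

res  # sex is 2, 2, 1, 1: in the last row, sex (1) beats fu_sex (2)
```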
Easy enough with one field represented in initial, follow-up, and telephone forms.
What about our big REDCap dataset from Form A3?
Brute force approach.
df_rc_big %>%
  mutate(sib1mob = coalesce(sib1mob, fu_sib1mob, tele_sib1mob),
         sib1yob = coalesce(sib1yob, fu_sib1yob, tele_sib1yob),
         sib1agd = coalesce(sib1agd, fu_sib1agd, tele_sib1agd)) %>%
  select(-fu_sib1mob, -tele_sib1mob,
         -fu_sib1yob, -tele_sib1yob,
         -fu_sib1agd, -tele_sib1agd)
ptid | packet | sib1mob | sib1yob | sib1agd |
---|---|---|---|---|
PT0001 | I | 1 | 1941 | |
PT0002 | I | 2 | 1942 | |
PT0002 | F | 2 | 1942 | |
PT0003 | I | 3 | 1943 | 72 |
PT0003 | F | 3 | 1943 | 72 |
PT0003 | T | 3 | 1943 | 72 |
Brute force becomes daunting once we include the 19 other possible siblings (`sib2`–`sib20`) and the 4 other data points (`neu`, `pdx`, `moe`, `ago`).
Tidy Eval to the rescue!
Input
- `df_rc_big`
- irrelevant fields vector, `c("ptid", "packet")`
Output
ptid | packet | sib1mob | sib1yob | sib1agd |
---|---|---|---|---|
PT0001 | I | 1 | 1941 | |
PT0002 | I | 2 | 1942 | |
PT0002 | F | 2 | 1942 | |
PT0003 | I | 3 | 1943 | 72 |
PT0003 | F | 3 | 1943 | 72 |
PT0003 | T | 3 | 1943 | 72 |
coalesce_all_ift_fields <- function(df, irrel_fields) {
  # Get initial visit field names, follow-up names, and telephone names
  # (every field that is not in `irrel_fields`, wherever it sits in `df`)
  ift_fields <- names(df)[!names(df) %in% irrel_fields]
  # Reduce initial, follow-up, telephone visit fields to initial fields only
  i_fields <- reduce_ift_fieldnames(ift_fields)
  # Convert initial field strings to symbol expressions
  i_fields_syms <- syms(i_fields)
  # Map over initial visit field symbols, coalescing IFT fields
  # Each iteration returns one coalesced field, column-bound to the others
  map_dfc(i_fields_syms,
          function(i_field_sym) {
            f_field_sym <- sym(paste0("fu_", i_field_sym))
            t_field_sym <- sym(paste0("tele_", i_field_sym))
            df %>%
              select(!!i_field_sym, !!f_field_sym, !!t_field_sym) %>%
              mutate(!!i_field_sym := coalesce(!!i_field_sym, !!f_field_sym, !!t_field_sym)) %>%
              select(-!!f_field_sym, -!!t_field_sym)
          }) %>%
    # Attach `irrel_fields` to the front of the returned data frame
    bind_cols(df[, irrel_fields, drop = FALSE], .)
}
irrel_fields <- c("ptid", "packet")
df_rc_big %>%
  coalesce_all_ift_fields(irrel_fields) %>%
  pretty_print
ptid | packet | sib1mob | sib1yob | sib1agd |
---|---|---|---|---|
PT0001 | I | 1 | 1941 | |
PT0002 | I | 2 | 1942 | |
PT0002 | F | 2 | 1942 | |
PT0003 | I | 3 | 1943 | 72 |
PT0003 | F | 3 | 1943 | 72 |
PT0003 | T | 3 | 1943 | 72 |
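To show what a single `map_dfc()` iteration inside `coalesce_all_ift_fields()` does, here is a minimal sketch that runs the same pipeline for one field on hypothetical two-row data (`df_toy` and its values are made up). The key move is building the `fu_` and `tele_` symbols from the initial field's name, then splicing all three with `!!`.

```r
library(dplyr)
library(rlang)

# Made-up data: one field plus its follow-up and telephone copies
df_toy <- tibble(
  ptid         = c("PT0001", "PT0001"),
  packet       = c("I", "F"),
  sib1mob      = c(1, NA),
  fu_sib1mob   = c(NA, 1),
  tele_sib1mob = c(NA_real_, NA_real_)
)

# Build the three symbols from one initial-visit field name
i_field_sym <- sym("sib1mob")
f_field_sym <- sym(paste0("fu_", as_string(i_field_sym)))
t_field_sym <- sym(paste0("tele_", as_string(i_field_sym)))

# One iteration: keep the three columns, coalesce, drop the copies
one_field <- df_toy %>%
  select(!!i_field_sym, !!f_field_sym, !!t_field_sym) %>%
  mutate(!!i_field_sym := coalesce(!!i_field_sym, !!f_field_sym, !!t_field_sym)) %>%
  select(-!!f_field_sym, -!!t_field_sym)

one_field
```

Each iteration returns a one-column data frame, which is why the function column-binds the results and then re-attaches the irrelevant fields at the front.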
Let’s test this new function on the small REDCap dataset.
Input
- `df_rc_small`
- irrelevant fields, `c("ptid", "packet", "visitmo", "visitday", "visityr")`
And…
irrel_fields <- c("ptid", "packet", "visitmo", "visitday", "visityr")
df_rc_small %>%
  coalesce_all_ift_fields(irrel_fields) %>%
  pretty_print
ptid | packet | visitmo | visitday | visityr | sex |
---|---|---|---|---|---|
PT0001 | I | 1 | 1 | 2015 | 2 |
PT0002 | I | 2 | 2 | 2015 | 2 |
PT0002 | F | 2 | 2 | 2016 | 2 |
PT0003 | I | 3 | 3 | 2015 | 1 |
PT0003 | F | 3 | 3 | 2016 | 1 |
PT0003 | T | 3 | 3 | 2017 | 1 |
- Advanced R (2nd edition) by Hadley Wickham
- `rlang` package
- `rlang` cheatsheet
- My GitHub with this slidestack
Elizabeth Robichaud @ NACC, Delilah Cook @ Wake Forest
Mark Espeland & ADRC Data Core Steering Committee
Suzanne Hunt & University of Kansas ADC
NACC
For reference…
Given the field names `sib1mob`, `sib1yob`, `sib1agd`, `fu_sib1mob`, `fu_sib1yob`, `fu_sib1agd`, `tele_sib1mob`, `tele_sib1yob`, and `tele_sib1agd`, `reduce_ift_fieldnames` returns `sib1mob`, `sib1yob`, and `sib1agd`.
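The definition of `reduce_ift_fieldnames` is not shown here. A minimal base-R sketch that produces the behavior described above, assuming the only prefixes to strip are `fu_` and `tele_`, could look like this; it is a hypothetical reconstruction, not necessarily the author's implementation.

```r
# Hypothetical sketch: strip the follow-up ("fu_") and telephone ("tele_")
# prefixes, then keep each initial-visit field name once, in first-seen order
reduce_ift_fieldnames <- function(ift_fields) {
  unique(sub("^(fu_|tele_)", "", ift_fields))
}

ift_fields <- c("sib1mob", "sib1yob", "sib1agd",
                "fu_sib1mob", "fu_sib1yob", "fu_sib1agd",
                "tele_sib1mob", "tele_sib1yob", "tele_sib1agd")

reduce_ift_fieldnames(ift_fields)                   # "sib1mob" "sib1yob" "sib1agd"
reduce_ift_fieldnames(c("sex", "fu_sex", "tele_sex"))  # "sex"
```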