Flexible and fast matching functions.
Usage
cc_match(
  cases,
  controls,
  by,
  id,
  no_controls,
  replace = TRUE,
  seed = NULL,
  return_case_values = FALSE,
  verbose = TRUE
)
Arguments
- cases
A data.frame including cases.
- controls
A data.frame including potential controls.
- by
A list of matching factors and their accompanying matching criteria, each supplied either as a function or as 'exact' if exact matching is requested for that variable. A matching function should take two arguments as input, where the first argument is the value of the case and the second argument is the value of the control. For example, the function
function(x,y) abs(x - y) < 1
can be used to match cases to comparators with an age difference of less than one year. The function
function(x,y) y >= x
can be used for concurrent matching, i.e., requiring the control to be alive at the time of matching. Example:
- id
Name of the variable that identifies unique observations.
- no_controls
The number of controls that should be matched to each case.
- replace
Logical indicator of whether controls are sampled with replacement (TRUE) or without replacement (FALSE). Default: TRUE.
- seed
Optional seed used for the matching. Useful for reproducibility.
- return_case_values
If TRUE, the case value of the matching variable will be returned in the output dataset, e.g., if you match on time, the controls in the output dataset will have a variable case_time which has the value of the case's time variable. This can be useful if you would like to start follow-up at the time of matching in your later analysis.
- verbose
Logical indicator if default checks should be printed. Default: TRUE.
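As a minimal sketch of how a by list could be put together, the snippet below defines the two matching functions described for the by argument and checks them on a few made-up values. The variable names sex, age and time and all values are illustrative assumptions, not taken from the package. Whether cc_match() calls a matching function on single values or on vectors is not stated here; the functions shown work for both.

```r
# Hypothetical matching criteria; the names sex, age and time are assumptions.
by <- list(
  sex  = "exact",                        # exact matching on sex
  age  = function(x, y) abs(x - y) < 1,  # age difference below one year
  time = function(x, y) y >= x           # control still at risk at the case's time
)

# Each function takes the case value first and the control value(s) second.
case_age     <- 54.2
control_ages <- c(53.5, 55.4, 54.9)
age_ok <- by$age(case_age, control_ages)

case_time     <- 10
control_times <- c(8, 10, 15)
time_ok <- by$time(case_time, control_times)

print(age_ok)
print(time_ok)
```

A control is eligible for a case only if every criterion in the list is satisfied for that pair.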
Value
A dataset of matched cases and controls.
id
Individual's unique id
case
Case indicator
riskset
Riskset number/identifier
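The returned variables can be inspected with base R. The sketch below uses a toy data frame copied from the first rows of the example output on this page and assumes that case is coded 1 for cases and 0 for controls, as that output suggests.

```r
# Toy data copied from the first rows of the example output on this page.
risksets <- data.frame(
  id      = c(2, 432, 3, 240, 4, 46),
  case    = c(1, 0, 1, 0, 1, 0),
  riskset = c(1, 1, 2, 2, 3, 3)
)

# Number of controls (case == 0) in each risk set.
n_controls <- tapply(risksets$case == 0, risksets$riskset, sum)
print(n_controls)

# Ids of the case and the control(s) in risk set 2.
rs2 <- risksets[risksets$riskset == 2, ]
case_id    <- rs2$id[rs2$case == 1]
control_id <- rs2$id[rs2$case == 0]
```

Counting controls per risk set in this way is one option for spotting cases that received fewer controls than requested.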
Details
The function issues a warning if the number of available controls is smaller than the number of controls requested for each case. In that situation, all available controls are matched to the case.
If there are no matches available for a particular case, the case will be removed from the output dataset and a warning will be shown indicating the id of the case that has been removed.
If you want to match exactly on a variable, please use the 'exact' indicator as shown above. This will improve the computation time.
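The fallback behaviour described in this section can be illustrated with a small base R sketch. The helper pick_controls() is hypothetical and is not the package's implementation; it only mimics the documented rules for a single case, given the ids of the controls that satisfy all matching criteria.

```r
# Hypothetical helper, not part of the package: pick controls for one case.
pick_controls <- function(eligible_ids, no_controls, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  if (length(eligible_ids) == 0) {
    warning("No controls available; the case would be dropped.")
    return(eligible_ids)
  }
  if (length(eligible_ids) < no_controls) {
    warning("Fewer controls available than requested; using all of them.")
    return(eligible_ids)
  }
  # Index-based sampling avoids sample()'s special handling of a single number.
  eligible_ids[sample.int(length(eligible_ids), no_controls)]
}

few  <- suppressWarnings(pick_controls(c(11, 12), no_controls = 3))
none <- suppressWarnings(pick_controls(integer(0), no_controls = 1))
some <- pick_controls(c(11, 12, 13, 14), no_controls = 2, seed = 1)
```

Here few contains both available controls, none is empty, and some holds two ids drawn at random from the four eligible controls.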
Examples
require(rstpm2)
#> Loading required package: rstpm2
#> Loading required package: survival
#> Loading required package: splines
#>
#> Attaching package: ‘rstpm2’
#> The following object is masked from ‘package:survival’:
#>
#> colon
risksets <- cc_match(cases = brcancer[brcancer$hormon == 1, ],
                     controls = brcancer[brcancer$hormon == 0, ],
                     id = "id",
                     by = list(x2 = "exact",
                               rectime = function(x, y) y >= x),
                     no_controls = 1,
                     replace = TRUE)
#> Warning: ✖ No controls available for 236.
#> ℹ I'll remove this control form the output dataset.
#> Warning: ✖ No controls available for 272.
#> ℹ I'll remove this control form the output dataset.
#> Warning: ✖ No controls available for 275.
#> ℹ I'll remove this control form the output dataset.
#> Warning: ✖ No controls available for 468.
#> ℹ I'll remove this control form the output dataset.
#> ----------------------------------------
#> No. of comparators: 242
#> No. of unique comparators: 124
#> Proportion of unique compartors: 51.2 %
#> ----------------------------------------
#>
head(risksets)
#>    id case riskset
#> 1   2    1       1
#> 2 432    0       1
#> 3   3    1       2
#> 4 240    0       2
#> 5   4    1       3
#> 6  46    0       3