
Flexible and fast matching functions.

Usage

cc_match(
  cases,
  controls,
  by,
  id,
  no_controls,
  replace = TRUE,
  seed = NULL,
  return_case_values = FALSE,
  verbose = TRUE
)

Arguments

cases

A data.frame including cases.

controls

A data.frame including potential controls.

by

A list of matching factors and their accompanying matching criteria. Each criterion is supplied either as a function or as 'exact' if exact matching is requested for that variable. A matching function takes two arguments: the first is the value of the case and the second is the value of the control. For example, function(x,y) abs(x - y) < 1 can be used to match cases to controls whose age differs by less than one year. The function function(x,y) y >= x can be used for concurrent matching, i.e., requiring the control to still be alive (at risk) at the time of matching.

Example:

list(sex  = 'exact',
     age  = function(x,y) abs(x - y) < 1,
     time = function(x,y) y >= x)
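To make the calling convention concrete, the sketch below shows one way a by list could be evaluated for a single case. The helper eligible_controls() and the toy data are hypothetical and not part of the package; the sketch assumes each matching function is vectorised over the control values y, which holds for the example functions above.

```r
# Hypothetical helper (not part of the package): flag which controls satisfy
# every criterion in a `by` list for one case. Assumes each matching function
# is vectorised over the control values `y`.
eligible_controls <- function(case_row, controls, by) {
  ok <- rep(TRUE, nrow(controls))
  for (var in names(by)) {
    crit <- by[[var]]
    x <- case_row[[var]]   # the case's value (first argument)
    y <- controls[[var]]   # all controls' values (second argument)
    if (identical(crit, "exact")) {
      ok <- ok & (y == x)
    } else {
      ok <- ok & crit(x, y)
    }
  }
  ok
}

by <- list(sex  = "exact",
           age  = function(x, y) abs(x - y) < 1,
           time = function(x, y) y >= x)

# Made-up data for illustration only
case     <- data.frame(id = 1, sex = "F", age = 50.2, time = 3,
                       stringsAsFactors = FALSE)
controls <- data.frame(id   = 2:5,
                       sex  = c("F", "F", "M", "F"),
                       age  = c(50.9, 49.0, 50.5, 50.0),
                       time = c(4, 5, 6, 2),
                       stringsAsFactors = FALSE)

eligible_controls(case, controls, by)
```

Only the first control satisfies all three criteria: the second differs in age by more than one year, the third has a different sex, and the fourth is no longer at risk at the case's time.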

id

Name of the variable that identifies unique observations.

no_controls

The number of controls that should be matched to each case.

replace

Logical indicator of whether controls are sampled with replacement (TRUE) or without replacement (FALSE). Default: TRUE.

seed

Optional seed used for the matching. Useful for reproducibility.
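The interplay of no_controls, replace and seed can be sketched as follows. draw_all() is a hypothetical helper, not the package's implementation. It assumes that candidates already holds the eligible control ids for each case, and that sampling without replacement means a control drawn for one case cannot be drawn for any later case.

```r
# Hypothetical sketch: draw up to `no_controls` controls for each case from
# its eligible candidate ids. With replace = FALSE, a control drawn for an
# earlier case is removed from the pool of every later case.
draw_all <- function(candidates, no_controls, replace = TRUE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # makes the draws reproducible
  used <- integer(0)
  out  <- vector("list", length(candidates))
  for (i in seq_along(candidates)) {
    pool <- candidates[[i]]
    if (!replace) pool <- setdiff(pool, used)
    if (length(pool) == 0) {
      out[[i]] <- integer(0)          # no control left for this case
      next
    }
    k <- min(no_controls, length(pool))
    # Index into pool instead of calling sample(pool): sample() on a single
    # number n would draw from 1:n rather than return n itself.
    picked <- pool[sample.int(length(pool), k)]
    used <- c(used, picked)
    out[[i]] <- picked
  }
  out
}

cands <- list(c(10L, 11L), c(10L, 11L), 10L)
draw_all(cands, no_controls = 1, replace = FALSE, seed = 1)
```

With replace = FALSE the third case ends up with no control, because its only candidate (10) has already been used by one of the first two cases; with replace = TRUE it always receives control 10.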

return_case_values

If TRUE, the case's value of the matching variable is returned in the output dataset. For example, if you match on time, the controls in the output dataset will have a variable case_time holding the value of the case's time variable. This can be useful if you want to start follow-up at the time of matching in a later analysis.
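As an illustration of that use case, the sketch below starts follow-up at the case's matching time. The data frame is made up; end is an assumed end-of-follow-up column carried along from the input data, and the sketch assumes case_time is filled in for the cases as well as the controls.

```r
# Made-up output of a call with return_case_values = TRUE and matching on
# `time`; `end` is an assumed end-of-follow-up column from the input data,
# and case_time is assumed to be present for cases too.
matched <- data.frame(
  id        = c(1, 21, 2, 22),
  case      = c(1, 0, 1, 0),
  riskset   = c(1, 1, 2, 2),
  time      = c(2.0, 3.5, 1.0, 1.8),
  case_time = c(2.0, 2.0, 1.0, 1.0),
  end       = c(6.0, 8.0, 4.5, 3.0)
)

# Start follow-up for everyone in a risk set at the case's matching time
matched$fu_start <- matched$case_time
matched$fu_time  <- matched$end - matched$case_time
matched[, c("id", "riskset", "fu_start", "fu_time")]
```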

verbose

Logical indicator of whether default checks should be printed. Default: TRUE.

Value

A dataset of matched cases and controls.

id

Individual's unique id

case

Case indicator (1 = case, 0 = control)

riskset

Risk set number/identifier; a case and its matched controls share the same riskset value.
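Because each case and its controls share one riskset value, per-risk-set summaries are straightforward. The toy data below are made up but follow the documented column layout.

```r
# Made-up data in the documented output layout (id, case, riskset)
toy_rs <- data.frame(id      = c(1, 21, 2, 22, 3, 23, 24),
                     case    = c(1, 0, 1, 0, 1, 0, 0),
                     riskset = c(1, 1, 2, 2, 3, 3, 3))

# Number of cases and of controls in each risk set
n_cases    <- tapply(toy_rs$case == 1, toy_rs$riskset, sum)
n_controls <- tapply(toy_rs$case == 0, toy_rs$riskset, sum)

# Each risk set should contain exactly one case
stopifnot(all(n_cases == 1))
n_controls
```

In a later analysis, riskset would typically serve as the stratum, for example in a conditional logistic regression such as survival::clogit(case ~ exposure + strata(riskset), data = risksets), where exposure is a placeholder covariate.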

Details

The function issues a warning if the number of available controls is smaller than the number of controls requested for each case. In that situation, all available controls are matched to the case.

If no controls are available for a particular case, the case is removed from the output dataset and a warning is shown indicating the id of the removed case.

If you want to match exactly on a variable, use the 'exact' indicator as shown above. This reduces computation time.
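The behaviour described above, and the reason 'exact' criteria help, can be illustrated with a simplified matcher. match_sketch() is hypothetical and not the package's implementation: it supports one exact variable and one function criterion, reuses controls across cases, and numbers risk sets consecutively. Splitting the controls once by the exact variable means each case evaluates its function criterion only on controls from its own stratum rather than on every control.

```r
# Hypothetical simplified matcher (not the package's code): one 'exact'
# variable, one function criterion, controls reused across cases.
match_sketch <- function(cases, controls, exact_var, crit_var, crit,
                         no_controls) {
  # Partition control row indices once by the exact-matching variable
  strata <- split(seq_len(nrow(controls)), controls[[exact_var]])
  out <- list()
  for (i in seq_len(nrow(cases))) {
    x <- cases[i, ]
    pool <- strata[[as.character(x[[exact_var]])]]  # same exact value only
    if (is.null(pool)) pool <- integer(0)
    # Apply the function criterion within the stratum only
    pool <- pool[crit(x[[crit_var]], controls[[crit_var]][pool])]
    if (length(pool) == 0) {
      warning("No controls available for ", x$id, "; removing this case.")
      next
    }
    if (length(pool) < no_controls) {
      warning("Only ", length(pool), " control(s) available for ", x$id, ".")
    }
    picked <- pool[sample.int(length(pool), min(no_controls, length(pool)))]
    rs <- length(out) + 1  # consecutive risk set number
    out[[rs]] <- data.frame(id      = c(x$id, controls$id[picked]),
                            case    = c(1, rep(0, length(picked))),
                            riskset = rs)
  }
  do.call(rbind, out)
}

# Made-up data for illustration only
cases    <- data.frame(id = 1:3, sex = c("F", "M", "F"), time = c(2, 5, 11),
                       stringsAsFactors = FALSE)
controls <- data.frame(id = 11:15, sex = c("F", "F", "M", "M", "F"),
                       time = c(3, 10, 4, 6, 1), stringsAsFactors = FALSE)

set.seed(1)
match_sketch(cases, controls, exact_var = "sex", crit_var = "time",
             crit = function(x, y) y >= x, no_controls = 2)
```

Whatever the seed, case 3 is dropped with a warning (no female control has time >= 11), case 2 receives a single control with a warning, and case 1 receives both of its eligible controls.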

Examples


require(rstpm2)
#> Loading required package: rstpm2
#> Loading required package: survival
#> Loading required package: splines
#> 
#> Attaching package: ‘rstpm2’
#> The following object is masked from ‘package:survival’:
#> 
#>     colon

risksets <- cc_match(cases    = brcancer[brcancer$hormon == 1, ],
                     controls = brcancer[brcancer$hormon == 0, ],
                     id = "id",
                     by = list(x2 = "exact",
                               rectime = function(x, y) y >= x),
                     no_controls = 1,
                     replace = TRUE)
#> Warning:  No controls available for 236.
#>  I'll remove this control form the output dataset.
#> Warning:  No controls available for 272.
#>  I'll remove this control form the output dataset.
#> Warning:  No controls available for 275.
#>  I'll remove this control form the output dataset.
#> Warning:  No controls available for 468.
#>  I'll remove this control form the output dataset.
#> ----------------------------------------
#> No. of comparators: 242 
#> No. of unique comparators: 124 
#> Proportion of unique compartors: 51.2 %
#> ----------------------------------------
#> 

head(risksets)
#>    id case riskset
#> 1   2    1       1
#> 2 432    0       1
#> 3   3    1       2
#> 4 240    0       2
#> 5   4    1       3
#> 6  46    0       3
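The summary printed with verbose = TRUE can be recomputed from the returned data under assumed definitions: comparators are the control rows (case == 0) and unique comparators are the distinct control ids. summarise_comparators() is a hypothetical helper, not part of the package. The printed figures are consistent with these definitions, since 124 / 242 is about 0.512, i.e. the 51.2 % shown.

```r
# Hypothetical helper: recompute the verbose summary from the returned data.
# Assumed definitions: comparators are control rows (case == 0); unique
# comparators are distinct control ids.
summarise_comparators <- function(risksets) {
  ctrl_ids <- risksets$id[risksets$case == 0]
  n_total  <- length(ctrl_ids)
  n_unique <- length(unique(ctrl_ids))
  c(comparators = n_total,
    unique      = n_unique,
    pct_unique  = round(100 * n_unique / n_total, 1))
}

# Made-up example: control 10 is reused for two cases (replace = TRUE)
toy <- data.frame(id      = c(1, 10, 2, 10, 3, 11),
                  case    = c(1, 0, 1, 0, 1, 0),
                  riskset = c(1, 1, 2, 2, 3, 3))
summarise_comparators(toy)
```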