Methods Inf Med 2015; 54(06): 505-514
DOI: 10.3414/ME14-01-0113
Original Articles
Schattauer GmbH

Analysis of Clinical Cohort Data Using Nested Case-control and Case-cohort Sampling Designs

A Powerful and Economical Tool
K. Ohneberg (1, 2), M. Wolkewitz (1, 2), J. Beyersmann (3), M. Palomar-Martinez (4, 5), P. Olaechea-Astigarraga (6), F. Alvarez-Lerma (7), M. Schumacher (1)

1   Institute for Medical Biometry and Statistics, Medical Center University of Freiburg, Freiburg, Germany
2   Freiburg Center for Data Analysis and Modelling, University of Freiburg, Freiburg, Germany
3   Institute of Statistics, Ulm University, Ulm, Germany
4   Hospital Universitari Arnau de Vilanova, Lleida, Spain
5   Universitat Autónoma de Barcelona, Barcelona, Spain
6   Service of Intensive Care Medicine, Hospital de Galdakao-Usansolo, Bizkaia, Spain
7   Service of Intensive Care Medicine, Parc de Salut Mar, Barcelona, Spain

Publication History

received: 06 November 2014

accepted: 13 April 2015

Publication Date:
23 January 2018 (online)

Summary

Background: Sampling from a large cohort to obtain a subsample that is sufficient for statistical analysis is a frequently used way of handling large data sets in epidemiological studies with limited resources for exposure measurement. In clinical studies, however, when interest lies in the influence of a potential risk factor, a cohort study in which all individuals enter the analysis is often the first choice.

Objectives: Our aim is to close the gap between epidemiological and clinical studies with respect to design and power considerations. Schoenfeld’s formula for the number of events required for a Cox proportional hazards model is fundamental. Our objective is to compare the power of analyzing the full cohort with the power of a nested case-control design and of a case-cohort design.
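As an illustration of the formula referred to above, the following is a minimal Python sketch of Schoenfeld's required number of events, assuming a single binary exposure with prevalence p, a two-sided test at level alpha, and a target hazard ratio. The function name and parameter names are hypothetical and are not taken from the article.

```python
from math import ceil, log
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, exposed_fraction, alpha=0.05, power=0.80):
    """Approximate number of events needed to detect `hazard_ratio`.

    Assumes a binary covariate with prevalence `exposed_fraction`,
    a two-sided test at level `alpha`, and proportional hazards.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    p = exposed_fraction
    return (z_alpha + z_beta) ** 2 / (p * (1 - p) * log(hazard_ratio) ** 2)


# Example: hazard ratio 2, half of the cohort exposed, 80% power.
required = schoenfeld_events(hazard_ratio=2.0, exposed_fraction=0.5)
print(ceil(required))  # -> 66
```

The result depends only on the number of events, not on the number of patients, which is why designs that keep all cases but sample the non-cases can remain efficient when events are rare.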

Methods: We compare formulas for the power of sampling designs and of cohort studies. In our data example we simultaneously apply a nested case-control design with a varying number of controls matched to each case, a case-cohort design with varying subcohort size, a random subsample, and a full cohort analysis. For each design we calculate the standard error of the estimated regression coefficients and the mean number of distinct persons for whom covariate information is required.
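The nested case-control step described in the Methods can be sketched as risk-set (incidence density) sampling. The Python example below is an assumption-laden illustration, not the authors' implementation: the function names and the toy cohort are hypothetical, controls are drawn without replacement from everyone still at risk at the case's event time, and later cases may serve as controls for earlier ones.

```python
import random


def nested_case_control(times, events, m, rng):
    """Sample up to `m` controls per case from the case's risk set.

    times[i]  : follow-up time of person i
    events[i] : 1 if person i had the event of interest, else 0
    Returns a list of (case_index, [control_indices]) tuples.
    """
    sampled_sets = []
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue
        # Risk set: everyone still under observation at t_i, except the case.
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i and j != i]
        controls = rng.sample(risk_set, min(m, len(risk_set)))
        sampled_sets.append((i, controls))
    return sampled_sets


def distinct_persons(sampled_sets):
    """Number of distinct persons for whom covariate data would be needed."""
    ids = set()
    for case, controls in sampled_sets:
        ids.add(case)
        ids.update(controls)
    return len(ids)


# Hypothetical toy cohort of 10 persons with 3 events.
toy_times = [2.0, 5.0, 3.5, 8.0, 1.0, 7.5, 6.0, 4.0, 9.0, 2.5]
toy_events = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
toy_sets = nested_case_control(toy_times, toy_events, m=2, rng=random.Random(1))
print(distinct_persons(toy_sets), "of", len(toy_times), "persons need covariates")
```

Repeating the sampling with different seeds and averaging `distinct_persons` gives a mean number of distinct persons of the kind reported for each design.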

Results: The formulas for the power of a nested case-control design and of a case-cohort design are directly connected to the power of a cohort study through the well-known Schoenfeld formula. The loss in precision of parameter estimates is relatively small compared to the saving in resources.
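To make the connection concrete, here is a minimal Python sketch that inverts Schoenfeld's formula to obtain power from a given number of events. For the nested case-control design it uses the commonly cited approximate efficiency factor m/(m+1) for m controls per case; treating that factor as the link used in the article is an assumption of this sketch, and all function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power_full_cohort(n_events, hazard_ratio, exposed_fraction, alpha=0.05):
    """Approximate power of a Cox model test in the full cohort (Schoenfeld)."""
    p = exposed_fraction
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    signal = sqrt(n_events * p * (1 - p)) * abs(log(hazard_ratio))
    return _STD_NORMAL.cdf(signal - z_alpha)


def power_nested_case_control(n_events, m, hazard_ratio, exposed_fraction,
                              alpha=0.05):
    """Approximate power with m controls per case.

    Assumption: the effective number of events is scaled by m / (m + 1).
    """
    effective_events = n_events * m / (m + 1)
    return power_full_cohort(effective_events, hazard_ratio,
                             exposed_fraction, alpha)


# Example: 66 events, hazard ratio 2, half of the cohort exposed.
for m in (1, 2, 4, 10):
    print(m, round(power_nested_case_control(66, m, 2.0, 0.5), 3))
print("full cohort", round(power_full_cohort(66, 2.0, 0.5), 3))
```

Under this approximation the power rises quickly with the first few controls per case and then levels off towards the full-cohort value, which is consistent with the small loss in precision noted above.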

Conclusions: Nested case-control and case-cohort studies, but not random subsamples, offer an attractive alternative for analyzing clinical studies when the event rate is low. Power calculations can be conducted straightforwardly to quantify the loss of power relative to the savings in the number of patients when a sampling design is used instead of analyzing the full cohort.

 