Methods Inf Med 1994; 33(02): 180-186
DOI: 10.1055/s-0038-1635010
Original Article
Schattauer GmbH

How to Correct for Chance Agreement in the Estimation of Sensitivity and Specificity of Diagnostic Tests

O. Gefeller1, H. Brenner2
1 Department of Medical Statistics, University of Göttingen
2 Unit of Epidemiology, University of Ulm, Germany

Publication Date: 08 February 2018 (online)

Abstract:

The traditional approach to describing the validity of a diagnostic test neglects chance agreement between the test result and the true (disease) status. Sensitivity and specificity, the fundamental measures of validity, can therefore only be considered jointly to provide an appropriate basis for evaluating how well a test discriminates truly diseased from truly undiseased subjects. In this paper, chance-corrected analogues of sensitivity and specificity are presented as supplemental measures of validity; they account for chance agreement and can be interpreted separately. Whereas chance-correction techniques recently proposed by several authors in this context yield measures that depend on disease prevalence, our method does not share this major disadvantage. We discuss the extension of the conventional ROC-curve approach to chance-corrected measures of sensitivity and specificity. Furthermore, point and asymptotic interval estimates of the parameters of interest are derived under different sampling frameworks for validation studies. The small-sample behavior of the estimates is investigated in a simulation study, leading to a logarithmic modification of the interval estimate so that the nominal confidence level is maintained in small samples.
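To make the ideas in the abstract concrete, the following Python sketch computes conventional sensitivity and specificity from a 2 × 2 validation table, an illustrative kappa-style chance correction that uses the false-positive rate as a prevalence-free chance benchmark, and a log-transformed Wald confidence interval for sensitivity. The function name validity_measures, the particular correction formula, and the interval construction are assumptions chosen for illustration; the paper's own definitions and derived estimators are not given in the abstract and are not reproduced here.

```python
import math
from statistics import NormalDist


def validity_measures(tp, fn, fp, tn, level=0.95):
    """Conventional and (illustrative) chance-corrected validity measures
    from a 2 x 2 validation table:

                        disease present   disease absent
        test positive         tp                fp
        test negative         fn                tn
    """
    se = tp / (tp + fn)            # conventional sensitivity
    sp = tn / (tn + fp)            # conventional specificity

    # Kappa-style correction using the false-positive rate (1 - sp) as the
    # chance benchmark; unlike a correction based on the overall rate of
    # positive test results, this reference point does not involve disease
    # prevalence.  NOTE: illustrative definition only, not taken from the paper.
    se_cc = (se - (1.0 - sp)) / sp if sp > 0 else float("nan")
    sp_cc = (sp - (1.0 - se)) / se if se > 0 else float("nan")

    # Log-transformed Wald interval for sensitivity (delta method on log(se)),
    # a common small-sample device in the spirit of the abstract's "logarithmic
    # modification"; the paper's exact interval estimate is not reproduced here.
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    if 0.0 < se < 1.0:
        half_width = z * math.sqrt((1.0 - se) / tp)   # sd of log(se), delta method
        ci_se = (math.exp(math.log(se) - half_width),
                 min(1.0, math.exp(math.log(se) + half_width)))
    else:
        ci_se = (float("nan"), float("nan"))

    return {
        "sensitivity": se,
        "specificity": sp,
        "sensitivity_cc": se_cc,
        "specificity_cc": sp_cc,
        "ci_sensitivity_log": ci_se,
    }


# Hypothetical validation data: 90 true positives, 10 false negatives,
# 20 false positives, 80 true negatives.
print(validity_measures(tp=90, fn=10, fp=20, tn=80))
```

In this hypothetical example the conventional sensitivity and specificity are 0.90 and 0.80, while the illustrative chance-corrected values shrink toward 0.875 and 0.778, reflecting the agreement that a test with the same false-positive behavior would achieve by chance alone.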

 