Methods Inf Med 1993; 32(02): 131-136
DOI: 10.1055/s-0038-1634906
Original Article
Schattauer GmbH

Discrimination and Reproducibility of an Information Maximizing Multivariable Model

P. S. Heckerling (1), R. C. Conant (2), Th. G. Tape (3), R. S. Wigton (3)

1 Department of Medicine, University of Illinois, Chicago Ill
2 Department of Electrical Engineering and Computer Science, University of Illinois, Chicago Ill
3 Department of Internal Medicine, University of Nebraska, Omaha Nebr, USA

Publication Date: 08 February 2018 (online)

Abstract:

Predictor variables for multivariate rules are frequently selected by methods that maximize likelihood rather than information. We compared the discrimination and reproducibility of a prediction rule for pneumonia derived using extended dependency analysis (EDA), an information-maximizing variable selection program, with those of a validated rule derived using logistic regression. Discrimination was measured by receiver-operating characteristic (ROC) analysis, and reproducibility by rederivation of the rule on 200 replicate samples of size 250 and 500, generated from a training cohort of 905 patients using Monte Carlo techniques.
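The two measurements described above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: the ROC area is computed with the nonparametric pairwise (Mann-Whitney) estimate, which may differ from the estimator the authors used, and replicate samples are drawn with replacement, which is only one way to generate Monte Carlo replicates. The function names `roc_area` and `replicate_samples` are hypothetical.

```python
import random

def roc_area(scores_pos, scores_neg):
    """Nonparametric ROC area: the fraction of (positive, negative) pairs
    in which the positive case scores higher, counting ties as one half."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

def replicate_samples(cohort, n_replicates, size, seed=0):
    """Draw n_replicates samples of the given size from the cohort.
    Sampling with replacement is an assumption of this sketch."""
    rng = random.Random(seed)
    return [[rng.choice(cohort) for _ in range(size)]
            for _ in range(n_replicates)]

# Illustrative use with placeholder patient indices, not study data.
cohort = list(range(905))
replicates = replicate_samples(cohort, n_replicates=200, size=250)
```

In a reproducibility study of this kind, the variable selection procedure would be rerun on each replicate and the selected variables compared with those of the original rule.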

Four of the five predictor variables selected by EDA were identical to those selected by logistic regression. With each variable weighted by its conditional contribution to total information transmission, EDA discriminated pneumonia from nonpneumonia in the training cohort with an ROC area of 0.800 (vs 0.816 for logistic regression, p = 0.60), and in the validation cohort with an area of 0.822 (vs 0.821 for logistic regression, p = 0.98). EDA demonstrated reproducibility comparable to that of logistic regression according to most criteria for replicability. Replicate EDA models showed good discrimination in the training and testing cohorts, and met statistical criteria for validation (no significant difference in ROC areas at a one-tailed alpha level of 0.05) in 80.8% to 94.2% of cases.
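For readers unfamiliar with information transmission, the following Python sketch computes the Shannon transmission (mutual information) between one discrete predictor and the outcome as T(X;Y) = H(X) + H(Y) - H(X,Y). It is an illustration only: the abstract does not specify how EDA computes each variable's conditional contribution, and the helper names `entropy` and `transmission` are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of values."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def transmission(x, y):
    """Information transmitted between x and y: H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))

# Toy data, not from the study: a finding that perfectly tracks the outcome
# transmits 1 bit; a finding independent of the outcome transmits 0 bits.
perfect = transmission([0, 1, 0, 1], [0, 1, 0, 1])
independent = transmission([0, 0, 1, 1], [0, 1, 0, 1])
```

A selection procedure that maximizes information would prefer predictors with larger transmission; how EDA conditions on variables already selected is not described here.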

We conclude that extended dependency analysis selected the most important variables for predicting pneumonia, as judged against a validated logistic regression model. The information-theoretic model showed good discriminatory power, and demonstrated reproducibility according to clinically reasonable criteria. Information-theoretic variable selection by extended dependency analysis appears to be a reasonable basis for developing clinical prediction rules.

 