Methods Inf Med 2008; 47(04): 356-363
DOI: 10.3414/ME0489
Original Article
Schattauer GmbH

An Efficient Validation Method of Probabilistic Record Linkage Including Readmissions and Twins

M. Tromp (1, 3), A. C. J. Ravelli (1), N. Méray (1), J. B. Reitsma (2), G. J. Bonsel (3)

1   Department of Medical Informatics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
2   Department of Clinical Epidemiology, Biostatistics and Bioinformatics, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands
3   Department of Public Health Epidemiology, Academic Medical Center, University of Amsterdam, Amsterdam, The Netherlands

Publication history

Received: 05 June 2007

Accepted: 21 November 2007

Publication date:
18 January 2018 (online)

Summary

Objective: To describe an efficient, generalizable approach to validating probabilistic record linkage results, in particular through model-guided detection of linking errors, and to apply this approach to validate the linkage of newborn admissions.

Methods: Our double-blind validation procedure consisted of three steps: sample selection, data collection, and data analysis. The linked Dutch national newborn admission registry contained 30,082 records for 2001, including readmissions (7.4%) and twins (9.7%). A highly informative sample was selected from the linked file by oversampling links that were uncertain according to their model-derived linking weights. Four hundred and eight fax forms containing minimal registry information (admissions of 191 children) were sent to different pediatric units. The pediatricians were asked to compile a short, detailed patient history from independent sources. The linkage status and additional record data were validated against this external information.
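The sample-selection step can be illustrated with a minimal Python sketch. It assumes a Fellegi-Sunter style linking weight (the sum over linking variables of log2(m/u) when a variable agrees and log2((1-m)/(1-u)) when it disagrees), which is one common form of model-derived weight. The linking variables, m/u probabilities, weight thresholds bounding the uncertain area, and sampling fractions are all hypothetical, and the function names (`linking_weight`, `weight_band`, `select_validation_sample`) are illustrative rather than taken from the study.

```python
import math
import random

# Hypothetical m/u probabilities per linking variable (illustrative only):
#   m = P(variable agrees | pair is a true match)
#   u = P(variable agrees | pair is not a match)
M_U = {
    "birth_date": (0.97, 0.003),
    "sex": (0.98, 0.50),
    "postcode": (0.90, 0.01),
}


def linking_weight(agreement: dict[str, bool]) -> float:
    """Fellegi-Sunter style total weight: sum of log2 likelihood ratios."""
    total = 0.0
    for variable, agrees in agreement.items():
        m, u = M_U[variable]
        if agrees:
            total += math.log2(m / u)  # agreement adds positive evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement adds negative evidence
    return total


def weight_band(weight: float, lower: float, upper: float) -> str:
    """Classify a pair as a certain non-link, uncertain, or certain link."""
    if weight < lower:
        return "nonlink"
    if weight > upper:
        return "link"
    return "uncertain"


def select_validation_sample(pairs, lower, upper, fractions, seed=0):
    """Stratified random sample of record pairs by weight band.

    `pairs` is a list of (pair_id, weight); `fractions` maps each band to its
    sampling fraction, so a larger fraction oversamples that band.
    Returns a list of (pair_id, band).
    """
    rng = random.Random(seed)
    strata = {"nonlink": [], "uncertain": [], "link": []}
    for pair_id, weight in pairs:
        strata[weight_band(weight, lower, upper)].append(pair_id)
    sample = []
    for band, ids in strata.items():
        k = round(len(ids) * fractions[band])
        sample.extend((pair_id, band) for pair_id in rng.sample(ids, k))
    return sample


# Usage with hypothetical candidate pairs: 900 full agreements,
# 100 partial agreements, and 9,000 full disagreements.
full = linking_weight({"birth_date": True, "sex": True, "postcode": True})
partial = linking_weight({"birth_date": True, "sex": True, "postcode": False})
disagree = linking_weight({"birth_date": False, "sex": False, "postcode": False})

pairs = (
    [(i, full) for i in range(900)]
    + [(900 + i, partial) for i in range(100)]
    + [(1000 + i, disagree) for i in range(9000)]
)
sample = select_validation_sample(
    pairs,
    lower=0.0,
    upper=10.0,
    fractions={"nonlink": 0.01, "uncertain": 0.50, "link": 0.02},
)
```

Because the uncertain band is sampled at a much higher fraction than the other bands, error rates estimated from such a sample would have to be reweighted by the inverse sampling fractions before being generalized to the whole linked file.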

Results: The response rate was 97% (395/408 faxes). Linkage of singleton admissions was highly accurate: apart from some expected errors in the uncertain area (0.02% of record pairs), it was error-free. Validation of readmissions of children from multiple births showed a linkage error rate of 37%, owing to the low quality of the multiple-birth variables. The overall quality of the linked registry file was nevertheless high: only 1.7% of the children were from a multiple birth and had multiple admissions, resulting in a linking error of less than 1%.
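The reported figures can be checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not the authors' analysis: it assumes that the 37% error rate applies to the whole group of multiple-birth children with multiple admissions, and it ignores the small number of singleton errors in the uncertain area. The variable names are hypothetical.

```python
# Figures reported in the Summary
faxes_sent = 408
faxes_returned = 395
share_multiple_birth_readmitted = 0.017  # children from a multiple birth with multiple admissions
error_rate_multiple_birth = 0.37  # linkage error rate among those readmissions

response_rate = faxes_returned / faxes_sent

# Rough estimate: errors are assumed to occur only in the multiple-birth
# readmission group, at the rate observed in the validation sample.
approx_overall_error = share_multiple_birth_readmitted * error_rate_multiple_birth

print(f"response rate: {response_rate:.1%}")  # prints: response rate: 96.8%
print(f"approximate overall linking error: {approx_overall_error:.2%}")  # prints: ... 0.63%
```

The product of a small affected share (1.7%) and a high within-group error rate (37%) stays below 1%, which is consistent with the conclusion that the registry as a whole remained of high quality.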

Conclusions: Our external validation procedure for record linkage was feasible and efficient, and it was informative in identifying the sources of the errors.

 