Methods Inf Med 1998; 37(03): 271-277
DOI: 10.1055/s-0038-1634527
Original Article
Schattauer GmbH

Automatic Record Hash Coding and Linkage for Epidemiological Follow-up Data Confidentiality

C. Quantin (1), H. Bouzelat (1), F. A. Allaert (2), A. M. Benhamiche (3), J. Faivre, L. Dusserre (1)

1   Department of Medical Informatics, Teaching Hospital of Dijon, France
2   W2 “Data Security”, European Federation of Medical Informatics
3   Registry of Digestive Cancers in Burgundy, Burgundy University, Dijon, France

Publication History

Publication Date:
14 February 2018 (online)

Abstract

A protocol is proposed to allow linkage of anonymous medical information within the framework of epidemiological follow-up studies. The protocol has two steps. The first is the irreversible transformation of identification data by a one-way hash function, applied after spelling processing. To prevent dictionary attacks, two large random files of keys, called pads, are introduced. The second step is the linkage of the files rendered anonymous. The weight given to each linkage field is estimated by a mixture model, the likelihood of which is maximized with the Expectation-Maximization (EM) algorithm. The performance of this method was assessed by comparing record linkage based exclusively on the automatic procedure with a manual linkage obtained by the Burgundy Registry of Digestive Cancers. Linking a file of 2,847 cancers with a file of 388,614 hospitalization stays in the Dijon university hospital gave a sensitivity of 97% and a specificity of 93%.
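The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the spelling processing is reduced to accent, case, and punctuation stripping; the two pads are modelled as two secret byte strings applied through nested HMAC-SHA256 (the abstract does not say which hash function is used or how the pads are combined with it); and the mixture model is the usual two-class, conditionally independent model over binary agreement patterns. The function names `normalize`, `anonymize`, `em_estimate`, and `field_weights` are hypothetical.

```python
import hashlib
import hmac
import math
import unicodedata


def normalize(value: str) -> str:
    """Crude stand-in for spelling processing: drop accents, case, non-alphanumerics."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if ch.isalnum()).upper()


def anonymize(value: str, pad1: bytes, pad2: bytes) -> str:
    """One-way transformation of an identifier, keyed by two secret pads.

    Assumption: the pads act as HMAC keys applied one after the other.
    Without the pads, an attacker cannot hash a dictionary of names
    and compare the results with the stored codes.
    """
    inner = hmac.new(pad1, normalize(value).encode("utf-8"), hashlib.sha256).digest()
    return hmac.new(pad2, inner, hashlib.sha256).hexdigest()


def em_estimate(patterns, n_iter=200, p=0.1, eps=1e-6):
    """Fit a two-class mixture (match / non-match) to agreement patterns with EM.

    patterns: list of tuples of 0/1 flags, one flag per linkage field,
    one tuple per compared record pair (1 = hashed values agree).
    Returns (m, u, p): m[i] = P(agree on field i | match),
    u[i] = P(agree on field i | non-match), p = P(match).
    """
    n, k = len(patterns), len(patterns[0])
    m, u = [0.9] * k, [0.1] * k  # assumed starting values
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        g = []
        for pat in patterns:
            pm = p * math.prod(m[i] if a else 1 - m[i] for i, a in enumerate(pat))
            pu = (1 - p) * math.prod(u[i] if a else 1 - u[i] for i, a in enumerate(pat))
            g.append(pm / (pm + pu))
        # M-step: re-estimate the parameters from the posteriors.
        total = sum(g)
        p = total / n
        m = [sum(gi * pat[i] for gi, pat in zip(g, patterns)) / total for i in range(k)]
        u = [sum((1 - gi) * pat[i] for gi, pat in zip(g, patterns)) / (n - total)
             for i in range(k)]
        # Keep probabilities strictly inside (0, 1) so the logs below stay finite.
        m = [min(max(x, eps), 1 - eps) for x in m]
        u = [min(max(x, eps), 1 - eps) for x in u]
    return m, u, p


def field_weights(m, u):
    """Per-field (agreement, disagreement) weights as log2 likelihood ratios."""
    return [(math.log2(mi / ui), math.log2((1 - mi) / (1 - ui))) for mi, ui in zip(m, u)]
```

In use, each identifying field (for example surname, first name, date of birth) would be passed through `anonymize` in both files with the same pads, pairs of records would be compared field by field on the resulting codes to build the agreement patterns, and the weights returned by `field_weights` would be summed per pair to give a linkage score.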

 