Summary
Objectives:
To comply with European legislation, computer software must be developed that allows the
linkage of medical records that have previously been rendered anonymous. Some such
programs, like AUTOMATCH, are used in daily practice, either to gather medical files in
epidemiological studies or for clinical purposes. In the first situation, the aim is to
avoid homonymous errors (records of different patients wrongly linked), and in the second,
synonymous errors (records of the same patient left unlinked). The objective of this work
is to study the effect of different parameters (number of identification variables,
phonetic treatment of names, direct or probabilistic linkage procedure) on the reliability
of the linkage, in order to determine which strategy is best suited to the purpose of the
linkage.
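As an illustration of what a phonetic treatment applied before anonymization might look like, here is a minimal sketch. It assumes standard American Soundex as a stand-in for the phonetic treatment and a salted SHA-256 digest as the hash coding; neither is stated in this summary, and the function names and the salt value are hypothetical.

```python
import hashlib
import unicodedata

# Soundex digit for each consonant; vowels, H, W and Y carry no digit.
_CODES = {ch: digit
          for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in letters}


def normalize(name: str) -> str:
    """Uppercase the name, strip accents, and keep only the letters A-Z."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed.upper() if "A" <= c <= "Z")


def soundex(name: str) -> str:
    """Standard American Soundex (a stand-in, not the paper's own treatment)."""
    letters = normalize(name)
    if not letters:
        return ""
    key = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same digit
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            key += digit
        prev = digit  # a vowel resets prev, so a repeated digit is kept after it
    return (key + "000")[:4]


def anonymous_key(name: str, phonetic: bool = True, salt: str = "demo-salt") -> str:
    """Hash the (optionally phonetically coded) name; the salt is a placeholder."""
    key = soundex(name) if phonetic else normalize(name)
    return hashlib.sha256((salt + key).encode("utf-8")).hexdigest()


print(soundex("Robert"))  # R163
print(anonymous_key("Lefèvre") == anonymous_key("Lefebvre"))  # True
```

With the phonetic treatment, two spellings of the same name produce the same anonymous key, which reduces synonymous errors but may increase homonymous ones; without it, the two keys differ.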
Methods:
The assessment of the Burgundy Perinatal Network requires the linking of discharge
abstracts of mothers and neonates, collected in all the hospitals of the region. These
data are used to compare direct and probabilistic linkage under different parameterization
strategies.
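The difference between the two procedures being compared can be sketched as follows. This is a minimal illustration, assuming the usual log-ratio agreement weights for probabilistic linkage; the variable names and the m/u probabilities are hypothetical and are not taken from the study.

```python
import math

# Hypothetical (m, u) probabilities per identification variable:
# m = P(values agree | same person), u = P(values agree | different persons).
M_U = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_date": (0.97, 0.003),
}


def direct_link(rec_a: dict, rec_b: dict, variables) -> bool:
    """Direct linkage: link only if every variable agrees exactly."""
    return all(rec_a[v] == rec_b[v] for v in variables)


def match_weight(rec_a: dict, rec_b: dict, m_u: dict = M_U) -> float:
    """Probabilistic linkage: sum of log2 agreement or disagreement weights."""
    total = 0.0
    for var, (m, u) in m_u.items():
        if rec_a[var] == rec_b[var]:
            total += math.log2(m / u)              # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


a = {"last_name": "MARTIN", "first_name": "CLAIRE", "birth_date": "1975-03-02"}
b = {"last_name": "MARTIN", "first_name": "CLARIE", "birth_date": "1975-03-02"}

print(direct_link(a, b, M_U))      # False: one typing error breaks the direct link
print(match_weight(a, b) > 0)      # True: the composite weight still favors a link
```

A single discordant variable rules out a direct link, whereas the probabilistic weight lets the agreeing variables compensate for it.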
Results:
If the linkage has to be performed in real time, so that the indecisions generated by
probabilistic linkage cannot be validated, probabilistic linkage using three variables
without any phonetic treatment seems to be the most appropriate approach, combined with
a direct linkage using four variables applied to the non-conclusive links. If indecisions
can be validated in an epidemiological study, probabilistic linkage using five variables,
with a phonetic treatment adapted to the local language, is to be preferred. For medical
purposes, it should be combined with a direct linkage using four or five variables.
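The combined strategy described above can be sketched as a two-stage decision rule. This is an illustration only: the two thresholds, the variable names and the "manual-review" label are hypothetical, and the composite weight is assumed to have been computed beforehand by the probabilistic step.

```python
def classify(weight: float, lower: float, upper: float) -> str:
    """Probabilistic decision with an indecision zone between two thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "undecided"


def resolve(weight, rec_a, rec_b, direct_variables,
            lower=0.0, upper=10.0, can_validate=False) -> str:
    """Apply the probabilistic decision, then handle the non-conclusive pairs."""
    decision = classify(weight, lower, upper)
    if decision != "undecided":
        return decision
    if can_validate:
        return "manual-review"  # indecisions are handed to a human validator
    # Real-time case: fall back on a direct linkage over the chosen variables.
    agree = all(rec_a[v] == rec_b[v] for v in direct_variables)
    return "link" if agree else "non-link"


four_vars = ["last_name", "first_name", "birth_date", "postcode"]
a = {"last_name": "MARTIN", "first_name": "CLAIRE",
     "birth_date": "1975-03-02", "postcode": "21000"}
b = dict(a, postcode="21200")

print(resolve(12.0, a, b, four_vars))                    # link
print(resolve(5.0, a, b, four_vars))                     # non-link
print(resolve(5.0, a, b, four_vars, can_validate=True))  # manual-review
```

Where the thresholds are placed determines how many pairs fall into the indecision zone, and therefore how much validation effort the strategy requires.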
Conclusion:
This paper shows that the time and money available to manage indecisions, as well as the
purpose of the linkage, are of paramount importance in choosing a linkage strategy.
Keywords
Probabilistic file linkage - direct file linkage - decision analysis - hash coding
- phonetic treatment