Summary
Background: Establishing a Case Definition (CDef) is a first step in many epidemiological, clinical,
surveillance, and research activities. The application of CDefs still relies on manual
steps, which is a major source of inefficiency in surveillance and research.
Objective: To describe the need for, and propose an approach to, automating the useful representation
of CDefs for medical conditions.
Methods: We translated the existing Brighton Collaboration CDef for anaphylaxis by mostly
relying on the identification of synonyms for the criteria of the CDef using the NLM
MetaMap tool. We also generated a CDef for the same condition using all the related
PubMed abstracts, processing them with a text mining tool, and further treating the
synonyms with the above strategy. The co-occurrence of anaphylaxis and any other
medical term within the same sentence of an abstract supported the construction
of a large semantic network. The ‘islands’ algorithm reduced the network and revealed
its densest region including the nodes that were used to represent the key criteria
of the CDef. We evaluated the ability of the “translated” and the “generated” CDef
to classify a set of 6034 H1N1 reports for anaphylaxis using two similarity approaches
and compared them with our previous semi-automated classification approach.
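As a rough illustration of how a vector space model with cosine similarity could score a report against a CDef, the following minimal sketch may help. It is not the implementation used in the study: the term lists, the raw-count weighting, the function names, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def term_vector(terms):
    """Build a bag-of-terms vector (term -> raw count) from a list of terms."""
    return Counter(t.lower() for t in terms)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def classify(report, definition, threshold=0.5):
    """Flag a report as a possible case when its similarity to the CDef
    reaches the threshold (the threshold value is a hypothetical choice)."""
    return cosine_similarity(report, definition) >= threshold


# Hypothetical criteria terms standing in for a CDef, and two toy reports.
cdef = term_vector(["urticaria", "angioedema", "hypotension", "wheezing", "stridor"])
report_a = term_vector(["urticaria", "wheezing", "hypotension", "fever"])
report_b = term_vector(["headache", "fever", "myalgia"])

for name, report in [("report_a", report_a), ("report_b", report_b)]:
    score = cosine_similarity(report, cdef)
    print(name, round(score, 3), classify(report, cdef))
```

In this toy setup a report sharing several criteria terms with the CDef scores well above one with no overlap. In practice the choice of term weighting and threshold would have to be tuned against labeled reports.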
Results: Overall classification performance across approaches to producing CDefs was similar,
with the generated CDef and vector space model with cosine similarity having the highest
accuracy (0.825±0.003) and the semi-automated approach and vector space model with
cosine similarity having the highest recall (0.809±0.042). Precision was low for all
approaches.
Conclusion: The useful representation of CDefs is a complicated task but potentially offers substantial
gains in efficiency to support safety and clinical surveillance.
Citation: Botsis T, Ball R. Automating case definitions using literature-based reasoning. Appl
Clin Inf 2013; 4: 515–527
http://dx.doi.org/10.4338/ACI-2013-04-RA-0028
Keywords
Case definition, safety surveillance, semantic networks, literature-based reasoning,
anaphylaxis, similarity