Abstract
Objective To evaluate text mining for automatically identifying suspected bleeding
adverse drug events (ADEs) in the emergency department (ED).
Methods A corpus of ED admission notes was manually annotated for bleeding ADEs. The notes
were drawn from patients ≥ 65 years of age who had an ICD-9 code for bleeding, a
hemoglobin value ≤ 8 g/dL, or a transfusion of > 2 units of packed red
blood cells. This training corpus was used to develop bleeding ADE algorithms using
Random Forest and Classification and Regression Tree (CART). A completely separate
set of notes was annotated and used to test the classification performance of the
final models using the area under the ROC curve (AUROC).
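As an illustration of the evaluation metric named above, the following is a minimal sketch of AUROC computed as the probability that a randomly chosen positive note receives a higher score than a randomly chosen negative note (the Mann-Whitney interpretation). The function name `auroc` and the toy labels and scores are hypothetical and are not taken from the study data or the authors' code.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 (1 = bleeding ADE present; assumption for this sketch)
    scores: iterable of classifier scores, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of notes, which is acceptable for small annotated corpora; a rank-based implementation would be preferable at scale.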
Results The best-performing CART model achieved an AUROC of 0.882 on the training set. The model's
AUROC on the test set was 0.827. At a sensitivity of 0.679, the model had a specificity
of 0.908 and a positive predictive value (PPV) of 0.814. It had a relatively simple
and intuitive structure consisting of 13 decision nodes and 14 leaf nodes. Decision
path probabilities ranged from 0.041 to 1.0. The AUROC for the best-performing Random
Forest model on the training set was 0.917. On the test set, the model's AUROC was
0.859. At a sensitivity of 0.274, the model had a specificity of 0.986 and a PPV of
0.92.
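The sensitivity, specificity, and PPV reported above all derive from the four cells of a confusion matrix at a chosen decision threshold. A minimal sketch of those definitions follows; the function name `screening_metrics` and the counts are hypothetical and do not reproduce the study's data.

```python
def screening_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, and PPV from confusion-matrix counts.

    tp/fn: annotated bleeding ADE notes flagged / missed by the classifier
    tn/fp: non-ADE notes correctly passed / incorrectly flagged
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true ADEs that are flagged
        "specificity": tn / (tn + fp),  # share of non-ADEs that are not flagged
        "ppv": tp / (tp + fp),          # share of flagged notes that are true ADEs
    }


# Hypothetical counts, for illustration only.
toy = screening_metrics(tp=30, fp=5, tn=55, fn=10)
```

Raising the decision threshold typically trades sensitivity for specificity and PPV, which is the pattern seen when comparing the two reported operating points.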
Conclusion Both models accurately identify bleeding ADEs using the presence or absence of certain
clinical concepts in ED admission notes for older adult patients. The CART model is
particularly noteworthy because it does not require significant technical overhead
to implement. Future work should seek to replicate the results on a larger test set
pulled from another institution.
Keywords
hemorrhage - quality improvement - text mining - adverse drug event - emergency department