DOI: 10.1055/s-0038-1634080
Assessing the Difficulty and Time Cost of De-identification in Clinical Narratives
Publication History
Publication Date:
06 February 2018 (online)
Summary
Objective: To characterize the difficulty confronting investigators in removing protected health information (PHI) from cross-discipline, free-text clinical notes, an important challenge to clinical informatics research as recalibrated by the introduction of the US Health Insurance Portability and Accountability Act (HIPAA) and similar regulations.
Methods: Randomized selection of clinical narratives from complete admissions written by diverse providers, reviewed using a two-tiered rater system and simple automated regular expression tools. For manual review, two independent reviewers used simple search and replace algorithms and visual scanning to find PHI as defined by HIPAA, followed by an independent second review to detect any missed PHI. Simple automated review was also performed for the “easy” PHI that are number- or date-based.
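The simple automated review of number- and date-based PHI described in the Methods could look roughly like the following sketch. The abstract does not list the regular expressions that were actually used, so the patterns, the `PATTERNS` table, and the `scrub` function below are hypothetical illustrations, not the study's tool.

```python
import re

# Hypothetical patterns for "easy" PHI; the study's own expressions are not given.
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),   # e.g. 03/14/2005
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),      # e.g. 555-123-4567
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                # e.g. 123-45-6789
}


def scrub(text):
    """Replace each pattern match with a [LABEL] placeholder.

    Returns the scrubbed text and a list of (label, matched_text) pairs.
    """
    found = []
    for label, pattern in PATTERNS.items():
        def _replace(match, label=label):
            found.append((label, match.group(0)))
            return f"[{label}]"
        text = pattern.sub(_replace, text)
    return text, found


note = "Seen on 03/14/2005; call 555-123-4567. SSN 123-45-6789."
clean, hits = scrub(note)
print(clean)  # Seen on [DATE]; call [PHONE]. SSN [SSN].
```

Patterns this loose also flag text that is not PHI: `scrub("taper 10-20-30 mg")` rewrites the dose schedule as `[DATE]`. False positives of this kind are what lower the precision of a regex-only approach.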
Results: From 262 notes, 2074 PHI, or 7.9 ± 6.1 per note, were found. The average recall (or sensitivity) was 95.9% while precision was 99.6% for single reviewers. Agreement between individual reviewers was strong (ICC = 0.99), although some asymmetry in errors was seen between reviewers (p = 0.001). The automated technique had better recall (98.5%) but worse precision (88.4%) for its subset of identifiers. Manually de-identifying a note took 87.3 ± 61 seconds on average.
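For readers less familiar with the metrics in the Results, the sketch below shows how recall and precision are conventionally computed from counts of true positives, false negatives, and false positives. The abstract reports only the resulting percentages, so the counts in the example are made-up illustrative values and not the study's data. The only figures taken from the abstract are the 2074 PHI and 262 notes used to check the reported per-note average.

```python
def recall(true_pos, false_neg):
    """Fraction of real PHI items that were found (sensitivity)."""
    return true_pos / (true_pos + false_neg)


def precision(true_pos, false_pos):
    """Fraction of flagged items that really were PHI."""
    return true_pos / (true_pos + false_pos)


# Illustrative counts only (NOT from the study): 90 PHI found, 10 missed, 5 false alarms.
print(recall(90, 10))      # 0.9
print(precision(90, 5))    # about 0.947

# Figures from the abstract: 2074 PHI across 262 notes.
print(round(2074 / 262, 1))  # 7.9 PHI per note on average
```

The asymmetry between the two metrics matters here: a missed identifier (lower recall) is a privacy failure, while an over-flagged term (lower precision) only costs readability of the note.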
Conclusions: Manual de-identification of free-text notes is tedious and time-consuming, but even simple PHI is difficult to automatically identify with the exactitude required under HIPAA.