Assessing the Difficulty and Time Cost of De-identification in Clinical Narratives

D. A. Dorr; W. F. Phillips; S. Phansalkar; S. A. Sims; J. F. Hurdle

doi:10.1055/s-0038-1634080


Methods Inf Med 2006; 45(03): 246-252
DOI: 10.1055/s-0038-1634080

Original Article

Schattauer GmbH


D. A. Dorr¹, W. F. Phillips², S. Phansalkar³ ⁴, S. A. Sims³ ⁴, J. F. Hurdle³ ⁴

¹Department of Medical Informatics and Clinical Epidemiology, Oregon Health and Science University, Portland, OR, USA

²School of Computing, University of Utah, Salt Lake City, UT, USA

³Department of Medical Informatics, University of Utah, Salt Lake City, UT, USA

⁴Geriatric Research, Education, and Clinical Center (GRECC), George E. Wahlen Department of Veterans Affairs Medical Center, Salt Lake City, UT, USA


Publication date: 06 February 2018 (online)


Summary

Objective: To characterize the difficulty confronting investigators in removing protected health information (PHI) from cross-discipline, free-text clinical notes, an important challenge to clinical informatics research as recalibrated by the introduction of the US Health Insurance Portability and Accountability Act (HIPAA) and similar regulations.

Methods: Randomized selection of clinical narratives from complete admissions written by diverse providers, reviewed using a two-tiered rater system and simple automated regular expression tools. For manual review, two independent reviewers used simple search and replace algorithms and visual scanning to find PHI as defined by HIPAA, followed by an independent second review to detect any missed PHI. Simple automated review was also performed for the “easy” PHI that are number- or date-based.
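As an illustration of what "simple automated regular expression tools" for number- or date-based PHI might look like, here is a minimal Python sketch. The patterns, labels, and the `scrub` function are assumptions made for this example; the summary does not give the expressions the authors actually used, and real notes contain many more date and number formats than these.

```python
import re

# Hypothetical patterns for illustration only; not the study's expressions.
PHI_PATTERNS = {
    # Numeric dates such as 03/14/2005 or 3-14-05
    "DATE": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    # US-style phone numbers such as 801-555-0100
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}


def scrub(text):
    """Replace each pattern match with a [LABEL] tag and return
    the scrubbed text together with the list of (label, match) pairs."""
    found = []
    for label, pattern in PHI_PATTERNS.items():
        found.extend((label, m.group()) for m in pattern.finditer(text))
        text = pattern.sub(f"[{label}]", text)
    return text, found


note = "Seen on 03/14/2005; call 801-555-0100 with questions."
scrubbed, hits = scrub(note)
print(scrubbed)
print(hits)
```

Patterns this loose will also tag non-PHI strings that happen to look like dates or phone numbers, which is one plausible way an automated pass can trade precision for recall.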

Results: From 262 notes, 2074 PHI, or 7.9 ± 6.1 per note, were found. The average recall (or sensitivity) was 95.9% while precision was 99.6% for single reviewers. Agreement between individual reviewers was strong (ICC = 0.99), although some asymmetry in errors was seen between reviewers (p = 0.001). The automated technique had better recall (98.5%) but worse precision (88.4%) for its subset of identifiers. Manually de-identifying a note took 87.3 ± 61 seconds on average.
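The recall and precision figures above follow the standard definitions, and the reported means allow a rough estimate of total manual effort. The sketch below is only an illustration: the true-positive, false-positive, and false-negative counts are hypothetical (the summary reports percentages, not raw counts), while the note count, PHI count, and mean time per note are taken from the Results.

```python
def precision(tp, fp):
    """Fraction of flagged items that really are PHI."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of true PHI items that were flagged (sensitivity)."""
    return tp / (tp + fn)


# Hypothetical counts, chosen only to show the arithmetic.
tp, fp, fn = 90, 10, 5
example_precision = precision(tp, fp)
example_recall = recall(tp, fn)

# Figures reported in the Results.
NOTES = 262
PHI_TOTAL = 2074
SECONDS_PER_NOTE = 87.3

phi_per_note = PHI_TOTAL / NOTES                 # mean PHI items per note
total_hours = NOTES * SECONDS_PER_NOTE / 3600    # one manual pass over all notes

print(f"precision={example_precision:.3f} recall={example_recall:.3f}")
print(f"PHI per note={phi_per_note:.1f}, one pass={total_hours:.2f} h")
```

The time estimate covers a single pass by one reviewer at the mean rate; the two-tiered review described in the Methods would require more total effort than this.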

Conclusions: Manual de-identification of free-text notes is tedious and time-consuming, but even simple PHI is difficult to automatically identify with the exactitude required under HIPAA.

Keywords

Health Insurance Portability and Accountability Act - computerized medical records systems - medical informatics computing - natural language processing - de-identification

