Methods Inf Med 2017; 56(03): 230-237
DOI: 10.3414/ME16-01-0073
Paper
Schattauer GmbH

Structuring Legacy Pathology Reports by openEHR Archetypes to Enable Semantic Querying

Stefan Kropf
1   Innovation Center Computer Assisted Surgery (ICCAS), Leipzig University, Leipzig, Germany
2   Institute for Medical Informatics, Statistics and Epidemiology (IMISE), Leipzig University, Leipzig, Germany
Peter Krücken
3   Institute of Pathology, Leipzig University, Leipzig, Germany
Wolf Mueller
4   Department of Neuropathology, Leipzig University, Leipzig, Germany
Kerstin Denecke
5   Institute for Medical Informatics, Bern University of Applied Sciences, Bern, Switzerland
Funding: This paper is mainly a result of the Digital Patient Model project (ICCAS), funded by the BMBF (grant 03Z1LN11).

Publication History

received: 15 June 2016

accepted in revised form: 10 January 2017

Publication Date:
24 January 2018 (online)

Summary

Background: Clinical information is often stored as free text, e.g. in discharge summaries or pathology reports. These documents are semi-structured by section headers, numbered lists, items and classification strings. However, retrieving relevant documents remains challenging, because keyword searches applied to complete, unstructured documents return many false positives.

Objectives: We concentrate on the processing of pathology reports as an example of unstructured clinical documents. The objective is to transform reports semi-automatically into an information structure that improves access to and retrieval of relevant data. The data are to be stored in a standardized, structured way so that they are accessible to queries applied to specific sections of a document (section-sensitive queries) and available for information reuse.

Methods: Our processing pipeline comprises information modelling, section boundary detection and section-sensitive queries. To enable focused search in unstructured data, documents are automatically structured and transformed into a patient information model specified through openEHR archetypes. The resulting XML-based pathology electronic health records (PEHRs) are queried with XQuery and rendered as HTML with XSLT.
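The structuring step can be illustrated with a minimal sketch. It is not the authors' implementation: the section keywords, the `keyword:` header convention, the element names and the sample report are all hypothetical, and the output is a simplified XML layout rather than archetype-based openEHR XML.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical header keywords mapped to section names (assumption: real
# reports use other, possibly German, headers).
SECTION_KEYWORDS = {
    "Clinical information": "clinical_information",
    "Macroscopy": "macroscopy",
    "Microscopy": "microscopy",
    "Diagnosis": "diagnosis",
}

# Invented sample report, for illustration only.
SAMPLE = """Clinical information: Suspected glioma.
Macroscopy: Grey tissue fragments, 2 cm in total.
Microscopy: Cell-rich glial tumour with necroses.
Diagnosis: Glioblastoma, WHO grade IV."""


def split_into_sections(report_text):
    """Assign every non-empty line to the most recent section header."""
    sections = {}
    current = "preamble"  # text that appears before the first header
    for line in report_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        for keyword, name in SECTION_KEYWORDS.items():
            # A header is a keyword at the start of a line followed by a colon.
            match = re.match(rf"{re.escape(keyword)}\s*:\s*(.*)", stripped,
                             re.IGNORECASE)
            if match:
                current = name
                sections.setdefault(current, [])
                if match.group(1):
                    sections[current].append(match.group(1))
                break
        else:
            # No header keyword matched: the line continues the current section.
            sections.setdefault(current, []).append(stripped)
    return {name: " ".join(lines) for name, lines in sections.items()}


def to_xml(sections):
    """Wrap the detected sections in a simplified XML record."""
    root = ET.Element("report")
    for name, text in sections.items():
        ET.SubElement(root, "section", name=name).text = text
    return root
```

Calling `to_xml(split_into_sections(SAMPLE))` yields one `<section>` element per detected header, which is the kind of structure a section-sensitive query can then address.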

Results: Pathology reports (PRs) can be reliably structured into sections by a keyword-based approach. Information modelling with openEHR saves time in the modelling process, since many archetypes can be reused. The resulting standardized, structured PEHRs give access to relevant data by retrieving the data that match user queries.

Conclusions: Mapping unstructured reports into a standardized information model is a practical way to improve access to data. Archetype-based XML enables section-sensitive retrieval and visualisation with well-established XML techniques. Focusing retrieval on particular sections has the potential to save retrieval time and improve retrieval accuracy.
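The difference between whole-document keyword search and section-sensitive retrieval can be shown with a small sketch. It makes several assumptions: the two records, their identifiers and the element names are invented, the XML is far simpler than archetype-based PEHRs, and Python's ElementTree path expressions stand in for the XQuery used in the paper.

```python
import xml.etree.ElementTree as ET

# Two invented, heavily simplified records (not openEHR XML).
RECORDS_XML = """
<reports>
  <report id="PR-1">
    <section name="clinical_information">Suspected metastasis of a carcinoma.</section>
    <section name="diagnosis">Meningioma, WHO grade I.</section>
  </report>
  <report id="PR-2">
    <section name="clinical_information">Mass in the left lung.</section>
    <section name="diagnosis">Squamous cell carcinoma.</section>
  </report>
</reports>
""".strip()

ROOT = ET.fromstring(RECORDS_XML)


def full_text_hits(root, term):
    """Ids of reports whose complete text contains the term."""
    term = term.lower()
    return [report.get("id") for report in root.findall("report")
            if term in "".join(report.itertext()).lower()]


def section_hits(root, section, term):
    """Ids of reports where the term occurs in the named section only."""
    term = term.lower()
    hits = []
    for report in root.findall("report"):
        for element in report.findall(f"section[@name='{section}']"):
            if term in (element.text or "").lower():
                hits.append(report.get("id"))
                break
    return hits
```

Here `full_text_hits(ROOT, "carcinoma")` returns both reports, including PR-1, where the term appears only in the clinical question, whereas `section_hits(ROOT, "diagnosis", "carcinoma")` returns PR-2 alone.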