Keywords
terminology binding - SNOMED CT - electronic health records - information models
Introduction
Data sharing within and between stakeholders in health care is low. Part of the problem
is lack of semantic interoperability, in other words the data entered at one place
and time cannot be reused in other places and over time unambiguously.
SNOMED CT has evolved over decades into what today is the most comprehensive terminology
in the medical field. In 2016 it was judged “the best available core reference terminology for cross-border, national, and regional
eHealth deployments in Europe.”[1] But neither SNOMED CT nor any other terminology is a stand-alone solution to the
problem of semantic interoperability; a terminology must be used in harmony
with one or more information models that provide structure.
Establishing links between elements of a terminology and an information model is called
terminology binding[2] and is also often referred to as mapping or subset development. Terminology binding
is typically performed during configuration of health record systems, in response
to data sharing efforts or during implementation of decision support systems. By providing
relevant subsets for different parts of an information model, terminology binding can
also facilitate natural language processing (NLP) of narrative text.
There is, however, a lack of published guidelines on the process of terminology binding
with SNOMED CT. A paper from 2012 describes a method for mapping,[3] but a survey on SNOMED CT implementations in 2013 stated that there is a lack of
“subset development methodologies,”[4] and a literature review in 2015 stated that these processes were rarely described.[5] In 2015, the TermInfo project[6] wrote guidelines for the binding of SNOMED CT to the HL7 Reference Information Model,
and in 2016 it was proposed that guidelines should be developed on the combination
of information model elements and SNOMED CT hierarchy as well as granularity issues.[1]
Lack of guidelines makes performing terminology binding difficult and the result inconsistent,
hindering reuse of data and impeding transformation to sharable data in health care.
Objectives
This paper provides a state-of-the-art review[7] and analysis of research published about terminology binding processes concerning
SNOMED CT. We aim to collate existing knowledge on difficulties and possible solutions
and to point out relevant future research. The primary audience is practitioners and
researchers working with terminology binding.
Definitions
Ironically there is a lack of terminological clarity in this perisemantic area working
toward unambiguity. This paper uses the following terms and definitions for concept, description, subset, subset development, information model, terminology binding,
value set binding, and mapping.
SNOMED CT is made up of concepts representing things in the real world, for example disorders assumed to exist in
real patients or procedures done or to be done on real patients. Others call these
representational entities (or just entities).[8] In SNOMED CT each concept has two or more descriptions linked to it, each containing one term. One description is the Fully Specified Name
(FSN)[9] which includes a term that is always unique within SNOMED CT, while the other descriptions
are synonyms that have terms for use in different languages and/or contexts. Synonyms
are not always unique within SNOMED CT. For example, the concept with id 22298006
has FSN “myocardial infarction (disorder).” It also has the descriptions “Heart attack”
and “MI” (in English) and “hjärtinfarkt” (in Swedish), among others.
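The relationship between a concept and its descriptions can be sketched as a small data structure. This is a simplified illustration only: the class and field names are assumptions made for this example, and real SNOMED CT releases carry additional attributes (such as description identifiers and language preferences) that are left out here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Description:
    """One description: a single term in a given language (simplified)."""
    term: str
    language: str          # e.g. "en" or "sv"
    is_fsn: bool = False   # True for the Fully Specified Name


@dataclass
class Concept:
    """A concept identified by its id, with two or more descriptions."""
    concept_id: str
    descriptions: list[Description] = field(default_factory=list)

    def fsn(self) -> str:
        """Return the term of the Fully Specified Name."""
        for description in self.descriptions:
            if description.is_fsn:
                return description.term
        raise ValueError("concept has no Fully Specified Name")

    def synonyms(self, language: str) -> list[str]:
        """Return the synonym terms available in one language."""
        return [
            d.term
            for d in self.descriptions
            if not d.is_fsn and d.language == language
        ]


# The example concept from the text above.
mi = Concept(
    "22298006",
    [
        Description("myocardial infarction (disorder)", "en", is_fsn=True),
        Description("Heart attack", "en"),
        Description("MI", "en"),
        Description("hjärtinfarkt", "sv"),
    ],
)

print(mi.fsn())
print(mi.synonyms("sv"))
```

The point of the sketch is that data are recorded against the concept id, while any of the linked terms may be shown in a user interface.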
A subset is a collection of components from a terminology. SNOMED CT subsets presented in
RF2-format are simple reference sets, often called just “refsets.” In some use cases
subsets are called value sets. A SNOMED CT subset can include either SNOMED CT concepts,
which in an interface can be represented by any of the descriptions linked to them,
or specified descriptions.
Subset development is the process of choosing concepts or descriptions from a terminology relevant to
a specific application. SNOMED CT subsets can be defined either by enumerating the
included concepts (extensional subsets) or by using SNOMED CT's description logic (intensional
subsets).
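The difference between the two kinds of subset definition can be illustrated with a toy hierarchy. The sketch below is an assumption-laden simplification: the concept labels are hypothetical placeholders rather than SNOMED CT content, and the intensional definition is reduced to "all descendants of a given concept", whereas real intensional definitions can use richer constraints.

```python
# Toy is-a hierarchy: child -> set of direct parents.
# The labels are hypothetical placeholders, not SNOMED CT content.
IS_A = {
    "pneumonia": {"disease"},
    "bacterial_pneumonia": {"pneumonia"},
    "myocardial_infarction": {"disease"},
    "appendectomy": {"procedure"},
}


def descendants(root, is_a):
    """Return every concept that has `root` as a direct or indirect ancestor."""
    found = set()
    frontier = [root]
    while frontier:
        parent = frontier.pop()
        for child, parents in is_a.items():
            if parent in parents and child not in found:
                found.add(child)
                frontier.append(child)
    return found


# Extensional subset: the members are simply listed.
extensional_subset = {"pneumonia", "myocardial_infarction"}

# Intensional subset: the members follow from a rule over the hierarchy,
# here "everything subsumed by disease".
intensional_subset = descendants("disease", IS_A)

print(sorted(extensional_subset))
print(sorted(intensional_subset))
```

An extensional subset stays fixed until someone edits the list, whereas an intensional subset changes whenever the hierarchy it is computed from changes.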
A (clinical) information model is defined by ISO/DIS 13972:2020 as “logical models designed to express one or more
clinical concepts and their context in a standardized and reusable manner, specifying
the requirements for health, clinical and care information as a discrete set of logical
clinical data elements.”[10]
According to Benson and Grieve “Terminology binding is the process of establishing links between elements of a terminology such as SNOMED
and an information model.”[2]
Value set binding is a type of terminology binding where a subset of a terminology is stated as the
allowed values for a certain part of an information model; for example, the allowed
concepts in a section of Family History could be a subset of all concepts subsumed
by 64572001 | Disease (disorder)|.
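A value set binding can be sketched as a pairing of an information model element with the concepts allowed in it. In the sketch below the element path `FamilyHistory.condition`, the class name, and the function name are all hypothetical; the two allowed concept ids are the disorder concepts mentioned elsewhere in this paper and stand in for a real, much larger value set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueSetBinding:
    """Binds one information model element to its allowed concepts."""
    element: str                  # hypothetical path into an information model
    allowed_concepts: frozenset   # concept ids permitted for that element

    def accepts(self, concept_id: str) -> bool:
        return concept_id in self.allowed_concepts


# Illustrative binding: a tiny enumerated value set for family history.
family_history_binding = ValueSetBinding(
    element="FamilyHistory.condition",
    allowed_concepts=frozenset({"22298006", "233604007"}),
)


def validate_entry(binding, concept_id):
    """Return the concept id if the binding allows it, otherwise raise."""
    if not binding.accepts(concept_id):
        raise ValueError(f"{concept_id} is not allowed in {binding.element}")
    return concept_id


print(validate_entry(family_history_binding, "22298006"))
```

The binding, not the subset alone, carries the reference to a specific part of the information model, which is the distinction drawn in the next paragraph.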
The difference between “value set binding” and “subset development” is that a value
set binding will always refer to a specified part of an information model, whereas
subsets can be developed for multiple use cases and, sometimes after adjustment, be
used as value sets in different information models.
Mapping can mean at least two things. First, making an extensional subset by starting with
an existing list of terms and selecting corresponding SNOMED CT concepts and then
abandoning the initial list. This type of mapping produces a subset with one concept
per entity.
Second, making a link between concepts in one terminology and concepts in SNOMED CT,
intending to continue using both terminologies (for new and/or legacy data) and using
the map to transfer data from one to the other. This type of mapping produces a map
with concepts from two terminologies linked to each other.
Methods
Choice of Databases and Primary Inclusion
Two larger literature reviews on SNOMED CT use have been published, one in 2008 covering
1966 to 2006[11] and one in 2014 covering 2001 to 2012.[12] The 2008 review searched Medline, whereas the 2014 review also included Embase.
Both of these databases are geared toward medical and biomedical science. To find
publications also published in the informatics field we chose to perform searches
in Scopus, Web of Science, and Medline.
For primary inclusion we searched for papers containing SNOMED in some form and some
term that could relate to terminology binding, subset development, or mapping. We
used the term “SNOMED” as this also includes similar predecessors, such as SNOMED
RT. We considered older versions not relevant since their structure is significantly
different from that of SNOMED CT. Although UMLS includes SNOMED CT, for the same reasons,
papers only describing UMLS were not considered relevant.
Since ICD is used for statistics, reporting and reimbursement, there are many papers
on mappings between these two terminologies. We are interested in terminology binding
done for implementation purposes. A search including ICD was found to severely decrease
specificity with only a small gain in sensitivity, and papers with “ICD” in title or
tag were therefore excluded, acknowledging that this would omit potential papers where
both ICD and SNOMED CT were used in clinical practice.
Many findings in informatics are presented at conferences rather than published in
journals, and therefore we have included both conference proceedings and journal papers.
We set no lower limit on publication date, and a complementary search was made on April
13, 2020 to include papers published during our work. See [Table 1] for search strings and total hits in the three databases.
Table 1
Database search strings and hits
| Database | Search string | Hits |
|---|---|---|
| Scopus | TITLE-ABS-KEY (SNOMED) AND TITLE-ABS-KEY (“terminology binding” OR subset* OR map* OR “information model” OR implement*) AND NOT TITLE-ABS-KEY ICD* | 539 |
| Web of Science | TS = SNOMED AND TS= (“terminology binding” OR subset* OR map* OR “information model” OR implement*) NOT TS = ICD* | 358 |
| Medline | SNOMED AND (“terminology binding” OR subset* OR map* OR “information model” OR implement*) NOT ICD* | 287 |
The three resulting lists of papers were merged in Zotero reference management software.[13] Papers with the same title, author, and publication year were identified as duplicates
and removed (see [Fig. 1]).
Fig. 1 PRISMA diagram.
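The duplicate rule stated above (same title, author, and publication year) can be sketched as follows. This is not a description of how Zotero works internally; the record layout, the function names, and the normalization of case and whitespace are assumptions made for the example, and the records themselves are invented.

```python
def dedup_key(record):
    """Build a comparison key from title, authors, and publication year."""
    title = " ".join(record["title"].lower().split())
    authors = tuple(name.lower().strip() for name in record["authors"])
    return (title, authors, record["year"])


def merge_without_duplicates(*result_lists):
    """Merge search results, keeping the first copy of each duplicate."""
    seen = set()
    merged = []
    for results in result_lists:
        for record in results:
            key = dedup_key(record)
            if key not in seen:
                seen.add(key)
                merged.append(record)
    return merged


# Hypothetical records standing in for exports from two databases.
database_a = [
    {"title": "Binding terms to models", "authors": ["Doe J"], "year": 2015},
]
database_b = [
    {"title": "Binding  Terms to Models", "authors": ["doe j"], "year": 2015},
    {"title": "Subset development in practice", "authors": ["Roe A"], "year": 2018},
]

merged = merge_without_duplicates(database_a, database_b)
print(len(merged))
```

A strict key such as this one misses duplicates whose titles differ by more than case or spacing, so some manual checking would still be needed.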
Abstract Review and First Exclusion
We imported the resulting 616 papers into Rayyan software.[14] Papers which described processes where medical information had been mapped to SNOMED
CT resulting in a subset or a map for implementation and papers with recommendations
of how such work should be performed were included. To refine and ensure consensus
about the exclusion/inclusion criteria, the review was unblinded after the first 20
papers were analyzed, and the results discussed within the group. The remaining papers
were then reviewed in a blinded fashion where at least two reviewers categorised each
paper. In cases of disagreement, a third reviewer made the final choice.
When there was both a journal paper and a conference proceeding published on the same
material, both were read, and if the journal article was an improved version of the
conference proceeding only the journal paper was included. If, however, they presented
different ideas, albeit from the same material, both were included.
Papers in other languages than English, review papers and papers where full text was
not accessible were excluded.
Full-Text Review, Coding, and Secondary Exclusion
The resulting list of papers was then imported into NVivo.[15] Initially, the four authors thoroughly read 20 randomly selected papers. These were
discussed and a consensus was reached on what topics to look for in all papers. The
topics were chosen to capture important aspects of the context where the work was
performed and different parts of the terminology binding process. The authors have
different academic and practical backgrounds, which ensured breadth of scope. These topics
correspond to what Webster and Watson refer to as “concepts,”[16] but as this term has a specific meaning in SNOMED CT, we prefer the term “topic.”
Together they constitute a conceptual framework as described by Rowe.[17] The topics were represented as nodes in NVivo and sub-nodes were developed iteratively
during the annotation process.
During this process, 28 articles were excluded because they were not on topic, the
terminology binding process was not sufficiently described, or full text was not available.
The included papers are listed in [Appendix A]. During annotation, subtopics were developed iteratively with results as described
under the respective topic below.
Synthesis of Results
We have used both quantitative and qualitative methods, and the results of these will
be intertwined. Calculations were performed in MS Excel, and the chi-square test was
used for quantitative analysis across topics.
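For readers unfamiliar with the chi-square test, the statistic for a contingency table can be sketched as below. This is an illustration only: the calculations in this review were performed in MS Excel, the function name is an assumption, and the counts are hypothetical and are not data from the review.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table.

    `table` is a list of rows of observed counts. Every row total and
    column total must be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts (NOT data from the review): rows could be two
# project aims, columns two categories of stated competence.
hypothetical_counts = [[10, 20], [20, 10]]
statistic, dof = chi_square_statistic(hypothetical_counts)
print(f"chi-square = {statistic:.2f}, degrees of freedom = {dof}")
```

The statistic is then compared with the chi-square distribution for the given degrees of freedom to obtain a p-value, a step that statistical packages perform directly.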
Results
The Following Topics Were Chosen
The content analysis of the papers described under methods above led to the formulation
of seven key topics as shown in [Fig. 2].
Fig. 2 Terminology binding topics.
Input Material
The majority of the described projects (n = 46, 84%) focused on information used for documentation. Of these, 33 used existing
terminology such as EHR templates or local term lists as input. The projects starting
with free text often had domain experts or focus groups who produced a list of terms
as input, while two projects used both free text and listed terms as input.
A smaller proportion of the projects focused on information in guidelines or literature,
and all of these started with free text, see [Fig. 3]. These projects were performed to support improved data-gathering in half of the
cases (n = 4) and as part of the development of clinical decision support systems (n = 5) in the other half.
Fig. 3 Input material.
Papers with different types of input data were evenly distributed over time.
Stated Competence of Project Participants
Of the papers, 40% (n = 22) did not state the competence of those who performed the terminology binding.
This includes the papers focused on development of tools where domain knowledge is
less relevant. Of the papers mentioning competence, half reported both terminological
and domain knowledge. This could be one person with double competences or separate
people working together. The other half of the papers were evenly split between domain
experts and terminologists doing the work, see [Fig. 4]. No statistically significant difference was found in the distribution of competence
with regard to the aim of the project described. The co-operation between terminologists
and domain experts was stressed in nine papers.
Fig. 4 Stated competence of project participants.
Tools for Terminology Binding
Terminology browsers, sometimes multiple, were the most commonly mentioned tools,
with CliniClue Browser and SNOMED International's Browser being most frequent. Others were: Gephi, HealthTerm (only used for browsing), Nictiz Terminology Browser, UMLS Metathesaurus, and a prototype visualization tool called TermViz. Five projects mentioned having used multiple browsers, which sometimes yielded different
results during look-ups.[18]
In the included projects, 11 different tools used to support terminology binding
were mentioned ([Table 2]). The tools range from one-off local projects to software that is still available
in 2020.
Table 2
Tools mentioned in included papers
| Name of tool | References | Publication year | Stated use in project |
|---|---|---|---|
| BioPortal Annotator | [54] | 2013 | … a tool that processes text submitted by a user, recognizes relevant biomedical terms in the text, and returns the annotations to the user. |
| eleMAP | [55] | 2011 | … semi-automatic mapping of research DEs [Data Elements] to standardized biomedical vocabularies and metadata registries. |
| LexValueSets | [56] | 2008 | context-driven value sets extraction. |
| Mayo Clinic Tools SAVS/MCSV | [57], [58] | 2004, 2006 | The MCVS is a set of tools … which facilitates health vocabulary indexing. |
| Medical Text Extraction, Reasoning, and Mapping System (MTERMS) | [59] | 2016 | … generic natural language processing (NLP) application … to process and map local allergy entries to the standard terminologies. |
| MoST | [31] | 2008 | … archetype authoring, semi-automatic SNOMED CT terminology binding assistance and terminology visualization |
| SAMT/SSMT | [60] | 2014 | … discover mappings between clinical terms and SNOMED-CT concepts. |
| Snapper | [33], [34] | 2010, 2011 | … creating mappings from an existing terminology to SNOMED CT. |
| Termworks by Apelon | [30], [58] | 2004, 2008 | … searched SNOMED for concept names matching the unresolved narratives using a multi-step algorithm. |
| UMLS MetaMap | [28] | 2010 | … map biomedical text to the [UMLS] Metathesaurus. |
| Unnamed tool | [61] | 2012 | … propose a hybrid approach relying on linguistics as well as structural information [for mapping]. |
Six papers stated the use of Excel for managing and storing subsets. One project used both SNOMED International's developer toolkit and an SQL server[19]
to manage subsets. No paper mentioned a FHIR-based terminology server. One paper
used a tool included in the EHR to develop templates, where the terminology bindings
were thus documented,[20] albeit not in an easily sharable format.
Five papers mentioned lack of appropriate tooling as a specific problem.
Problems Encountered
We have identified four themes regarding the problems described in the included articles.
Problems Related to Input Material
To correctly bind input material to SNOMED CT concepts, the input material needs to
be properly understood by humans, machines, or both in collaboration. Assumed
information must somehow be considered and included in the input for terminology binding.
An example is “Basically, all salpingectomy/oophorectomy cases have been done robotically in the
practice since several years ago. This is not reflected in the documentation of surgical
notes.”[21] If this is not taken into account during terminology binding, underspecified concepts
might be selected.
Local terms or abbreviations can be difficult to interpret, for example “In orthopaedics, the term “SLAP tear” is a short form for describing a tear to the
labrum”[22] and homonyms used in different contexts might lead to misinterpretation as demonstrated
by “The term “left adnexa” used to describe the left uterine adnexa, but it could also
refer to the left ocular adnexa.”[22]
Some papers report problems where the input material has multiple terms perceived
as synonyms, for example “the neurological finding of bilateral extensor plantar response was expressed in 13
different ways.”[23]
Projects which describe mapping of clinical guidelines or decision support systems
report granularity issues, for example they “often encountered a guideline term too general to appear as a patient data item in
electronic medical records.”[24]
Where the input material is derived from statistical classifications, terms based
on “not elsewhere classified” or “other specified” can cause problems because a corresponding
concept is not allowed in SNOMED CT. For example, “concepts such as “other cardiovascular problems” are vague, as they depend on what
has been specified in the context.”[24]
Problems Related to SNOMED CT
The most common problem during mapping, mentioned in approximately 60% of papers,
was failure to find an appropriate concept despite relevant searches. This was mostly
in the order of a few percent of the entire subset or less and was generally solved
by requests for new concepts. It was never stated as a significant problem, but the
delay caused by this made some use local extensions where they could add concepts
more quickly.[22]
Lack of terms for existing concepts was also mentioned, as in “the terminology used in the clinical area cannot always be found in SNOMED CT. In
this case we tried to find a concept ID that represents the concept that lies behind
the terminology used by the care professional.”[25] Sometimes multiple very similar terms in SNOMED CT for different concepts made finding
the right concept difficult.[18]
Some SNOMED CT terms were deemed incorrect in relation to FSN or other terms for the
same concept, for example “we did not think that “depression (finding)” and “sadness” were semantically equal as defined by SNOMED CT,”[26] and some terms were formatted inconsistently, for example “left popliteal artery structure (body structure) and structure of right popliteal
artery (body structure).”[26] Cultural differences sometimes make terms incorrect as in the “use of stimulants, like alcohol, marihuana, and cigarettes is defined as abuse of
these stimulants although the use of stimulants is not always considered as abuse.”[25]
Half of the papers mentioned difficulties pertaining to the SNOMED CT concept model.
For example, the model was perceived to be developed for surgical procedures, resulting
in difficulties post-coordinating, and it was stated that “there is no proper way to post-coordinate nonoperative or nonsurgical concepts.”[27] One paper reported being unable to document patient preferences, as in ““patient-prefers-bct,” describing the patients preference of breast conserving treatment
over mastectomy.”[28] Combinations made of existing concepts, for example “six-courses-anthracycline-chemotherapy” or the intention “elimination-distant-metastases”[28] were reported as absent, but others stated that post-coordination solved problems
with missing concepts, for example the concept “asthma education completed before the enrolment for the DMP”[29] was post-coordinated. Post-coordinating was deemed more complicated than requesting
new precoordinated concepts because “these require a sophisticated knowledge of concept modelling and the evolution of
SNOMED hierarchies over time”[30] and “post-coordination may be equivalent to an existing precoordination or another post-coordination.
Logical contradictions also have to be checked for and avoided.”[31]
Choice of hierarchies can be difficult, with similar concepts found in observable
entities and findings.[3][25][32]
Another reported problem with the concept model was the way subsumption includes concepts
that can be correct in one setting but incorrect in another, for example “the specializations of the SNOMED CT concept “nose and throat examination” include
the concept “rhinolaryngologic examination under general anesthesia” which is not
a part of the preoperative airway examinations that is mentioned in the guideline.”[27]
Refsets used to document subsets are sometimes reported to lack functionality because
the “format does not directly allow for specific concepts to be included merely for navigational
grouping purposes and not selectable in the user interface.”[33]
The choice between what to document with terminology and what to document with an
information model is mentioned with, for example, finger extensors and laterality[25] and “patient younger than 1 year” versus date of birth.[27]
Some descriptions state solutions that use the SNOMED CT concept model incorrectly,
revealing that there is a lack of knowledge among both authors and reviewers, for
example “in the archetype Apgar, the ELEMENT term Colour referring to the skin was mapped to
the SNOMED CT concept Colors (qualifier value) because a more specific concept in SNOMED
CT has not been found”[32] and “when examining the family history of a stroke patient one wants to know if stroke
at an early age runs in the family. For this concept we needed three SNOMED CT codes:
one for stroke, one for age, and one for young.”[25]
Lack of recommendations for modelling[34] and the need to further develop modelling guidelines[3] were raised as problems.
Problems Related to Information Models
Several papers point out the need for a stated information model to bind the terminology
to, either at national or international level, and the agencies sharing information
with one another need continued communication.[3][24][30][35][36] Arguments for this are both to share the burden of the work[35] and to obtain semantic interoperability.[3]
Problems Related to Tools
The necessity of tools to support mapping is stressed with, for example, “default context …. should be supported by tooling”[34] and “continue the search for useful IT-tools for documentation of the structured clinical
content.”[35] Some of the problems mentioned above, for example related to the choice of concepts
from the wrong hierarchy, could be prevented with supportive tooling.
Problems also occurred when tools were used, for instance when different parts of
projects used different tools, thus producing different results for the same task.
This occurred both with mapping tools[30] and browsers. In the browsers it was sometimes due to different settings regarding
extension and version.[18]
In one paper a publicly available tool is suggested to support consistent post-coordination.[37]
Validation
Validation was described in 69% (n = 38) of included papers, the most common type being independent reviewers working
with the same material and then comparing results. Papers on tooling described automatic
controls within the tools but also used human controls as in “all filtered SNOMED
CT results are presented to the clinical modeler as candidate mappings.”[31]
Project Motivations and Output
Four main types of motivation and corresponding outputs from the included projects
were found. Evaluation of applicability (29%, n = 16) includes projects that evaluated SNOMED CT coverage solely or compared SNOMED
CT coverage with other terminologies against a set of terms or local codes in a particular
setting, often as a first step in an implementation or as part of preparation for
such, but no implementation was described. The output was sometimes a subset or map,
but more often a measurement of coverage of a domain. Papers included in descriptions
of implementations (29%, n = 16) describe single projects and their experiences. Examples include shifting to
SNOMED CT in a clinical registry or template in an EHR, and the direct output of the
work was, for example, a subset, a map between code systems, or a template populated
with SNOMED CT. Recommendations (22%, n = 12) group papers which provide general descriptions of or recommendations on how
terminology binding can be done. Some include a case description as well, but the
focus of the papers is generic recommendations. The output in these papers was the
recommendation in itself.
The final group consists of papers on the development of tools (20%, n = 11) where either beta software or evaluations of different approaches used in software
development were described.
No patterns in the distribution of types of projects over the time-period were found.
None of the papers described repeated work. Successive projects from the same research
groups are not counted as repeated work.
Recommendations Given
We have identified three themes regarding the recommendations given in the included
articles.
Ensure Domain Knowledge and Informatics Competence
Among the papers stating recommendations, more than half stress the need for knowledge
of both domain and informatics within the project. Some examples are: “it is critical to have terminologists with considerable clinical background or domain
expertise”[26] and “requires considerable training before successful implementation.”[23] When clinicians with no prior experience of informatics were engaged “substantial education was needed.”[35]
Another argument for engaging clinicians in the work is to enable and make sure that
good work practices guide the configuration of IT systems, and not the other way around,
as in this example: “No decision support tool should disrupt the nurse's workflow, increase documentation
burden, or decrease time with the patient; all these variables should be tested.”[38]
Follow a Process Including Validation
Papers describing a process start with domain analysis, sorting of the input material
or making process flow diagrams as in “graphical representation of the clinical process, using symbols for start- and end-points,
process, decision points, data, etc.”[35]
Both manual projects and automated projects recommended using some type of validation,
either as a single step or as a continuous process as in “there was ongoing collaboration–validation, discussion, and commentary for each group
of maps. This was critical to achieving eventual consensus on the final maps.”[26]
Some projects took note of the “quality of the relationship between source legacy interface terms and target SNOMED
CT concepts”[26] and others used Krippendorff's Alpha to mathematically measure discrepancies.[36]
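As an illustration of the kind of agreement measure mentioned above, the sketch below computes Krippendorff's alpha for nominal data, where each reviewer assigns one code (for example a SNOMED CT concept id) to each source term. It is a minimal sketch based on the standard coincidence-matrix formulation, not the procedure used in the cited paper; the function name and the example codes are assumptions.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    `units` is a list with one entry per coded item; each entry is the
    list of codes that the reviewers assigned to that item.
    """
    coincidences = Counter()
    for values in units:
        m = len(values)
        if m < 2:
            continue  # an item coded by one reviewer cannot show agreement
        # Every ordered pair of codes within an item counts 1 / (m - 1).
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1 / (m - 1)

    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    )
    if expected == 0:
        raise ValueError("alpha is undefined when only one code is used")
    return 1 - (n - 1) * observed / expected


# Two hypothetical reviewers coding four source terms; they disagree on
# the last one. The codes "a" and "b" stand in for concept ids.
ratings = [["a", "a"], ["a", "a"], ["b", "b"], ["a", "b"]]
alpha = krippendorff_alpha_nominal(ratings)
print(round(alpha, 3))
```

A value of 1 indicates perfect agreement and a value near 0 indicates agreement no better than chance, which makes the measure more informative than a raw percentage of matching choices.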
We found detailed recommendations of how to choose concepts regarding, for example,
hierarchies and granularity only in one paper.[3]
Plan for Maintenance
Since SNOMED CT is an evolving terminology there needs to be plans and systems for
the maintenance of developed subsets[39] or maps, and this expense can be relatively substantial.[30]
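One recurring maintenance task can be sketched as a comparison between a developed subset and the concepts that are still active in a newer release. The sketch is a hedged illustration rather than a method from the included papers: the function name is an assumption, the release content is invented, and `hypothetical-retired-id` is a placeholder, not a real concept id.

```python
def review_subset(subset, active_in_new_release):
    """Split a subset into members still active and members needing review."""
    still_active = sorted(subset & active_in_new_release)
    needs_review = sorted(subset - active_in_new_release)
    return still_active, needs_review


# A developed subset (two ids from this paper plus a placeholder).
subset = {"22298006", "233604007", "hypothetical-retired-id"}

# Invented stand-in for the set of active concept ids in a new release.
active_in_new_release = {"22298006", "233604007", "64572001"}

still_active, needs_review = review_subset(subset, active_in_new_release)
print("still active:", still_active)
print("needs review:", needs_review)
```

Members that need review would then have to be replaced or removed by someone with both domain and terminology knowledge, which is where much of the maintenance cost arises.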
Article Statistics
Publication year for the included articles was evenly distributed from 2004 to 2020.
The included articles were found evenly in journals (n = 29) and in conference proceedings (n = 26). The majority were published in Information System outlets: three papers were
from medical journals, none from medical conferences. Fifty-three percent (n = 29) of first authors were women. Papers came from all parts of the world with a
majority from North America. Denmark and the Netherlands dominated in Europe.
Projects varied in size, with the smallest describing a method for post-coordination
tested on 10 terms and the largest using both structured and narrative data as input
starting with over 37,000 values.
Discussion
In the subsequent sections the findings in the previous sections are analyzed and
discussed with respect to our research aim: to collate existing knowledge on difficulties
and possible solutions and to point out relevant future research. We have organized
the discussion primarily based on the findings reported under problems and recommendations
found above. Recommendations are given for both practice and research.
General Process
We found no established process for terminology binding to SNOMED CT in the included
papers; however, most processes described started with a review of what information
would be relevant to document, proceeded to find relevant terms and concepts in SNOMED
CT and then performed some type of validation with domain experts.
During the first step care must be taken to evaluate the quality of the input data
before using it as a starting point for terminology binding. There might be a technical
debt pertaining to previous or existing systems that preferably should not be brought
into the new information structure.[40][41] It was stressed that the way clinicians work should take precedence over the system,
rather than the other way around.
RECOMMENDATION FOR PRACTICE: Sort input data, select relevant concepts, validate.
RECOMMENDATION FOR RESEARCH: Test and refine the process described above. Evaluate
balance between keeping legacy structures and resolving technical debt.
Understanding the Meaning of Terms
The repeated problems described with understanding the meaning of terms in the input
material stress the necessity of domain knowledge. Insufficient knowledge of context
and language used in the setting also obstructs understanding of the terms used in
SNOMED CT. It is shown that domain knowledge is highly important when configuring
templates or other types of data-collection material[42] and our finding that 75% of papers with stated competence had involved some sort
of domain knowledge supports this being understood amongst the described projects.
Knowledge of the logical structure of SNOMED CT is also necessary to make qualified
choices between concepts and judge when and what new terms or concepts are needed.
Informaticians can provide this competence, but some knowledge of informatics is also
needed by the domain experts and many papers stressed the cooperation between informaticians
and domain experts. More supportive tools could perhaps alleviate the need for knowledge
about SNOMED CT intricacies.
RECOMMENDATION FOR PRACTICE: Involve domain experts and invest time in educating them
in informatics and SNOMED CT.
RECOMMENDATION FOR RESEARCH: Design supportive tooling geared to domain experts without
in-depth knowledge about informatics or SNOMED CT.
Unspecified Terms
The problems reported regarding explicitly unspecified terms in the input material
cover two principally different types of entities, not otherwise specified (NOS) and
not elsewhere classified (NEC).[43]
NOS terms can be terminology bound to SNOMED CT by using content at a higher level
in the hierarchy, for example 233604007 | Pneumonia (disorder) | for unspecified pneumonia. This binding is a 1–1 map as both concepts mean the same
thing.
NEC terms are defined by the other existing terms in the set and are therefore subject
to semantic drift as the set evolves. Such concepts are not allowed in SNOMED CT,
and it is thus not possible to bind an NEC term to a SNOMED CT concept. (It is, however,
possible to bind from SNOMED CT to NEC terms). NEC terms also require in-depth knowledge
of the rest of the set to be used accurately and should be used by skilled classifiers,
or software, rather than by clinicians.
It must at times be possible to enter data even when none of the available concepts
in the list are adequate. A possible solution in these situations is to allow free
text entry. The entered text could be analyzed, and corresponding concepts
could subsequently be added to the subset as, for example, described by Warren et al.[44] It is not possible, however, to automatically map all the free-text entries to the
NEC term, as the free-text entry could have been chosen for numerous reasons.
Sometimes the free text would be used when there is actually a matching term in the
subset. NLP could perhaps be used to suggest suitable terms from the subset and thus
decrease the amount of unstructured data.
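As a rough sketch of this idea, simple string similarity can stand in for NLP. The code below assumes that the subset is available as a list of term strings. The function names and the cutoff value are illustrative choices, and a production system would need proper linguistic processing rather than character-level matching.

```python
from difflib import get_close_matches
from typing import Dict, List, Tuple


def suggest_from_subset(
    free_text: str, subset_terms: List[str], limit: int = 3, cutoff: float = 0.6
) -> List[str]:
    """Return subset terms that resemble the free-text entry, best match first."""
    by_lower = {term.lower(): term for term in subset_terms}
    hits = get_close_matches(
        free_text.strip().lower(), list(by_lower), n=limit, cutoff=cutoff
    )
    return [by_lower[hit] for hit in hits]


def triage(
    entries: List[str], subset_terms: List[str]
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Split free-text entries into those with a likely subset match and
    those that are candidates for adding a new concept to the subset."""
    suggestions: Dict[str, List[str]] = {}
    candidates: List[str] = []
    for entry in entries:
        hits = suggest_from_subset(entry, subset_terms)
        if hits:
            suggestions[entry] = hits
        else:
            candidates.append(entry)
    return suggestions, candidates
```

Entries in `suggestions` could be shown to the user at the time of entry, while `candidates` would go to manual review, since free text may have been chosen for reasons other than a missing term.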
RECOMMENDATIONS FOR PRACTICE: Use general concepts for unspecified terms. Enable
free text entries for situations where the subset might be incomplete, and data must
be entered.
RECOMMENDATIONS FOR RESEARCH: Find effective processes for analysing data entered
as free text to either bind it to an existing concept in the subset or add the requested
concept to the subset.
Incorrect Terms
SNOMED CT includes many terms, and some are inevitably outdated or incorrect. SNOMED
International provides a content request service for National Release Centres (NRC)
to report errors. Some NRCs also offer a message function in the browser[45] for all users to report issues, but these are not mentioned in any of the included papers
and are perhaps not well known.
Sometimes the incorrect terms are related to context and setting, for instance drug
abuse versus recreational use of drugs. To our knowledge there is no proposed solution
to this apart from not using the conflicting concepts.[25] Context-specific concepts can be added per extension but are then not interoperable
outside that extension. Discrepancies such as these are hard to cater for when trying to
use a single terminology on a global scale and are an example of where cultural influence
impedes semantic interoperability.
RECOMMENDATION FOR PRACTICE: Report incorrect terms to the National Release Centre (NRC)
or SNOMED International. Develop local extensions when needed.
RECOMMENDATION FOR RESEARCH: Design processes for managing reported incorrect terms
considering both promptness and quality.
Lack of Terms or Concepts
Lack of terms or concepts introduces delay in implementation processes, as required
terms are not immediately available for use. However, the fact that a concept or term
does not exist in SNOMED CT does not imply it should not exist, merely that no one
has yet made a request. Given access to a SNOMED CT module, new terms and concepts
can technically be added as long as they follow SNOMED CT's principles of being understandable,
reproducible, and useful, as well as other editorial principles.[46] No paper described a project with direct access to an authoring environment and
the possibility to edit or add SNOMED CT content, but that might perhaps be a way to speed
up development of SNOMED CT and production of complete subsets.
Since content is added on a request basis the coverage of a domain is correlated to
the use of SNOMED CT in that particular domain, and the problem with lack of concepts
will probably decrease with increased usage of SNOMED CT. There are clinical reference
groups (CRGs) administratively supported by SNOMED International working with segments
of the terminology relevant to their domain, but these need clinicians who can allocate
time without compensation.
For some lacking concepts it might be better to use another terminology, for example
LOINC, UniProt or human phenotype ontology (HPO). We have not analyzed the demarcation
between these terminologies and SNOMED CT in this review. Guidelines for when to use
which terminology, and maps to bridge between them, could provide valuable support. Such
work has been undertaken for LOINC and SNOMED CT, and for HPO and SNOMED CT.[47]
RECOMMENDATION FOR PRACTICE: Engage in relevant CRGs. Request new content as needed.
RECOMMENDATION FOR RESEARCH: Design supportive software for authoring of content close
to the implementation setting. Continue developing maps and demarcations between terminologies.
Using Precoordinated Concepts or Post-coordinating or Using the Information Model
The choice of post- or precoordination within SNOMED CT or using the information model
to express compound meanings is a challenge. For example, laterality of a body site
could be included in a precoordinated concept ({body site = left hand}), be post-coordinated
with SNOMED CT ({body site = hand: laterality = left}) or stored in two separate classes
of an information model ({body site = hand}, {laterality = left}), see [Fig. 5].
Fig. 5 Using precoordinated concepts or post-coordinating or using the information model.
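The three alternatives can be sketched as data structures, together with a function that reduces each of them to a common form for comparison. This is an illustration under stated assumptions: the class names, the `PRECOORDINATED_DEFINITIONS` table, and the `canonical` function are hypothetical, and plain labels stand in for SNOMED CT identifiers.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical definitions of precoordinated concepts, such as a terminology
# server might supply. Labels stand in for SNOMED CT identifiers.
PRECOORDINATED_DEFINITIONS: Dict[str, Tuple[str, Dict[str, str]]] = {
    "left hand": ("hand", {"laterality": "left"}),
}


@dataclass
class Precoordinated:
    concept: str  # {body site = left hand}


@dataclass
class PostCoordinated:
    focus: str  # {body site = hand: laterality = left}
    refinements: Dict[str, str] = field(default_factory=dict)


@dataclass
class InformationModelEntry:
    body_site: str  # {body site = hand}
    laterality: Optional[str] = None  # {laterality = left}


Canonical = Tuple[str, Tuple[Tuple[str, str], ...]]


def canonical(record) -> Canonical:
    """Reduce any of the three representations to (focus, sorted refinements)."""
    if isinstance(record, Precoordinated):
        focus, refinements = PRECOORDINATED_DEFINITIONS.get(
            record.concept, (record.concept, {})
        )
    elif isinstance(record, PostCoordinated):
        focus, refinements = record.focus, record.refinements
    elif isinstance(record, InformationModelEntry):
        focus = record.body_site
        refinements = {"laterality": record.laterality} if record.laterality else {}
    else:
        raise TypeError(f"Unsupported record type: {type(record).__name__}")
    return focus, tuple(sorted(refinements.items()))
```

All three representations of the left hand reduce to the same canonical value here. The sketch also shows where the difficulty lies: the information model entry can only be compared because its `laterality` class is explicitly mapped to the same attribute that the terminology uses.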
Legacy systems must sometimes catch all information in a single concept, and this
generates expectations to find even rather complex notions, such as breast cancer (event) + before (time-relation) + distal venous thrombosis (event), precoordinated in SNOMED CT. Precoordinating such concepts has negative
implications for information sharing and reusability and would also lead to combinatorial
explosions.[48] What to precoordinate, what to post-coordinate with SNOMED CT, and what to document
using an information model is, however, sometimes a difficult choice and has implications
for information sharing.
Information represented by using different classes from an information model is only
understandable via normal form if the involved classes also are terminology bound
to SNOMED CT, which they rarely are. On the other hand, if the classes in the communicating
systems are the same, i.e., they share an information model or parts thereof, the
information can be understood.
There is a difference regarding pre- and post-coordination within SNOMED CT depending
on whether the purpose is to refine concepts within the same hierarchy to different degrees
(as, for example, adding laterality), or whether the purpose is adding information that
also alters the context (for instance negations). This discrepancy is not discussed
in the papers, perhaps because it is not well known.
There is an abundance of precoordinated concepts in various domains within SNOMED
CT today, but there is also ongoing work on delimiting what types of concepts should
be precoordinated and what should best be managed with post-coordination or in an
information model. This work is partly described in the Precoordination Pattern Project
within SNOMED International, but this is not mentioned in any of the papers and perhaps
not well-known outside SNOMED International internal work areas.
Post-coordination is perceived to be difficult, and none of the papers used tooling
to support such work.
RECOMMENDATION FOR PRACTICE: Be consistent regarding what to store with terminology
and what to store with an information model. Use the same demarcation as those with
whom you will share information if possible.
RECOMMENDATION FOR RESEARCH: Develop tooling to facilitate post-coordination and comparison
of pre- and post-coordinated SNOMED CT content. Compare different demarcation lines
between terminology and information model in the search for an optimal compromise.
Choice of Information Model
There is an expressed wish to share the effort of terminology binding. Terminology
binding is relative to the information model used, and thus it would be helpful if
there was an agreement on what information model to use. There are, however, several
information models in health care, and it is unlikely that any one of these will be
chosen as the sole information model in the foreseeable future.
One solution could be to perform terminology binding based on how the information
is documented or displayed, i.e., at the model-of-use-level,[49] and leaving conversion to different information models, i.e., level of meaning,
to informaticians. This would be somewhat like detailed clinical models (DCMs)[50] or the clinical information model initiative (CIMI),[51] which are neutral to information models. The feasibility of these solutions has
not been described in the included papers.
RECOMMENDATION FOR PRACTICE: It is beyond the scope of this paper to recommend an
information model. Prioritise internal information structure.
RECOMMENDATION FOR RESEARCH: Evaluate feasibility of DCMs and similar solutions.
Intentional Subsets Do Not Meet Expectations
Intentional subsets can be developed using Expression Constraint Language (ECL).[52] ECL is a domain-specific language developed for SNOMED CT. The simplest form is
to include all children of a concept, but limitations can also be made with attributes
and text-strings. Some papers described the problem that all children under a concept
were not always relevant for the use case at hand. Concepts not logically defined
regarding all their characteristics, “primitive” in SNOMED CT, might not be included
in intentional subsets, and manual curation of intentionally developed subsets will
be necessary for some implementation work.
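To make the hierarchy operators concrete, the sketch below evaluates three ECL operators over a toy hierarchy: `<` (descendants), `<<` (descendants or self), and `<!` (children). It is not an ECL implementation. The hierarchy and the exclusion list are invented for illustration, labels are used in place of concept identifiers, and attribute refinements are not supported.

```python
from typing import Dict, Set

# Toy is-a hierarchy (child -> parents), invented for illustration only.
PARENTS: Dict[str, Set[str]] = {
    "pneumonia": {"lung disorder"},
    "bacterial pneumonia": {"pneumonia"},
    "viral pneumonia": {"pneumonia"},
    "pneumococcal pneumonia": {"bacterial pneumonia"},
}


def children(concept: str) -> Set[str]:
    """Direct children only, corresponding to the ECL operator '<!'."""
    return {child for child, parents in PARENTS.items() if concept in parents}


def descendants(concept: str) -> Set[str]:
    """All descendants, corresponding to the ECL operator '<'."""
    found: Set[str] = set()
    stack = [concept]
    while stack:
        for child in children(stack.pop()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def evaluate(expression: str) -> Set[str]:
    """Evaluate a single-operator expression such as '<< pneumonia'."""
    expression = expression.strip()
    # Two-character operators are checked before the one-character '<'.
    if expression.startswith("<<"):
        focus = expression[2:].strip()
        return descendants(focus) | {focus}
    if expression.startswith("<!"):
        return children(expression[2:].strip())
    if expression.startswith("<"):
        return descendants(expression[1:].strip())
    return {expression}


def curate(intensional: Set[str], excluded: Set[str]) -> Set[str]:
    """Manual curation step: drop members judged irrelevant for the use case."""
    return intensional - excluded
```

A subset defined as `evaluate("< pneumonia")` follows the hierarchy automatically, and `curate` represents the manual validation step in which irrelevant members are removed for a given use case.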
RECOMMENDATION FOR PRACTICE: Manually validate subsets developed with ECL.
RECOMMENDATION FOR RESEARCH: Participate in enhancing the SNOMED CT concept model
to improve ECL searches. Develop methods to minimise primitive content.
Proficiency with SNOMED CT
SNOMED CT is a complex terminology, more so than many of the alternatives. This complexity
makes it possible to cater for diverse and advanced needs but comes at the cost of
greater knowledge requirements for correct implementation and use. Terminology binding
to precoordinated content requires knowledge of the construction of SNOMED CT regarding
hierarchies and inheritance of attributes, among other things. Complete terminology
binding also often needs either post-coordination or modelling of new content, which
requires even more knowledge of SNOMED CT.
Today SNOMED International offers courses, and several software tools use the Machine-Readable
Concept Model (MRCM) to facilitate correct modelling. Some of the papers were written
before this support was readily available, but some of the example problems provided
in the papers nevertheless show symptoms of lack of knowledge rather than deficiencies
in SNOMED CT, among both authors and reviewers.
RECOMMENDATION FOR PRACTICE: Participate in education and user fora. Develop targeted
education toward different types of users.
RECOMMENDATION FOR RESEARCH: Design supportive tooling geared to domain experts without
in-depth knowledge about informatics or SNOMED CT.
Tools
Tools can be used for different parts of the terminology binding process: for gathering
input data, for the actual mapping including iterative cooperative work, and for storing
or sharing the result. Software for managing developed maps or subsets is not covered
in this literature review.
Different tools can support different parts of this process. The spreadsheet program
MS Excel is the most commonly used software to manage and store subsets in the included
projects. A possible explanation for this is widespread access. As several papers
stated, Excel is not, however, a suitable tool for the actual mapping.
One explanation for the low usage of mapping tools in the described projects could be the
business model for the commercially available tools, especially for situations relating
to evaluation of applicability or smaller terminology binding projects. Perhaps open-source
tools, pay-by-use license or a shared license hosted by NRCs could facilitate usage
of supportive tools.
Another explanation could be that local code systems still prevail. Demand for sharable
data is now increasing and might put pressure on transferring from free text or local
codes to using SNOMED CT or other terminologies, and thus encourage the development
of facilitating software.
It is noteworthy that many of the described terminology binding tools are developed
within a research setting, and as far as we know have not become widely used.
No paper described software support for publishing or sharing subsets or maps, something
that is becoming increasingly common, for example, among FHIR and OpenEHR communities.
Informaticians are scarce and use of supportive tools could be a way of facilitating
shared work with subject matter experts, improving quality and reducing administrative
work. It would be interesting to read about tooling in gray information,[53] but to our knowledge there is no established journal or other media for that type
of content.
RECOMMENDATION FOR PRACTICE: Use dedicated tooling where such exists.
RECOMMENDATION FOR RESEARCH: Design methods and tools suited for supporting terminology
binding.
Conclusion
In this state-of-the-art literature review we have described problems reported in
the process of terminology binding to SNOMED CT and analyzed these against solutions
suggested in the included papers, other published knowledge, and our own experiences.
We have formulated recommendations for practitioners as well as future research for
each problem described. These recommendations for terminology binding processes could
facilitate semantic interoperability within health care and thus alleviate the problems
described under Introduction.
Our focus has been on work geared to SNOMED CT, but some insights might be relevant
for those working toward other terminologies.
Many of the stated problems can be solved by better cooperation between domain experts
and informaticians and better knowledge of SNOMED CT. Settings where these competences
either work together or where staff with double knowledge act as brokers are well
equipped for terminology binding. Tooling is not thoroughly researched and might be
a possible way to facilitate terminology binding and terminology curation.
Bias/Limitations
This review is affected by the same publication bias as all academic work. Successful
and/or well-funded groups publish their work, sometimes iteratively, whereas less
successful attempts, which would be very interesting, might not be published and are
thus not included. We have tried to minimise this by including all journals and conferences
indexed in three different databases.
The knowledge developed within the user community
(as opposed to the academic community) is also not fully captured in this review,
and we can only claim to describe the academic knowledge of the area. Particularly,
the practical experiences of most implementers will not be covered by the academic
literature. This study could be further enhanced by a study of gray information[53] published outside the research community.