Clinical Terminology Mapping Service Based on Information Retrieval

Sungwon Jung; Seung-Jong Yu; Byoung-Kee Yi

doi:10.1055/a-2797-4219

CC BY-NC-ND 4.0 · Methods Inf Med

Original Article

Authors

Sungwon Jung

¹InfoClinic Co., Seoul, Korea

²Kangwon Institute of Telecommunications & Information, Kangwon National University, Chuncheon, Korea
Seung-Jong Yu

¹InfoClinic Co., Seoul, Korea
Byoung-Kee Yi

³Department of Artificial Intelligence Convergence, Kangwon National University, Chuncheon, Korea

Funding Information

This research was supported by the Regional Innovation System & Education (RISE) program through the Gangwon RISE Center, funded by the Ministry of Education (MOE) and the Gangwon State (G.S.), Republic of Korea (2025-RISE-10-005). This study was also supported by a 2023 Research Grant from Kangwon National University (202305120001).

Abstract

Background

Standardized clinical terminology is essential for semantic interoperability. Typically, a hospital's terminology expert manually maps local terminology to international standards such as SNOMED CT. This manual mapping process is demanding, labor-intensive, and time-consuming, and its effectiveness depends on the expertise of the professional performing it.

Objective

We developed a method to map clinical terms to SNOMED CT concept descriptions using an information retrieval (IR) approach with rich synonyms. We also provide a free mapping support service to help terminology experts alleviate the challenges of manual mapping without the need for additional manipulation.

Methods

We created indexes using edge n-grams and synonyms. We adopted Elasticsearch for indexing and query processing, incorporating data from the SPECIALIST Lexicon to enrich the synonym database. Eight different indexes were initially created, but only four were retained based on performance. We tested indexes individually and in combination, using a dataset of 1,753 one-to-one mapped instances from the National Library of Medicine ICD-9-CM Procedure codes to the SNOMED CT Map. We compared our approach with MetaMap for evaluation.

Results

We found that using rich synonyms and edge n-gram indexing substantially improved the accuracy of mapping clinical terms to SNOMED CT. Combining a synonym index with an edge n-gram index performed better than either index alone, capturing more relevant terms and synonyms and yielding more precise mappings. Our method outperformed the baseline provided by MetaMap, demonstrating enhanced capability in handling complex medical terminology and improving overall mapping quality.

Conclusion

Our study introduced an IR method with rich synonyms for mapping clinical terms to SNOMED CT, analyzed 40 unmapped terms, and identified key issues. The approach shows promise for improving terminology mapping; future work will explore advanced methods to further enhance accuracy, with the aims of reducing manual mapping effort and improving result evaluation.

Keywords

semantic interoperability - term mapping - information retrieval - rich synonym - query expansion

Introduction

Standardized clinical terminology is essential for semantic interoperability.[1] [2] Even when clinical statements are expressed in English, the terms used may vary across regions and countries.[3] Hospitals with internally developed terminologies must align local terms with standard terminology to facilitate health information exchange and support clinical research. In a typical mapping approach, a hospital's terminology expert manually maps local terminology to international standards such as SNOMED CT.[4] [5] Manual mapping is demanding, labor-intensive, and time-consuming, and its effectiveness depends on the expertise of the professional performing it.[3] Hybrid approaches combining manual and semi-automated mapping have been implemented to address these limitations.[6] [7] [8] [9] [10] [11] [12] [13] [14] [15] However, challenges such as validation, efficiency, and maintenance persist.

Some mapping tools are designed solely for mapping, while others serve multiple functions. Stanford CoreNLP is a general-purpose natural language processing toolkit, whereas Apache cTAKES and MetaMap of the National Library of Medicine (NLM) are designed for clinical and biomedical text.[16] [17] [18] Their functions, such as named entity recognition, are commonly used in mapping processes alongside other methods. In SNOMED CT mapping, many studies employ a combination of methods, including manual mapping.[19] [20] Classical approaches use explicit Unified Medical Language System (UMLS) relationships,[21] [22] string-based techniques, and lexical mapping with resources and tools such as the UMLS Metathesaurus and MetaMap.[23] [24] Additional techniques include query expansion with synonyms[21] [25] [26] and lemmatization using WordNet,[22] [26] although WordNet is not specialized for clinical terms. Other frequently used techniques leverage the SNOMED CT hierarchy through its relationships, exploit lexical similarities, and employ post-coordination.[21] [22] [25] [27] However, implementing a combination of these diverse methods is costly and requires deep knowledge and extensive training. Given this methodological diversity and the wide variation in tools, targets, and evaluation procedures, the landscape of terminology mapping research is highly heterogeneous. Direct quantitative comparison with previously proposed terminology-mapping approaches is therefore not feasible, because these studies used heterogeneous datasets (e.g., ICPC-2 PLUS,[22] ICD-9-CM,[21] [23] [26] ICD-10-CM,[24] Spanish pathology procedures,[25] VBA disability codes[23]), different mapping objectives, incompatible evaluation metrics, and differing use of post-coordination. For reference, earlier studies generally report precision of 80 to 96% for specific subsets and recall or coverage of 40 to 80%.

We developed a method to map clinical terms to SNOMED CT concept descriptions using an information retrieval (IR) approach enhanced with rich synonyms. In this study, a “rich synonym” is any of a diverse set of words or phrases with meanings similar to a given query term. For instance, when a user queries “malignant tumor resection,” the word “resection” is broadened to include synonyms such as “ectomy,” “excise,” and “extraction,” and the phrase “malignant tumor” is expanded to include terms such as “cancer,” “carcinomatous,” and “malignant neoplastic disease.” Such query expansion is critical for retrieving relevant candidates, because it adds synonyms and related terms that may not appear in the standard terminology and thus increases the likelihood of identifying the appropriate SNOMED CT concept. IR is the task of finding material, usually unstructured text documents, that satisfies an information need from within large collections.[28] An IR system searches for documents related to a query using indexes and sorts the results by ranking score. In this study, each clinical term, whether a single word or a phrase, is treated as a document.
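The phrase-level expansion in the “malignant tumor resection” example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the table `SYNONYMS` holds only the example entries from the paragraph above, and the helpers `expand_query` and `to_boolean_query` are hypothetical names. Longer keys are matched first so that a phrase such as “malignant tumor” is expanded as a unit before any of its single words.

```python
# Minimal sketch of rich-synonym query expansion (hypothetical helpers).
# The synonym table holds only the examples given in the text.
SYNONYMS: dict[str, list[str]] = {
    "resection": ["ectomy", "excise", "extraction"],
    "malignant tumor": ["cancer", "carcinomatous", "malignant neoplastic disease"],
}


def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[list[str]]:
    """Split a query into units and attach the synonyms of each unit.

    Multi-word keys are tried before shorter ones, so a phrase is expanded
    as a whole; words without synonyms become single-element groups.
    """
    # Pad with spaces so that keys only match on whole-word boundaries.
    remaining = f" {' '.join(query.lower().split())} "
    groups: list[list[str]] = []
    for key in sorted(synonyms, key=len, reverse=True):
        padded = f" {key} "
        if padded in remaining:
            groups.append([key, *synonyms[key]])
            remaining = remaining.replace(padded, " ")
    # Whatever is left had no synonym entry.
    groups.extend([word] for word in remaining.split())
    return groups


def to_boolean_query(groups: list[list[str]]) -> str:
    """Render groups as (a OR b) AND (c OR d), quoting multi-word forms."""
    def quote(form: str) -> str:
        return f'"{form}"' if " " in form else form

    return " AND ".join(
        "(" + " OR ".join(quote(form) for form in group) + ")" for group in groups
    )


if __name__ == "__main__":
    groups = expand_query("Malignant tumor resection", SYNONYMS)
    print(to_boolean_query(groups))
    # ("malignant tumor" OR cancer OR carcinomatous OR "malignant neoplastic disease") AND (resection OR ectomy OR excise OR extraction)
```

The Boolean rendering is only for readability; a ranked IR engine such as the one described here scores partial matches rather than requiring every group to match.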

Objective

We publicly provide a mapping support service that applies our methods to help terminology experts.[29] This service can alleviate the challenges of manual mapping without the need for additional manipulation, such as lexical and hierarchical structure analyses. It also simplifies labor-intensive manual mapping tasks and serves as a tool for evaluating existing results.

Methods

Edge n-grams and Synonyms

We used edge n-grams[30] and synonyms to create indexes. Edge n-grams are the leading character sequences (prefixes) of a token; indexing them enables exact matching on shared prefixes, which acts as a lightweight approximation of stemming and lemmatization. For instance, with the minimum and maximum edge n-gram lengths set to two and six, “rectum” is indexed as “re,” “rec,” “rect,” “rectu,” and “rectum.” When querying “rectal,” the results include both “rectal” and “rectum” through the common prefix “rect.” In this way, noun and adjective forms, including singular and plural nouns, are effectively treated as synonymous.
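The prefix expansion in the “rectum” example can be reproduced with a short Python function. This sketch mirrors the behavior described above (minimum length 2, maximum length 6); it is not Elasticsearch's own implementation, and `edge_ngrams` is a hypothetical helper name.

```python
def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 6) -> list[str]:
    """Return the prefixes of `token` whose lengths run from min_gram to max_gram.

    Tokens shorter than min_gram yield no grams.
    """
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


if __name__ == "__main__":
    indexed = edge_ngrams("rectum")  # grams stored in the index
    queried = edge_ngrams("rectal")  # grams produced from the query
    print(indexed)                   # ['re', 'rec', 'rect', 'rectu', 'rectum']
    print(sorted(set(indexed) & set(queried)))  # ['re', 'rec', 'rect']
```

The shared grams also include very short prefixes such as “re,” which many unrelated words have in common; this is one reason the minimum and maximum lengths need tuning.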

Indexing with rich synonyms increases the number of alternative words and phrases associated with each concept description in a terminology. At query time, the query is also expanded with rich synonyms, which compensates for the short synonym lists in terminologies: apart from the concept descriptions themselves, terminologies typically contain few synonyms, and this scarcity can degrade the search performance of IR systems. [Fig. 1] illustrates the effect of index and query expansion when mapping “resection of malignant tumor” to “excision of malignant neoplasm” in SNOMED CT. Because “excision” is a synonym of “resection” and “malignant neoplasm” is a synonym of “malignant tumor,” the Fully Specified Name “excision of malignant neoplasm (procedure)” is mapped. Although “excision” and “resection” are not strict clinical synonyms under formal coding standards such as ICD-10-PCS, they are frequently treated as lexically related expressions in general clinical language. In this example, the lexical similarity enables retrieval of the SNOMED CT concept “excision of malignant neoplasm (procedure),” demonstrating how rich-synonym expansion improves retrieval coverage at the lexical–semantic level.

Fig. 1 Example of index and query expansion for mapping from “resection of malignant tumor” to “excision of malignant neoplasm” in SNOMED CT.
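The effect shown in Fig. 1 can be imitated with a toy matcher that expands both the indexed descriptions and the query. This is a sketch under simplifying assumptions: `SYNONYM_GROUPS` and the candidate strings are illustrative (the two distractors are made-up strings, not verified SNOMED CT descriptions), and a plain term-overlap count stands in for the ranking function of a real IR engine such as Elasticsearch.

```python
# Toy illustration of index-side and query-side synonym expansion.
# SYNONYM_GROUPS and CANDIDATES are hypothetical; scoring is a plain
# overlap count, standing in for a real engine's ranking function.
SYNONYM_GROUPS: list[set[str]] = [
    {"resection", "excision"},
    {"malignant tumor", "malignant neoplasm"},
]

CANDIDATES = [
    "biopsy of malignant neoplasm",
    "excision of skin lesion",
    "excision of malignant neoplasm",
]


def plain_terms(text: str) -> set[str]:
    """Words of `text` with no synonym expansion (baseline)."""
    return set(text.lower().split())


def expand(text: str) -> set[str]:
    """Words of `text` plus every member of any synonym group it contains."""
    padded = f" {' '.join(text.lower().split())} "
    terms = set(padded.split())
    for group in SYNONYM_GROUPS:
        if any(f" {form} " in padded for form in group):
            terms |= group
    return terms


def rank(query: str, candidates: list[str], use_synonyms: bool = True) -> list[tuple[str, int]]:
    """Score each candidate by the number of terms it shares with the query."""
    to_terms = expand if use_synonyms else plain_terms
    query_terms = to_terms(query)
    scored = [(c, len(query_terms & to_terms(c))) for c in candidates]
    # sorted() is stable, so ties keep their original order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    query = "resection of malignant tumor"
    print(rank(query, CANDIDATES, use_synonyms=False))
    print(rank(query, CANDIDATES))
```

Without expansion, “excision of malignant neoplasm” ties with the biopsy distractor at two shared words; with expansion applied to both sides, it ranks first.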

Elasticsearch

We adopted Elasticsearch[31] for index creation and query processing. Elasticsearch, which is built on Apache Lucene, is an open IR tool known for its reliability, as evidenced by its successful use in large systems such as DataMed.[32] It provides various text-processing filters, including synonym and edge n-gram filters, and developers can create custom analyzers by combining them.
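As an illustration of combining filters into a custom analyzer, the sketch below builds, in Python, a create-index request body for the four index variants retained in this study. It is not the authors' configuration and has not been validated against a particular Elasticsearch version; the analyzer and filter names, the field name `description`, and the synonym file path are assumptions, and the two SM.DB-derived synonym files are collapsed into one hypothetical file for brevity. The filter types used (`lowercase`, `stop`, `synonym`, `edge_ngram`) are standard Elasticsearch token filters.

```python
import json

# Hypothetical path, resolved relative to the Elasticsearch config directory.
SYNONYM_PATH = "analysis/synonyms.txt"


def build_index_body(use_synonyms: bool, use_edge_ngram: bool,
                     remove_stopwords: bool, max_gram: int = 5) -> dict:
    """Build a create-index request body with one custom analyzer.

    The analyzer lowercases tokens, optionally drops English stopwords, and
    then applies a synonym filter and/or an edge n-gram filter.
    """
    filters: dict[str, dict] = {}
    chain = ["lowercase"]
    if remove_stopwords:
        filters["english_stop"] = {"type": "stop", "stopwords": "_english_"}
        chain.append("english_stop")
    if use_synonyms:
        filters["term_synonyms"] = {"type": "synonym", "synonyms_path": SYNONYM_PATH}
        chain.append("term_synonyms")
    if use_edge_ngram:
        filters["term_edge_ngram"] = {"type": "edge_ngram", "min_gram": 2, "max_gram": max_gram}
        chain.append("term_edge_ngram")
    return {
        "settings": {
            "analysis": {
                "filter": filters,
                "analyzer": {
                    "term_analyzer": {"type": "custom", "tokenizer": "standard", "filter": chain}
                },
            }
        },
        # Without a separate search_analyzer, the same analyzer is applied
        # at index time and at query time.
        "mappings": {"properties": {"description": {"type": "text", "analyzer": "term_analyzer"}}},
    }


# Indexes 1-4 as described in the text: (synonyms, edge n-gram, remove stopwords).
INDEX_VARIANTS = {
    "index1": (True, False, True),
    "index2": (True, False, False),
    "index3": (False, True, True),
    "index4": (False, True, False),
}

if __name__ == "__main__":
    body = build_index_body(*INDEX_VARIANTS["index3"], max_gram=5)
    print(json.dumps(body, indent=2))
```

Each body could be sent with any HTTP client as the payload of a `PUT /<index-name>` request. Applying the edge n-gram filter at query time as well is what lets a query for “rectal” share grams with an indexed “rectum.”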

Create Synonym Files

We used SM.DB, a synonym dataset from the SPECIALIST Lexicon of the UMLS, as shown in [Fig. 2],[33] to create two synonym files: one for words and one for phrases. We applied them to the Elasticsearch synonym filter, allowing easy integration of word and phrase synonyms into the index. The word synonym file contains sets of synonyms such as “1st_cervical_vertebra,” “atlas,” and “first_cervical_vertebra.” The phrase synonym file contains rules such as “1st cervical vertebra => 1st_cervical_vertebra” on one line and “first cervical vertebra => first_cervical_vertebra” on another.

Fig. 2 Example of the SM.DB format, which contains comprehensive synonyms.
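The two file formats described above can be produced from synonym groups with a small Python script. SM.DB's own record layout (Fig. 2) is not parsed here; the sketch assumes the groups have already been extracted, and the function and file names are hypothetical. The word file uses the comma-separated equivalence form of the Solr synonym syntax accepted by Elasticsearch's synonym filter, and the phrase file uses its `=>` mapping form.

```python
import tempfile
from pathlib import Path

# Hypothetical input: synonym groups assumed to be already extracted from SM.DB.
SYNONYM_GROUPS = [
    ["1st cervical vertebra", "atlas", "first cervical vertebra"],
]


def word_synonym_line(group: list[str]) -> str:
    """One comma-separated equivalence line; spaces inside phrases become underscores."""
    return ", ".join(form.replace(" ", "_") for form in group)


def phrase_synonym_lines(group: list[str]) -> list[str]:
    """Map each multi-word form to its underscore-joined single token."""
    return [f"{form} => {form.replace(' ', '_')}" for form in group if " " in form]


def write_synonym_files(groups: list[list[str]], word_path: Path, phrase_path: Path) -> None:
    """Write the word file and the phrase file, one rule per line."""
    word_lines = [word_synonym_line(g) for g in groups]
    # Deduplicate phrase rules, since one phrase may occur in several groups.
    phrase_lines = sorted({line for g in groups for line in phrase_synonym_lines(g)})
    word_path.write_text("\n".join(word_lines) + "\n", encoding="utf-8")
    phrase_path.write_text("\n".join(phrase_lines) + "\n", encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        word_file = Path(tmp) / "word_synonyms.txt"      # hypothetical file names
        phrase_file = Path(tmp) / "phrase_synonyms.txt"
        write_synonym_files(SYNONYM_GROUPS, word_file, phrase_file)
        print(word_file.read_text(encoding="utf-8"), end="")
        print(phrase_file.read_text(encoding="utf-8"), end="")
```

Joining phrase words with underscores lets a multi-word synonym travel through the analyzer as a single token, which is the convention used in the example lines above.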

Developing the Edge n-gram Analyzer and Designing Indexes

We developed an edge n-gram analyzer and prepared eight indexes using edge n-gram and synonym analyzers: synonym without stopwords index (index 1), synonym index (index 2), edge n-gram without stopwords index (index 3), edge n-gram index (index 4), n-gram without stopwords index (index 5), n-gram index (index 6), synonym and edge n-gram without stopwords index (index 7), and synonym and edge n-gram index (index 8). Indexes 5 to 8 were dropped because they performed worse than indexes 1 to 4. In our exploratory experiments, the full n-gram tokenization used in indexes 5 and 6 fragmented clinical expressions such as “malignant tumor” into very small units (e.g., “ma,” “ali,” “ign,” “tum,” “mor”), disrupting the lexical structures needed for meaningful semantic matching. This excessive fragmentation broadened the retrieval space and increased irrelevant matches; thus, while recall could increase, precision decreased because the semantic matching signal was weakened. In indexes 7 and 8, which combine synonym expansion with edge n-gram processing within a single index, the interaction between the two components produced a sharp, multiplicative rise in non-informative tokens: synonym expansion introduces a large set of alternative lexical forms, and the subsequent edge n-gram decomposition splits each of these forms into partial fragments. This combination substantially increased noise across the index and diluted the effective matching signal, reducing retrieval precision. Because of these structural limitations, indexes 5 to 8 consistently performed worse than the synonym-based or edge n-gram–based indexes (1–4) and were excluded from further evaluation.
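The fragmentation argument can be made concrete by comparing the two tokenizations on a single pair of words. In this sketch (hypothetical helpers; gram lengths 2 to 3 chosen only for illustration), full n-grams of “malignant” share fragments with “benign,” a word of opposite meaning, whereas edge n-grams of the two words share nothing.

```python
def ngrams(token: str, min_gram: int = 2, max_gram: int = 3) -> set[str]:
    """All substrings of length min_gram..max_gram (full n-gram tokenization)."""
    return {token[i:i + n]
            for n in range(min_gram, max_gram + 1)
            for i in range(len(token) - n + 1)}


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 3) -> set[str]:
    """Only the prefixes of length min_gram..max_gram (edge n-gram tokenization)."""
    return {token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)}


if __name__ == "__main__":
    for name, fn in (("n-gram", ngrams), ("edge n-gram", edge_ngrams)):
        grams = {w: fn(w) for w in ("malignant", "benign")}
        shared = sorted(grams["malignant"] & grams["benign"])
        print(f"{name}: {len(grams['malignant'])} grams for 'malignant', "
              f"shared with 'benign': {shared}")
    # n-gram: 15 grams for 'malignant', shared with 'benign': ['gn', 'ig', 'ign']
    # edge n-gram: 2 grams for 'malignant', shared with 'benign': []
```

Shared fragments such as “ign” give an unrelated term a nonzero match score, which is the kind of irrelevant match described above.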

To determine the optimal maximum edge n-gram length (max n-gram), we varied it from 2 to 15 while fixing the minimum length (min n-gram) at two, as depicted in [Fig. 3].

Fig. 3 The concept architecture of clinical terminology mapping method based on information retrieval.
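The max n-gram sweep (min n-gram fixed at 2, max n-gram from 2 to 15) can be mirrored in a short sketch that counts the edge n-grams emitted for one term at each setting. It shows only how the indexed vocabulary changes, not the retrieval results, and the helper names are hypothetical. Once max n-gram reaches the length of the longest word in a term, no further grams are produced, which is consistent with (though not by itself an explanation of) the convergence reported in the Results.

```python
def edge_ngram_count(word: str, min_gram: int, max_gram: int) -> int:
    """Number of edge n-grams emitted for one word."""
    return max(0, min(max_gram, len(word)) - min_gram + 1)


def sweep(term: str, min_gram: int = 2, max_values: range = range(2, 16)) -> dict[int, int]:
    """Total edge n-grams for a term at each max_gram setting, with min_gram fixed."""
    words = term.lower().split()
    return {m: sum(edge_ngram_count(w, min_gram, m) for w in words) for m in max_values}


if __name__ == "__main__":
    for max_gram, count in sweep("malignant tumor").items():
        print(f"max_gram={max_gram:2d}: {count} grams")
```

For “malignant tumor,” the count grows from 2 grams at max 2-gram to 12 at max 9-gram (the length of “malignant”) and stays at 12 thereafter.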

Test and Evaluation

We tested mapping performance using the individual indexes 1, 2, 3, and 4. We also explored combinations of these indexes, such as index 1 + 2, index 1 + 2 + 3, and index 1 + 2 + 4, to identify the optimal set. We used MetaMap Batch[34] for comparison. Because MetaMap is not designed for terminology mapping, a direct comparison may not be feasible; nonetheless, given its use in prior research alongside other methods, we conducted an indirect comparison with our method.
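The index combinations can be enumerated systematically, and the hits from several indexes must then be merged into one ranked list. The text does not state how results from combined indexes were merged, so the sketch below assumes max-score fusion (each concept keeps its best score across indexes); the index names and helper functions are hypothetical. Elasticsearch can also search several indexes in a single request by listing them comma-separated in the request path (for example, `index1,index4/_search`).

```python
from itertools import combinations

INDEXES = ("index1", "index2", "index3", "index4")


def index_combinations(indexes: tuple[str, ...] = INDEXES) -> list[tuple[str, ...]]:
    """Every non-empty subset of the indexes, smallest first."""
    return [combo for size in range(1, len(indexes) + 1)
            for combo in combinations(indexes, size)]


def merge_results(results: dict[str, list[tuple[str, float]]], k: int = 10) -> list[str]:
    """Fuse per-index hits by keeping each concept's best score, then take the top k.

    Max-score fusion is an assumption; the text does not say how results
    from combined indexes were merged.
    """
    best: dict[str, float] = {}
    for hits in results.values():
        for concept_id, score in hits:
            best[concept_id] = max(score, best.get(concept_id, float("-inf")))
    return sorted(best, key=best.get, reverse=True)[:k]


if __name__ == "__main__":
    print(len(index_combinations()), "combinations")
    hits = {"index1": [("A", 5.0), ("B", 3.0)], "index4": [("B", 4.5), ("C", 1.0)]}
    print(merge_results(hits))
```

Scores from indexes built with different analyzers are not strictly comparable, so score normalization or a rank-based method such as reciprocal rank fusion may be a safer choice in practice.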

We configured the MetaMap options shown in [Fig. 4], since adjusting many other options led to worse results. The option “-V USAbase” specifies the use of the UMLS Metathesaurus subset derived from U.S.-based vocabularies, while “-L 17” and “-Z 1718” indicate the UMLS version and dataset subset employed during processing. The “-E” flag enables Word Sense Disambiguation, allowing MetaMap to select the most contextually appropriate meaning when multiple interpretations exist. For output configuration, the “-A” flag was used with sub-options to hide plain syntax (-p), show all candidates (-c), number the candidates (-n), and number the mappings (-f), providing structured, easily reviewable output. Additionally, to align MetaMap's retrieval behavior with our SNOMED CT–based evaluation, we restricted the vocabulary sources with “-R SNOMEDCT_US,” ensuring that all retrieved candidate concepts originated exclusively from SNOMED CT.

To evaluate our method's performance, we used a dataset of 1,753 one-to-one mapped instances from the NLM's ICD-9-CM Procedure codes to SNOMED CT Map, which is designed to support the migration of legacy ICD-9-CM procedure codes to SNOMED CT.[35]

Ethical Considerations

This study presents a method to minimize manual mapping efforts and offers a publicly accessible mapping support service. The data utilized in this study are publicly available and do not require separate informed consent.

Results

The mapping results list terms with exact and similar matches, ordered by ranking score. Because evaluating the semantic relevance of similar matching terms can be challenging, we used precision at 10 (P@10) as the evaluation metric, counting a source term as mapped when the exact or a similar target term appears within the top 10 ranked results. However, P@10 cannot be applied to MetaMap. Although MetaMap returns multiple candidates with associated scores, its output is not ranked in a manner compatible with P@10: in almost all cases, the top candidate is assigned a uniform score of 1,000, and most remaining candidates are partial fragments of the original term rather than distinct alternatives. Because of this structural behavior, MetaMap cannot be meaningfully evaluated with ranked retrieval metrics, so for MetaMap we evaluated only whether a correct matching term was present. This asymmetric evaluation reflects inherent differences between the two: MetaMap is not designed as a ranked retrieval engine, whereas our IR-based approach generates ranking scores that can be evaluated with P@10. Accordingly, the comparison with MetaMap should be interpreted as a baseline reference rather than a direct performance comparison.
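As used here, P@10 is counted per source term: a term scores a hit when an acceptable target appears among its top 10 results, and the reported figure is the fraction of terms with a hit. This differs from the classic IR definition of precision at k (relevant results among the top k, divided by k). A minimal sketch of the per-term computation follows; the function name is hypothetical, and `gold` may hold more than one acceptable concept per term to reflect the “exact or similar” criterion.

```python
def hit_rate_at_k(ranked: dict[str, list[str]], gold: dict[str, set[str]], k: int = 10) -> float:
    """Fraction of source terms whose top-k results contain an acceptable target."""
    if not gold:
        raise ValueError("gold must not be empty")
    hits = sum(1 for term, targets in gold.items()
               if targets & set(ranked.get(term, [])[:k]))
    return hits / len(gold)


if __name__ == "__main__":
    ranked = {"t1": ["c9", "c1"], "t2": ["c5"], "t3": []}  # hypothetical ranked output
    gold = {"t1": {"c1"}, "t2": {"c2"}, "t3": {"c3"}}       # hypothetical acceptable targets
    print(hit_rate_at_k(ranked, gold))
```

A term with no results, or whose target first appears at rank 11 or lower, counts as a miss.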

We conducted tests using both single and combined indexes, with max n-gram values from 2 to 15 for the edge n-gram. The best result, 1,686 of 1,753 (96.2%), was achieved with index 1 + 4 at max 5-gram. The differences between indexes such as index 1 + 3 and index 1 + 4 are minimal, especially around max 5-gram. MetaMap Batch mapped 1,136 of 1,753 terms (64.8%), which is lower than expected; as noted in the Methods section, MetaMap is not designed for terminology mapping.

[Figs. 5] and [6] show the results of the single-index and combined-index mapping tests, respectively. The combined indexes were selected based on the single-index results. In [Fig. 5], the results of the synonym indexes (indexes 1 and 2) remain constant regardless of max n-gram, whereas the performance of the edge n-gram indexes (indexes 3 and 4) varies with max n-gram. Improvements become minimal from max 4-gram onward, and performance converges around max 12-gram. The indexes without stopwords (indexes 1 and 3) both perform better than their counterparts with stopwords. Even the best result from an edge n-gram index without synonyms did not reach that of the synonym index without stopwords, indicating that the synonym indexes outperform the edge n-gram indexes.

Fig. 5 Single indexes with max n-gram from 2 to 15. (Index 1: Synonym without stopwords; Index 2: Synonym; Index 3: Edge n-gram without stopwords; and Index 4: Edge n-gram)

Fig. 6 Results of combined indexes with max n-gram from 2 to 15. (Index 1 + 3: Synonym without stopwords + edge n-gram without stopwords; Index 1 + 4: Synonym without stopwords + edge n-gram; and Index 1: Synonym without stopwords)

[Fig. 6] shows the results of combining index 1, which performed best in the single-index test in [Fig. 5], with index 3 and index 4. Index 1 + 4 shows the best result, although its advantage over index 1 + 3 is minimal. With regard to edge n-gram length, performance is worst at max 2-gram, increases gradually to a peak at max 5-gram, and then decreases slightly and converges. These results show that rich synonyms play a significant role in mapping performance.

Discussion

We have developed a method that uses an IR approach with rich synonyms to map clinical terms to SNOMED CT concept descriptions. This method alleviates the challenges of manual mapping without requiring additional manipulations, such as adjusting boost weights or conducting complex lexical and hierarchical structure analyses. Although synonym expansion improves the retrieval of lexically related terms, it does not guarantee equivalence under strict clinical coding standards. For example, terms such as “excision” and “resection” may appear lexically similar, but coding systems like ICD-10-PCS differentiate them based on the extent of tissue removal. Therefore, our method should be interpreted as supporting lexical–semantic mapping rather than enforcing coding-rule distinctions, and cases requiring strict definitional separation may need additional mechanisms such as hierarchical reasoning or post-coordination in SNOMED CT.

We offer a free mapping support web service that applies our methods to assist terminology experts,[29] as shown in [Fig. 7A]. We also provide SNOMED CT and LOINC browsers, which are convenient for mapping local terms, as shown in [Fig. 7B]. Term searches are available not only through the browsers but also via a RESTful API. The SNOMED CT browser also supports Korean searches for specific terms.

Fig. 7 Screenshots of (A) mapping support service and (B) SNOMED CT browser that can be used for free.

We identified synonyms as a significant factor in clinical term mapping. We also observed that setting the max n-gram value to six or more is unnecessary when an adequate number of synonyms is available. However, additional synonyms may be required for local terms. In future research, we will consider lexical analysis and advanced techniques such as Word2Vec[36] and generative AI for expanding synonyms.

Analysis of Non-mapped Terms

Based on the results, we analyzed 40 non-mapped terms to identify challenges regardless of index type and ranking score. The findings fall into six categories. The first category concerns the need to review the results of each mapping tool. [Table 1] lists 16 recommendations that do not align with the NLM recommendations in the dataset. Mapping results from our research may be more suitable than the NLM recommendations, which calls for further review. For example, NLM recommends “Administration of immune serum” as the mapping term for “Injection or infusion of immunoglobulin” in ICD-9-CM, whereas both MetaMap and our method recommend “Passive Immunization.” In another instance, NLM recommends “280464000 | Revision of shoulder arthroplasty (procedure) |” as the mapping term for “Reverse total shoulder replacement” in ICD-9-CM, whereas both MetaMap and our method recommend “42262007 | Total shoulder replacement (procedure) |.” Note that “reverse” is not a synonym for “revision.” Although SNOMED CT contains terms with “reverse,” such as “733592000 | Reverse total right shoulder replacement |” and “733591007 | Reverse total left shoulder replacement (procedure) |,” no common parent concept such as “reverse total shoulder replacement” exists for them.

The NLM's mapping approach identifies synonymy relations between ICD-9-CM terms and SNOMED CT concepts using the UMLS. One-to-one synonym relations that are automatically generated between an ICD-9-CM rubric and a SNOMED CT concept are accepted as valid mappings and are not subjected to manual review. One-to-many mappings, however, undergo manual validation and are converted to one-to-one mappings whenever possible. During this manual review process, the selected SNOMED CT concept may represent a broader or narrower meaning than the corresponding ICD-9-CM term.[35]

We confirmed that the NLM treated “Reverse total shoulder replacement” → “280464000 | Revision of shoulder arthroplasty (procedure) |,” along with 15 additional terms, as 1:1 mappings and therefore did not apply manual review. The differences between NLM's results and those produced by MetaMap or our method are likely attributable to technical limitations inherent in the automated mapping process or to characteristics of the ICD-9-CM coding system. First, because NLM relies on UMLS synonym identification during the initial phase, key lexical elements may have been assessed as highly similar despite not being true clinical synonyms. Second, ICD-9-CM is less granular than SNOMED CT; thus, during automated processing, the system may have selected the closest available SNOMED CT concept, even if imperfect. Third, the automated mapping algorithm may have misinterpreted linguistic or syntactic cues, resulting in an incorrect semantic association.

The second category arises from synonym issues in SNOMED CT. Our method recommends the Fully Specified Name “386652006 | Colocystoplasty (procedure) |” for “Cystocolic anastomosis” in ICD-9-CM, because “Cystocolic anastomosis” is listed as a synonym of “Colocystoplasty” in SNOMED CT. However, this appears to be an improper synonym that should be reevaluated by SNOMED International.

The third category covers ICD-9-CM terms containing words such as “and,” “or,” “NOS,” and “unspecified.” The terms in [Table 2] typically require either one-to-many mapping or one-to-one mapping to a broader concept. In this study, we did not assess whether one-to-one mapping is appropriate for these terms because of the challenges associated with generalization.

Table 2
Terms including “AND,” “OR,” “NOS,” and “Unspecified” in mismatched terms
Category	ICD-9-CM term	NLM recommendation
AND	Repair and plastic operations on spleen	265861008 | Suture/repair of spleen (procedure) |
OR	Excisional debridement of wound, infection, or burn	225148005 | Surgical debridement of wound (procedure) |
OR	Thoracoscopic excision of lesion or tissue of lung	444188003 | Thoracoscopic wedge resection of lung (procedure) |
OR	Local excision or destruction of palate by cryotherapy	190630008 | Cryotherapy of palate (procedure) |
NOS	Diagnostic interview and evaluation, not otherwise specified	84100007 | History taking (procedure) |
NOS	Intestinal anastomosis, not otherwise specified	235407009 | Gastrointestinal tract anastomosis - intestine (procedure) |
NOS	Evisceration of orbit NOS	398328001 | Orbitectomy (procedure) |
Unspecified	Arthrodesis of unspecified joint	19578002 | Arthrodesis (procedure) |

The fourth category covers terms for which one-to-many mapping is more appropriate than one-to-one mapping. For example, NLM recommends “19417000 | Debridement of open fracture of foot (procedure) |” as a broader term for “Debridement of open fracture of tarsals and metatarsals” in ICD-9-CM. However, it might be more appropriate to map to narrower terms such as “448934004 | Debridement of open fracture of metatarsal bone (procedure) |.”

The fifth category pertains to situations where the addition of synonyms becomes necessary. [Table 3] lists 12 cases that are not mapped, but accurate mapping can be achieved by supplementing the index with additional synonyms.

Table 3
Twelve terms that needed addition of synonyms
No.	ICD-9-CM term	NLM recommendation	Synonyms needed
1	Computerized axial tomography of thorax	169069000 | Computed tomography of chest (procedure) |	Thorax, chest
2	Catheter based invasive electrophysiologic testing	82982002 | Cardiac electrophysiologic stimulation and recording study (procedure) |	Electrophysiologic, cardiac electrophysiologic
3	Tenoplasty	397139008 | Plastic operation on tendon (procedure) |	Tenoplasty, plastic operation of tendon
4	Administration of tetanus toxoid	127786006 | Tetanus vaccination (procedure) |	Tetanus toxoid, tetanus vaccine
5	Sphincterotomy of bladder	46089005 | Division of bladder neck (procedure) |	Sphincterotomy, division of neck
6	Proctotomy	34414001 | Incision of rectum (procedure) |	Proctotomy, incision of rectum
7	Onychoplasty	405734007 | Repair of nail bed (procedure) |	Onychoplasty, repair of nail bed
8	Construction of auricle of ear	120136006 | External ear reconstruction (procedure) |	Construction, reconstruction
9	Soave submucosal resection of rectum	275014009 | Soave endorectal pull-through operation for Hirschsprung's disease (procedure) |	Submucosal resection of rectum, endorectal pull-through
10	Intravascular imaging of coronary vessels	33367005 | Coronary angiography (procedure) |	Intravascular imaging, angiography
11	Creation of septal defect in heart	305928009 | Open atrial septectomy (procedure) |	Creation of septal defect, septectomy
12	Duhamel resection of rectum	76062007 | Duhamel operation, abdominoperineal pull-through (procedure) |	Resection of rectum, abdominoperineal pull-through

The sixth category is that some terms are not successfully mapped using our approach, and adding synonyms alone does not lead to successful mapping. For instance, “Metacarpophalangeal fusion” in ICD-9-CM is recommended as “46504007 | Arthrodesis of metacarpophalangeal joint (procedure) |” by NLM, but our approach does not map it. Possible paraphrases of “arthrodesis” include expressions such as “fusion of joint” or “joint fusion.” In SNOMED CT, the concept “46504007 | Arthrodesis of metacarpophalangeal joint (procedure) |” is defined with the Method attribute “Fusion – action,” and, from a clinical perspective, metacarpophalangeal (MCP) fusion and MCP arthrodesis refer to the same surgical procedure. Our method failed to map “Metacarpophalangeal fusion” to this concept because it mainly depends on surface-form lexical similarity and synonym lists, without using the formal axioms and attribute–value structure of SNOMED CT. Another example is the case of “Angiocardiography of venae cavae” in ICD-9-CM, where NLM recommends “4438009 | Venography of vena cava (procedure) |.” Notably, “angiocardiography” is not a lexical synonym but a parent concept from which venography is derived. Our IR-based approach, which relies primarily on lexical similarity and synonym expansion, cannot detect such parent–child semantic relationships within SNOMED CT. These examples highlight a limitation of a purely synonym- and string-based IR approach and suggest that leveraging SNOMED CT's definitional logic and hierarchy could improve mapping performance in future work.

Of these six categories, the first four reflect issues in the mapping dataset and the terminology. The fifth and sixth reflect challenges in our methodology, but the fifth can be resolved simply by adding synonyms. Only the sixth category, comprising two terms, requires a different methodology. This detailed analysis of the unmapped data revealed some limitations of this study; it also provides insights for enhancing our methodology, shows the potential applicability of the method to mapping tasks, and helps validate the quality of existing data.

Limitations

This study has some limitations. First, we tested only procedure codes, and expansion to other domains is needed. Second, the dataset is relatively small; a larger dataset would likely reveal clearer differences between combined indexes such as index 1 + 3 and index 1 + 4. Third, we processed the original ICD-9-CM terms using only Elasticsearch's lowercase and stopword filters, for general-purpose applicability and reduced complexity. Fourth, we opted for an indirect comparison with MetaMap because open mapping applications were difficult to find and other studies were difficult to reproduce. Fifth, as discussed earlier, direct comparison with MetaMap is not feasible; MetaMap can provide a reference baseline, but because it is not a dedicated terminology-mapping tool, it serves only as a restricted baseline rather than a comprehensive comparator. In evaluations based on precision, recall, and F-measure, recommending the single most appropriate mapping result is more meaningful than presenting a ranked list. Sixth, the evaluation used a normalized dataset, which does not fully reflect the variability of real-world local terminologies; local terminology often contains unnormalized expressions, misspellings, abbreviations, and other inconsistencies.

Conclusion

We developed a method using an information retrieval approach with rich synonyms to map clinical terms to SNOMED CT and evaluated it using 1,753 one-to-one mapped instances from the NLM's ICD-9-CM Procedure codes to SNOMED CT Map. Based on the results, we analyzed the 40 unmapped terms and classified them into six types: first, terms that need review; second, synonym issues in SNOMED CT; third, terms containing “and,” “or,” “NOS,” and “unspecified”; fourth, terms better suited to one-to-many mapping; fifth, terms requiring additional synonyms; and sixth, terms that our approach cannot map even with additional synonyms.

The IR approach with rich synonyms shows significant potential for mapping disparate terminologies. In future research, we plan to incorporate recent advances, including distributed word representations for finding similar words. We anticipate that our study will streamline labor-intensive manual mapping and serve as a valuable tool for evaluating existing results.

Clinical Health Implications

Standardized clinical terminology enables semantic interoperability, but mapping local terms to international standards like SNOMED CT is a complex, labor-intensive process. This study provides a free mapping support service to help terminology experts alleviate the challenges of manual mapping.

Conflict of Interest

The authors declare that they have no conflict of interest.

References
1 Palojoki S, Lehtonen L, Vuokko R. Semantic interoperability of electronic health records: systematic review of alternative approaches for enhancing patient information availability. JMIR Med Inform 2024; 12: e53535

2 Facile R, Chronaki C, van Reusel P, Kush R. Standards in sync: five principles to achieve semantic interoperability for TRUE research for healthcare. Front Digit Health 2025; 7: 1567624

3 Sung S, Park HA, Jung H, Kang H. A SNOMED CT mapping guideline for the local terms used to document clinical findings and procedures in electronic medical records in South Korea: methodological study. JMIR Med Inform 2023; 11: e46127

4 SNOMED International. Accessed August 27, 2025 at: https://www.snomed.org

5 World Health Organization. WHO-FIC Classifications and Terminology Mapping: Principles and Best Practice. Geneva: WHO; 2021

6 Thandi M, Brown S, Wong ST. Mapping frailty concepts to SNOMED CT. Int J Med Inform 2021; 149: 104409

7 Lougheed MD, Thomas NJ, Wasilewski NV, Morra AH, Minard JP. Use of SNOMED CT® and LOINC® to standardize terminology for primary care asthma electronic health records. J Asthma 2018; 55 (06) 629-639

8 Block L, Handfield S. Mapping wound assessment data elements in SNOMED CT. Stud Health Technol Inform 2016; 225: 1078-1079

9 Mészáros Á, Kovács S, Héja T, Bagyura Z, Zemplényi A. Mapping Hungarian procedure codes to SNOMED CT. BMC Med Res Methodol 2023; 23 (01) 240

10 EDI-SNOMED CT mapping table. Accessed February 13, 2026 at: https://hins.or.kr/menu/viewMenu.do?menuNo=3070200

11 KCD-SNOMED CT mapping table. Accessed February 13, 2026 at: https://hins.or.kr/menu/viewMenu.do?menuNo=3070100

12 Pedersen MK, Eriksson R, Reguant R. et al. A unidirectional mapping of ICD-8 to ICD-10 codes, for harmonized longitudinal analysis of diseases. Eur J Epidemiol 2023; 38 (10) 1043-1052

13 Rajput AM, Triep K, Endrich O. Semi-automated approach to map clinical concepts to SNOMED CT terms by using terminology server. In: dHealth 2022. Amsterdam: IOS Press; 2022: 67-72

14 Gaudet-Blavignac C, Foufi V, Bjelogrlic M, Lovis C. Use of the systematized nomenclature of medicine clinical terms (SNOMED CT) for processing free text in health care: systematic scoping review. J Med Internet Res 2021; 23 (01) e24594

15 Torres FBG, Gomes DC, Hino AAF, Moro C, Cubas MR. Comparison of the results of manual and automated processes of cross-mapping between nursing terms: quantitative study. JMIR Nurs 2020; 3 (01) e18501

16 Gupta S, MacLean DL, Heer J, Manning CD. Induced lexico-syntactic patterns improve information extraction from online medical forums. J Am Med Inform Assoc 2014; 21 (05) 902-909

17 de Bruijn B, Cherry C, Kiritchenko S, Martin J, Zhu X. Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010. J Am Med Inform Assoc 2011; 18 (05) 557-562

18 Wu Y, Denny JC, Trent Rosenbloom S. et al. A long journey to short abbreviations: developing an open-source framework for clinical abbreviation recognition and disambiguation (CARD). J Am Med Inform Assoc 2017; 24 (e1): e79-e86

19 So EY, Park HA. Exploring the possibility of information sharing between the medical and nursing domains by mapping medical records to SNOMED CT and ICNP. Healthc Inform Res 2011; 17 (03) 156-161

20 Wade G, Rosenbloom ST. Experiences mapping a legacy interface terminology to SNOMED CT. BMC Med Inform Decis Mak 2008; 8 (Suppl. 01) S3

21 Fung KW, Bodenreider O. Utilizing the UMLS for semantic mapping between terminologies. AMIA Annu Symp Proc 2005; 2005: 266-270

22 Wang Y, Patrick J, Miller G, O'Hallaran J. A computational linguistics motivated mapping of ICPC-2 PLUS to SNOMED CT. BMC Med Inform Decis Mak 2008; 8 (Suppl. 01) S5

23 Brown SH, Husser CS, Wahner-Roedler D. et al. Using SNOMED CT as a reference terminology to cross map two highly pre-coordinated classification systems. Stud Health Technol Inform 2007; 129 (Pt 1): 636-639

24 Cartagena FP, Schaeffer M, Rifai D, Doroshenko V, Goldberg HS. Leveraging the NLM map from SNOMED CT to ICD-10-CM to facilitate adoption of ICD-10-CM. J Am Med Inform Assoc 2015; 22 (03) 659-670

25 Allones JL, Martinez D, Taboada M. Automated mapping of clinical terms into SNOMED-CT. An application to codify procedures in pathology. J Med Syst 2014; 38 (10) 134

26 Nadkarni PM, Darer JA. Migrating existing clinical content from ICD-9 to SNOMED. J Am Med Inform Assoc 2010; 17 (05) 602-607

27 Kate RJ. Towards converting clinical phrases into SNOMED CT expressions. Biomed Inform Insights 2013; 6 (Suppl. 01) 29-37

28 Schütze H, Manning CD, Raghavan P. Introduction to Information Retrieval. Vol 39. Cambridge: Cambridge University Press; 2008

29 InfoClinic. Mapping support service. Accessed August 27, 2025 at: http://stom.infoclinic.co

30 Gormley C, Tong Z. Elasticsearch: The Definitive Guide: A Distributed Real-time Search and Analytics Engine. Sebastopol, CA: O'Reilly Media, Inc.; 2015

31 Elasticsearch. Accessed August 27, 2025 at: https://www.elastic.co

32 Chen X, Gururaj AE, Ozyurt B. et al. DataMed—an open source discovery index for finding biomedical datasets. J Am Med Inform Assoc 2018; 25 (03) 300-308

33 National Library of Medicine. Unified Medical Language System (UMLS): The SPECIALIST Lexicon. Accessed August 27, 2025 at: https://www.nlm.nih.gov/research/umls/new_users/online_learning/LEX_001.html

34 Batch MetaMap. Accessed August 27, 2025 at: https://ii.nlm.nih.gov/Batch/UTS_Required/MetaMap.html

35 ICD-9-CM procedure codes to SNOMED CT map. Accessed August 27, 2025 at: https://www.nlm.nih.gov/research/umls/mapping_projects/icd9cmv3_to_snomedct.html

36 Mikolov T, Le QV, Sutskever I. Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168. Published 2013. Updated 2022

Correspondence

Byoung-Kee Yi, PhD

Department of Artificial Intelligence Convergence, Kangwon National University

1 Kangwondaehak-gil, Chuncheon-si, Gangwon-do, 24341

Korea

Email: byoungkeeyi@gmail.com

Publication History

Received: 27 August 2025

Accepted: 23 January 2026

Accepted Manuscript online: 03 February 2026

Article published online: 16 February 2026

© 2026. The Author(s). This is an open access article published by Thieme under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License, permitting copying and reproduction so long as the original work is given appropriate credit. Contents may not be used for commercial purposes, or adapted, remixed, transformed or built upon. (https://creativecommons.org/licenses/by-nc-nd/4.0/)

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany


No.	ICD-9-CM term	NLM recommendation	MetaMap result	Our result
1	Insertion of indwelling urinary catheter	266765001 |Indwelling catheter inserted (situation)|	Insertion of indwelling urinary catheter	446583004 |Insertion of indwelling catheter into urinary bladder (procedure)|
2	Laparoscopic lysis of peritoneal adhesions	264971004 |Endoscopic division of adhesions of peritoneum (procedure)|	264971004 |Endoscopic division of adhesions of peritoneum (procedure)|	708614008 |Laparoscopic lysis of adhesion of peritoneum (procedure)|
3	Endoscopic insertion of stent (tube) into bile duct	83735008 |Insertion of biliary stent by endoscopic retrograde cholangiopancreatography (procedure)|	N/A	709197008 |Endoscopic insertion of stent into bile duct (procedure)|
4	Injection or infusion of immunoglobulin	48556009 |Administration of immune serum (procedure)|	51116004 |Passive immunization (procedure)|	51116004 |Passive immunization (procedure)|
5	Pulmonary scan	441677009 |Imaging of lung (procedure)|	241293008 |Radionuclide study of lung (procedure)|	241293008 |Radionuclide study of lung (procedure)|
6	Partial gastrectomy with anastomosis to jejunum	83985009 |Resection of stomach with gastrojejunal anastomosis (procedure)|	N/A	173720000 |Partial gastrectomy and anastomosis of stomach to transposed jejunum (procedure)|
7	Myotomy	178274007 |Incision of muscle (procedure)|	36453007 |Division of muscle (procedure)|	36453007 |Division of muscle (procedure)|
8	Obliteration of vaginal vault	120036009 |Vagina closure (procedure)|	708853002 |Obliteration of vaginal vault (procedure)|	708853002 |Obliteration of vaginal vault (procedure)|
9	Suture of laceration of rectum	42904002 |Repair of rectal laceration (procedure)|	42904002 |Repair of rectal laceration (procedure)|	448829002 |Suturing of laceration of rectum (procedure)|
10	Contrast arthrogram	33148003 |Arthrography (procedure)|	Arthrogram	446025003 |Arthrography using contrast (procedure)|
11	Transabdominal endoscopy of large intestine	34264006 |Intraoperative colonoscopy (procedure)|	N/A	446415001 |Endoscopy of large intestine by transabdominal approach (procedure)|
12	Nephroscopy	52621004 |Endoscopy of kidney (procedure)|	52621004 |Endoscopy of kidney (procedure)|	872341000168100 |Nephroscopy (procedure)|
13	Pyelostomy	44267002 |Insertion of drainage tube into kidney pelvis (procedure)|	X	88734005 |Renal pyelostomy (procedure)|
14	Osteopathic manipulative treatment using isotonic, isometric forces	417447009 |Muscle energy technique (procedure)|	16992002 |Osteopathic manipulation (procedure)|	16992002 |Osteopathic manipulation (procedure)|
15	Apicoectomy	23199002 |Root amputation, per root (procedure)|	446286008 |Excision of root of single-rooted tooth (procedure)|	446286008 |Excision of root of single-rooted tooth (procedure)|
16	Reverse total shoulder replacement	280464000 |Revision of shoulder arthroplasty (procedure)|	42262007 |Total shoulder replacement (procedure)|	42262007 |Total shoulder replacement (procedure)|
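The concept cells in the table above follow the common SNOMED CT convention of writing a concept identifier followed by its description between vertical bars, with the semantic tag in parentheses. When the three result columns are compared programmatically, a small parser such as the following sketch can split each cell into identifier, term, and semantic tag. The regular expressions are assumptions that cover the cell format used in this table (including stray spaces and backslash-escaped bars from web extraction), not the full SNOMED CT Compositional Grammar; `ConceptRef` and `parse_cell` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ConceptRef(NamedTuple):
    concept_id: str
    term: str
    semantic_tag: Optional[str]


# SCTID (6 to 18 digits), then a term between vertical bars; tolerates "\|" escapes
# and extra spaces around the term.
CELL = re.compile(r"^\s*(\d{6,18})\s*\\?\|\s*(.*?)\s*\\?\|\s*$")
# A trailing "(semantic tag)" at the end of the term, e.g. "(procedure)".
TAG = re.compile(r"^(.*?)\s*\(([^()]+)\)$")


def parse_cell(cell: str) -> Optional[ConceptRef]:
    """Parse 'id |term (tag)|'; return None for cells without a concept ID."""
    m = CELL.match(cell)
    if not m:
        return None  # e.g. "N/A", "X", or free text such as "Arthrogram"
    concept_id, term = m.groups()
    t = TAG.match(term)
    if t:
        return ConceptRef(concept_id, t.group(1), t.group(2))
    return ConceptRef(concept_id, term, None)


if __name__ == "__main__":
    for cell in ["42904002 |Repair of rectal laceration (procedure)|",
                 r"266765001 \|Indwelling catheter inserted (situation) \|",
                 "N/A"]:
        print(parse_cell(cell))
```

Splitting the semantic tag from the term makes type differences visible, such as the NLM recommendation in row 1 being a (situation) concept while the other results are (procedure) concepts.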
