Hamostaseologie 2024; 44(06): 459-465
DOI: 10.1055/a-2407-7994
Review Article

Machine-Learning Applications in Thrombosis and Hemostasis

Henning Nilius
1   Department of Clinical Chemistry, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
2   Graduate School for Health Sciences, University of Bern, Bern, Switzerland
Michael Nagler
1   Department of Clinical Chemistry, Inselspital, Bern University Hospital, University of Bern, Bern, Switzerland
Funding: M.N. is supported by a research grant from the Swiss National Science Foundation.
 

Abstract

The use of machine-learning (ML) algorithms in medicine has sparked a heated discussion. ML is considered one of the most disruptive general-purpose technologies in decades. It has already permeated many areas of our daily lives and produced applications we can no longer do without, such as navigation apps and translation software. However, many people are still unsure whether ML algorithms should be used in medicine in their current form. Doctors doubt to what extent they can trust the predictions of algorithms. Shortcomings in development and unclear regulatory oversight can lead to bias, inequality, applicability concerns, and nontransparent assessments. Past mistakes, however, have led to a better understanding of what is needed to develop effective models for clinical use. Physicians and clinical researchers must participate in all development phases and understand their pitfalls. In this review, we explain the basic concepts of ML, present examples from the field of thrombosis and hemostasis, discuss common pitfalls, and present a methodological framework for developing effective algorithms.



Introduction

Machine-learning algorithms are one of the most disruptive new technologies, but their use in medicine has been controversial.[1] They can handle multidimensional data, find patterns humans do not perceive, and model complex interactions.[2] This makes them ideal for many real-world applications. They are already part of our everyday lives, including navigation apps, content recommendation algorithms (e.g., YouTube or TikTok), and smartphone voice assistants. They hold particular promise for the future of medicine, as the volume of healthcare data steadily increases while the number of healthcare workers decreases.[3]

Still, many people are unsure whether machine-learning algorithms can and should be used in their current form in clinical medicine.[4] [5] Concerns include questions about privacy, unclear regulatory oversight, biases against certain genders and races, and dangerous implementations.[6] [7] [8] [9] [10] A prime example is the Epic sepsis prediction model, rolled out during the pandemic without regulatory approval. It was supposed to alert doctors to sepsis risk but often gave false alarms, leading to worse health outcomes.[11] Major mistakes, such as including antibiotic therapy as a predictor for sepsis, happened because clinical experts were not involved.[12]

However, such experiences have led to a better understanding of what is needed to develop safe and effective machine-learning algorithms. It requires more than technical skills; life scientists and doctors are essential in defining clear clinical use cases, guiding development, and testing models in appropriately designed clinical studies.[13] This also means that the ball is now in our court as physicians and clinical researchers. We need to understand how machine-learning algorithms work, their strengths and weaknesses, and how we can develop them. In this way, they can become useful tools that improve daily clinical practice and patient outcomes.

This review aims to introduce the fundamental principles of medical machine learning, outline potential use cases, and present common pitfalls in machine learning. We also discuss how to avoid these pitfalls using a methodological framework.



Fundamentals of Machine Learning

Machine-learning models are computer programs designed to perform tasks that would normally require human intelligence.[14] While they have not yet matched human experts in many situations, they offer key advantages: (1) they can handle multidimensional data from various sources, (2) they manage probabilities very well, and (3) they can find patterns in data that humans might miss by modeling complex interactions.[15] However, they lack other attributes of human intelligence, such as creativity and flexibility, which are also important when solving problems in medicine.[16] [Fig. 1] contrasts the strengths of artificial and human intelligence.

Fig. 1 Illustration of the strengths of human and artificial intelligence.

Machine-learning algorithms have three main capabilities useful for medical applications: classification, pattern recognition, and optimization.[17] Different models are available, ranging from simple ones, such as logistic regression, to complex ones, such as deep neural networks.[18] In the following section, we will describe common machine-learning approaches and models typically used ([Fig. 2]).

Fig. 2 Overview of key machine-learning capabilities.

Diagnosis and prognosis are classification problems.[14] Patients must be classified as having a disease or not, or as likely or unlikely to experience disease progression. Automated blood cell counting is another example of a classification problem.[19] Are the cells granulocytes or lymphocytes, and which subtypes? A typical approach for classification problems is supervised learning, where the training data are labeled according to a reference standard, such as expert panel ratings, a reference laboratory test, or follow-up data.[20] This means that even a perfectly trained model will only perform as well as the reference test. Standard models used for supervised learning include logistic regression, random forests, and support vector machines.[20] As an example, our group recently developed a machine-learning–based decision support tool for the diagnosis of heparin-induced thrombocytopenia (HIT; https://toradi-hit.dbmr.unibe.ch/).[21] We demonstrated that our model can accurately predict HIT as defined by the reference standard, thus solving a salient diagnostic dilemma.[21]
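To make the supervised workflow concrete, the following is a minimal sketch of training a diagnostic classifier on labeled data. The file "labeled_cohort.csv", the column "label", and the choice of a random forest are assumptions for illustration only; this does not reproduce the published TORADI-HIT model.

```python
# Minimal sketch of supervised learning for a diagnostic classifier.
# "labeled_cohort.csv" is a hypothetical dataset in which "label" holds
# the reference-standard diagnosis; all names are illustrative.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("labeled_cohort.csv")
X = df.drop(columns=["label"])   # predictors (features)
y = df["label"]                  # reference-standard labels

# Hold out a test set so performance is not judged on the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=500, random_state=42)
model.fit(X_train, y_train)

# Discrimination on the held-out set (internal validation only).
probs = model.predict_proba(X_test)[:, 1]
print(f"Hold-out AUROC: {roc_auc_score(y_test, probs):.2f}")
```

Note that the hold-out estimate above is still only internal validation; external validation is discussed later in this review.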

Another main task for machine-learning models is pattern recognition, which can be used for subgroup identification.[22] An unsupervised learning approach is often employed, meaning the model works with unlabeled data where the true diagnosis is unknown.[23] The model aims to group patients based on shared characteristics, but these groups do not necessarily correlate with the outcome in question. In this situation, researchers and experts must assign meaning to the identified clusters.[24] Typical models used for unsupervised learning include k-means clustering, hierarchical clustering, and Gaussian mixture models.[23] As an example, a group from Mainz used a hierarchical clustering algorithm to identify endotypes (based on clinical features at presentation) in patients with acute venous thromboembolism.[25] These endotypes were found to be associated with differences in recurrence and death rates.[25] However, validation is still pending.
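As a sketch of this approach, the code below applies hierarchical (agglomerative) clustering to synthetic stand-ins for clinical features. The data and the choice of three clusters are assumptions for illustration and do not reflect the cited study's method.

```python
# Minimal sketch of unsupervised subgroup identification using
# hierarchical clustering on synthetic "clinical" features.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 5))   # stand-in for features at presentation

# Scale features so no single variable dominates the distance metric.
X = StandardScaler().fit_transform(features)

# Group patients into, e.g., three candidate endotypes.
labels = AgglomerativeClustering(n_clusters=3).fit_predict(X)
print(np.bincount(labels))             # cluster sizes

# The clusters carry no labels: experts must judge whether they are
# clinically meaningful, e.g., by comparing recurrence rates across groups.
```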

A classic optimization problem is treatment monitoring, typically tackled with a technique called reinforcement learning.[26] These algorithms interact with their environment and receive rewards when they achieve specific goals. Available models include Q-learning and Asynchronous Advantage Actor-Critic.[26] For example, using retrospective data from the “Multiparameter Intelligent Monitoring in Intensive Care II” (MIMIC-II) dataset, Nemati et al proposed a reinforcement learning–based algorithm for monitoring unfractionated heparin treatment.[27] The model provided dosing recommendations for heparin and was rewarded during training when the activated partial thromboplastin time (aPTT) was within 60 to 100 seconds. While the model gave sensible dosing recommendations in the validation dataset, it has yet to be validated prospectively or in live patients.
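The reward-driven principle can be sketched with a toy tabular Q-learning loop. The three-state "aPTT" environment, the action set, and the rewards below are invented for illustration and bear no relation to the published model's architecture or data.

```python
# Toy Q-learning loop for dose titration: reward is given when the
# (invented) aPTT state is "in range". Purely illustrative.
import numpy as np

n_states, n_actions = 3, 3       # aPTT low/in-range/high; dose down/keep/up
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    # Crude dynamics: raising the dose tends to raise the aPTT state.
    nxt = int(np.clip(state + (action - 1) + rng.integers(-1, 2), 0, 2))
    reward = 1.0 if nxt == 1 else -1.0   # reward aPTT in target range
    return nxt, reward

state = 0
for _ in range(5000):
    # Epsilon-greedy: mostly exploit the best known action, sometimes explore.
    action = rng.integers(n_actions) if rng.random() < epsilon else int(Q[state].argmax())
    nxt, reward = step(state, action)
    # Q-learning update: move Q toward reward plus discounted best future value.
    Q[state, action] += alpha * (reward + gamma * Q[nxt].max() - Q[state, action])
    state = nxt

print(Q.round(2))   # learned state-action values
```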

Besides the main methods mentioned earlier, many other approaches and mixed methods are available. For example, semisupervised learning first uses a small set of labeled data to train the initial iteration of a classifier, then refines its predictions with unlabeled data.[28] The field is rapidly evolving, with many new algorithms developed every year.
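A minimal sketch of the self-training variant of semisupervised learning is given below, using scikit-learn's SelfTrainingClassifier on synthetic data. The 10% labeling fraction is an arbitrary assumption for illustration.

```python
# Minimal sketch of semisupervised self-training: a classifier trained
# on a few labeled cases iteratively labels the remaining data itself.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Pretend only ~10% of cases carry a reference-standard label;
# scikit-learn marks unlabeled samples with -1.
y_partial = y.copy()
rng = np.random.default_rng(0)
y_partial[rng.random(len(y)) > 0.1] = -1

model = SelfTrainingClassifier(LogisticRegression())
model.fit(X, y_partial)
print(f"Accuracy against all true labels: {model.score(X, y):.2f}")
```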

Another important development was made possible by the steady increase in computing power: generative artificial intelligence (AI) models. The most important difference from the previous methods is that generative AI creates new, previously nonexistent content. Most of these models are based on foundation models trained on enormous datasets, such as social media posts, internet articles, or code repositories.[29] For example, large language models generate text in response to user prompts by repeatedly predicting which token (an encoded word fragment) best continues the text.[30] The most well-known example is OpenAI's ChatGPT. The latest version, GPT-4, can handle multimodal prompt data, including files and images.[31] These models can then be further fine-tuned for specific applications.[32] While their performance is impressive, a significant risk with these large language models is “AI hallucinations,” where the models generate completely inaccurate answers due to issues like overfitting, extreme complexity, or biases in the training data.[33]
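The next-token principle can be sketched with an open model through the Hugging Face transformers library. The small general-purpose "gpt2" model used here is an assumption for illustration; it is not a medical model and will not produce clinically reliable text.

```python
# Minimal sketch of next-token text generation with an open model.
# "gpt2" is a small general-purpose model chosen for illustration only.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
prompt = "Heparin-induced thrombocytopenia is"
out = generator(prompt, max_new_tokens=30, do_sample=False)
print(out[0]["generated_text"])   # prompt plus model-predicted continuation
```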



Use-Cases of Machine Learning in Medicine

Currently, most machine-learning models focus on two main goals: (1) improving processes through automation or simplification and (2) improving quality or utility. While process-enhancing models sometimes operate in a legal gray area, such as administrative software, quality-enhancing models are typically considered part of a medical device or “software as a medical device” and therefore require regulatory approval.[34]

Models that improve processes include large language models that automatically generate discharge or handover notes. For this purpose, Epic, a prominent U.S.-based electronic health record vendor, has announced plans to integrate GPT-4 into its systems.[35] However, these developments mainly focus on the U.S. and English-language markets, raising questions about how a similar model would perform in other medical cultures. Another proposed process improvement involves using AI-powered chatbots to answer or triage patients' medical questions. A preliminary study by Ayers et al compared physicians' answers to questions posted on a public medical forum with those given by GPT-3.5.[36] An expert panel evaluated the empathy and quality of the answers and decided which they preferred. The panel favored the chatbot's answers in 78.6% of cases and rated the quality and empathy of the chatbot's responses higher.[36]

Machine-learning models that focus on quality improvement are mostly still in the premarket or research-use-only phase, and very few have made it to clinical practice.[37] As an example of a model that aims to improve quality, Nafee et al published a machine-learning model to improve venous thrombosis prediction in acutely ill patients using data from the phase 3 clinical trial for betrixaban.[38] Their ensemble model, which combines different architectures, outperformed the established IMPROVE score.[38] While their model was developed with high-quality clinical data, the description of the methods used is brief. Moreover, the model has not yet been externally validated, meaning its performance might differ in other populations.
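Since the publication does not describe the architecture in detail, the following is only a generic sketch of how an ensemble can combine different model families by soft voting; the data and model choices are assumptions for illustration.

```python
# Minimal sketch of an ensemble combining different model architectures
# via soft voting (averaging predicted probabilities). Illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier(random_state=0)),
        ("svm", SVC(probability=True, random_state=0)),
    ],
    voting="soft",   # average predicted probabilities across models
)
ensemble.fit(X, y)
print(f"Training accuracy: {ensemble.score(X, y):.2f}")
```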

As another example, Zaboras et al developed a classifier to predict bleeding in cancer patients on anticoagulation for cancer-associated thrombosis.[39] Their Extreme Gradient Boosting model outperformed the CAT-BLEED score, the only available score for this purpose, in predicting bleeding at 90, 365, and 365 + 90 days after VTE.[39] However, the authors noted that the model had limited sensitivity and would require refinement before clinical use. Additional methodological limitations include the reliance on registry data, limited calibration, and the lack of external validation.

Besides these clinical use cases, machine-learning models can also improve or simplify research.[40] One example is the automated detection of certain diseases in electronic health records for retrospective studies. While diagnosis codes are often available, their accuracy varies, especially since it is not always clear if the diagnosis is current or historical. A recent meta-analysis by Lam et al pooled data from eight studies on natural language processing for venous thromboembolism detection.[41] The sensitivity and specificity of these models for detecting venous thromboembolism from free-text radiology or narrative reports in electronic health records were high. However, most of these studies were conducted in English-speaking countries.
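As a deliberately simplified sketch of this idea, a bag-of-words classifier can flag free-text reports that describe thrombosis. The example reports and labels below are invented and far simpler than the natural language processing systems pooled in the cited meta-analysis.

```python
# Minimal sketch of text classification on free-text reports.
# Reports and labels are toy examples; 1 = VTE present, 0 = absent.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "filling defect in the right pulmonary artery consistent with embolism",
    "no evidence of deep vein thrombosis in either lower extremity",
    "acute thrombus within the left popliteal vein",
    "lungs are clear, no pulmonary embolism identified",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(reports, labels)
print(clf.predict(["nonocclusive thrombus in the femoral vein"]))
```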



The Implementation Gap

Despite their potential, few machine-learning models are currently used in clinical practice. In the European Union (EU) and the United States, machine-learning models must be registered as medical devices, except for those in certain legal gray areas as described earlier.[34] The EU's approval process is decentralized and handled by “Notified Bodies,” private institutions that perform conformity assessments.[42] Although the EU has had a centralized medical device register (EUDAMED) since 2011, it does not specifically list machine-learning–enabled devices.[43] Therefore, we will describe the landscape of machine-learning models for clinical use based on the U.S. Food and Drug Administration's (FDA) list.

As of May 13, 2024, the FDA has approved 882 AI or machine-learning–enabled devices.[44] Most of these devices (671; 76.1%) are in radiology, which adopted machine learning early. In contrast, the hematology section has only 17 devices (1.9%). These are mostly computer vision–based peripheral blood smear analyzers for large medical laboratories. One example is the Scopio X100HT, which, according to the manufacturer, processes 40 slides per hour.[45] [46] Other blood count devices with different intended purposes include the Sight OLO, a point-of-care device using cassettes for differential blood counts, and the Athelas Home and One, patient self-sampling devices that detect neutropenia in patients treated with clozapine.[47] These devices were validated by their respective manufacturers and found to perform similarly to established hematology analyzers. Additionally, 23andMe has registered a device estimating hereditary thrombophilia risk, focusing on mutations like factor V Leiden and prothrombin G20210A.[48]

Despite the large body of research on machine-learning models, very few have made it to clinical practice. The reasons for this are manifold. Often, models are developed not with a clear clinical question in mind but because data happen to be available, which leads to clinically useless models. Additionally, machine-learning experts and clinicians often do not work in the same teams. All of this points toward the need for a concise methodological framework for the development and validation of machine-learning models. In the next section, we outline such a framework.[49] [50] [51] [52]



A Methodological Framework to Avoid Common Pitfalls

Drawing on our experience with new biomarkers and other diagnostic tests, we proposed a step-by-step framework to ensure clear clinical use cases, validity, and efficiency.[49] [50] [51] [52] In the following paragraphs, we will discuss the most important development and implementation pitfalls and how to avoid them ([Fig. 3]).

Fig. 3 Areas of potential pitfalls in the development of medical machine learning.

Defining a Clinical Need and Research Question

Defining a clinical use case is the first step in developing any useful medical tool. Devices that do not meet a clear clinical need are not used, wasting scarce resources. These tools might even misinform physicians, potentially harming patients.[53] Therefore, the first phase of development should involve focus group discussions with relevant stakeholders, including patients and physicians.[54] The European Federation of Clinical Chemistry and Laboratory Medicine outlines four key questions to guide these discussions: (1) What clinical management problem needs solving? (2) Are there existing solutions? (3) What improvement or contribution will the new tool provide? (4) Is the new tool feasible in everyday clinical practice?[49] From the clinical need, a research question can be derived, clearly defining the study design, patient population, and desired outcomes.



Training Data Selection and Face Validity

One of the key advantages of machine-learning models is their ability to detect subtle data patterns, but this also makes them prone to overfitting, where they find patterns present only in the training data.[17] To avoid overfitting, selecting high-quality training data is crucial. The first consideration should be the appropriate patient population and study design.[55] Ideally, a model should be developed for the specific patient population it will serve. For diagnostic models, this means including patients suspected of having the disease, and for prognostic models, patients at risk.[55] Prospective studies or randomized clinical trials are generally preferable to retrospective ones because predictors are collected before outcomes are known, which inherently blinds the assessment. However, careful planning is necessary to avoid biases, such as selection or spectrum bias.[56] Retrospective studies, while less resource-intensive and able to generate larger datasets, are often biased.
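The gap between apparent and honestly estimated performance can be shown in a few lines. In the following sketch on synthetic data, accuracy measured on the training data itself is near-perfect, while cross-validated accuracy, estimated on data the model has not seen, is lower; all settings are illustrative.

```python
# Minimal sketch of why performance must be estimated on unseen data:
# apparent (training) accuracy is optimistic, cross-validation less so.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_informative=5, random_state=1)

model = RandomForestClassifier(random_state=1).fit(X, y)
print(f"Apparent (training) accuracy: {model.score(X, y):.2f}")   # near 1.0

cv_scores = cross_val_score(RandomForestClassifier(random_state=1), X, y, cv=5)
print(f"Cross-validated accuracy:     {cv_scores.mean():.2f}")    # more honest
```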

To ensure a clinically useful model, the selection of predictors (feature engineering and selection) requires special consideration. While, from a strictly machine-learning perspective, the features with the highest predictive value should be selected, additional factors are important in medicine.[57] Some features, like biopsies, may not be ethical or economically viable to collect from every patient. Therefore, it is important to involve focus groups in the feature selection process and consider these factors as well. Additionally, since physicians are ultimately responsible for patient care, face validity and transparency are essential.[58] Face validity, a psychological concept, indicates whether a test appears to measure what it is supposed to measure. Tests with low face validity risk not being used.[58] Surveys of U.S. physicians show that while most are open to using machine-learning models, they require an understanding of the model's inputs and how it arrives at its outputs.[59] This highlights the importance of using interpretable machine-learning techniques to trace how a model makes its predictions.[60]
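One such inspection technique is permutation importance: it asks how much performance drops when a feature's values are randomly shuffled. The sketch below uses synthetic data with invented feature names; it illustrates the technique, not any specific published model.

```python
# Minimal sketch of permutation importance as one interpretability aid.
# Feature names are hypothetical placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=4, random_state=0)
names = ["platelet_drop", "timing_score", "d_dimer", "age"]   # invented

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

result = permutation_importance(model, X_te, y_te, n_repeats=20, random_state=0)
for name, imp in sorted(zip(names, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")   # larger drop = more influential feature
```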



Implementation

The implementation of a model is often overlooked but is crucial for clinical adoption. Even the most accurate model is useless if it is not used. Given the time constraints doctors face, the tool must integrate smoothly into the current workflow.[61] Ideally, it should be implemented within the electronic health record or laboratory information system. Another straightforward option is developing a web application using frameworks like Shiny for R or Flask for Python, which researchers can implement easily.[62] [63]
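A minimal sketch of such a web service with Flask follows; the pickled model file ("model.pkl") and the JSON input format are assumptions for illustration, not a production-grade or regulatory-compliant deployment.

```python
# Minimal sketch of serving a trained model as a prediction web service.
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

with open("model.pkl", "rb") as f:   # hypothetical pickled scikit-learn model
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [[0.3, 1.2, 0.8]]}.
    features = request.get_json()["features"]
    prob = model.predict_proba(features)[0, 1]
    return jsonify({"probability": float(prob)})

if __name__ == "__main__":
    app.run(port=5000)
```

A request such as `curl -X POST -H "Content-Type: application/json" -d '{"features": [[0.3, 1.2, 0.8]]}' http://localhost:5000/predict` would then return the predicted probability.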



External Validation

Validating the model in an external cohort is essential to confirm its diagnostic performance.[64] Internal validation, often the final development phase, involves testing the model against a hold-out or time-displaced set from the original training cohort. However, this provides only a rough performance estimate.[17] External validation is necessary to obtain an unbiased estimate and identify potential overfitting.[64] This involves conducting a similar study to the development study, applying the same considerations. Additionally, the impact of the diagnostic tool on patient outcomes can be measured through a randomized controlled trial, although this is rarely done due to high costs.
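In code, external validation amounts to applying the frozen model to an independent cohort and reporting both discrimination and calibration, as in the following sketch; the file and column names are hypothetical.

```python
# Minimal sketch of external validation of a frozen model on an
# independent cohort; "model.pkl" and "external_cohort.csv" are
# hypothetical placeholders.
import pickle

import pandas as pd
from sklearn.calibration import calibration_curve
from sklearn.metrics import roc_auc_score

with open("model.pkl", "rb") as f:              # model frozen after development
    model = pickle.load(f)

external = pd.read_csv("external_cohort.csv")   # independent validation study
X_ext = external.drop(columns=["label"])
y_ext = external["label"]

probs = model.predict_proba(X_ext)[:, 1]
print(f"External AUROC: {roc_auc_score(y_ext, probs):.2f}")

# Calibration: do predicted probabilities match observed frequencies?
frac_pos, mean_pred = calibration_curve(y_ext, probs, n_bins=5)
for pred, obs in zip(mean_pred, frac_pos):
    print(f"predicted {pred:.2f} -> observed {obs:.2f}")
```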



Regulatory Approval

Regulatory approval is the basis for the legal use and insurance reimbursement of a machine-learning model in clinical practice. In the European Union, machine-learning models are classified as “software as a medical device” and are governed by the Medical Device Regulation (MDR).[34] The MDR uses a risk-based approach, requiring all but the lowest-risk categories to undergo systematic clinical evaluation and post-market surveillance.[34] The regulation also emphasizes the importance of prospective real-world data. Obtaining approval is a long and costly process, feasible only in collaboration with an industry partner. Without regulatory approval, models can effectively be used only as research tests that should not impact patient care.



Conclusion

Machine-learning algorithms hold the potential to transform healthcare by enhancing care and optimizing processes amid rising costs and personnel shortages. Despite the promising advancements, an implementation gap remains due to various methodological and practical challenges. Ensuring high-quality training data, appropriate feature selection, and robust validation, including external cohort validation, are critical steps to mitigate overfitting and confirm performance. Effective integration into clinical workflows and obtaining regulatory approval, which involves systematic evaluation and post-market surveillance, are essential for clinical adoption. By adhering to a comprehensive methodological framework, these challenges can be addressed, enabling machine-learning models to realize their full potential in clinical practice.



Conflict of Interest

The authors declare that they have no conflict of interest.


Address for correspondence

Henning Nilius, MD
Inselspital, University Hospital of Bern
3010 Bern
Switzerland   

Publication History

Received: 15 August 2024

Accepted: 19 September 2024

Article published online:
05 November 2024

© 2024. Thieme. All rights reserved.

Georg Thieme Verlag KG
Stuttgart · New York

