Enhancing Suicide Attempt Risk Prediction Models with Temporal Clinical Note Features

Kevin J. Krause; Sharon E. Davis; Zhijun Yin; Katherine M. Schafer; Samuel Trent Rosenbloom; Colin G. Walsh

doi:10.1055/a-2411-5796

Appl Clin Inform 2024; 15(05): 1107-1120

Research Article

Authors

Kevin J. Krause¹, Sharon E. Davis¹, Zhijun Yin¹, Katherine M. Schafer¹, Samuel Trent Rosenbloom¹, Colin G. Walsh¹

¹Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States

Funding This research has been supported by several funding bodies. The primary source of funding was the National Library of Medicine (NLM) T15 training grant (grant number: 2T15LM007450-20). Additional support came from the Evelyn Selby Stead Fund for Innovation, Vanderbilt University Medical Center, specifically grants R01 MH121455 and R01 MH116269. The Military Suicide Research Consortium also provided funding through grant W81XWH-10-2-0181. Finally, funding for the Research Derivative and BioVU Synthetic Derivative was provided by the National Center for Research Resources (grant number: UL1 RR024975/RR/NCRR). The funders had no role in study design, data collection and analysis, or manuscript preparation.

Abstract

Objectives The objective of this study was to investigate the impact of enhancing a structured-data-based suicide attempt risk prediction model with temporal Concept Unique Identifiers (CUIs) derived from clinical notes. We aimed to examine how different temporal schemes, model types, and prediction ranges influenced the model's predictive performance. This research sought to improve our understanding of how the integration of temporal information and clinical variable transformation could enhance model predictions.

Methods We identified modeling targets using diagnostic codes for suicide attempts within 30, 90, or 365 days following a temporally grouped visit cluster. Structured data included medications, diagnoses, procedures, and demographics, whereas unstructured data consisted of terms extracted with regular expressions from clinical notes. We compared models trained only on structured data (controls) to hybrid models trained on both structured and unstructured data. We used two temporalization schemes for clinical notes: fixed 90-day windows and flexible epochs. We trained and assessed random forests and hybrid long short-term memory (LSTM) neural networks using area under the precision recall curve (AUPRC) and area under the receiver operating characteristic, with additional evaluation of sensitivity and positive predictive value at 95% specificity.

Results The training set included 2,364,183 visit clusters with 2,009 30-day suicide attempts, and the testing set contained 471,936 visit clusters with 480 30-day suicide attempts. Models trained with temporal CUIs outperformed those trained with only structured data. The window-temporalized LSTM model achieved the highest AUPRC (0.056 ± 0.016) for the 30-day prediction range. Hybrid models generally showed better performance compared with controls across most metrics.

Conclusion This study demonstrated that incorporating electronic health record-derived clinical note features enhanced suicide attempt risk prediction models, particularly with window-temporalized LSTM models. Our results underscored the critical value of unstructured data in suicidality prediction, aligning with previous findings. Future research should focus on integrating more sophisticated methods to continue improving prediction accuracy, which will enhance the effectiveness of future intervention.

Keywords

suicide - machine learning - natural language processing - neural networks - random forest

Background and Significance

The Centers for Disease Control and Prevention reported that in 2021 approximately 48,000 people in the United States died of suicide.[1] Proven interventions such as psychiatric medication and removing access to firearms might help individuals at-risk for suicide.[2] Barriers to identifying at-risk individuals and delivering timely prevention efforts include limited access to mental health services, social stigma surrounding mental health, insufficient training for health care providers in recognizing suicide risk, and the fragmented nature of health records.[3] Informatics efforts have been underway to address the growing need for improved screening and treatment of at-risk individuals.[4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] Much of this work has explored the interplay of structured and unstructured electronic health record (EHR) data for clinical predictive machine learning tasks.[9] [13] [19] [20] [21] [22] [23] [24] [25]

In mental health assessments, information that is not organized into discrete fields (such as free-text clinician notes) plays a crucial role, especially when assessing suicide risk.[9] [11] [18] [26] Broadly, researchers have used this so-called “unstructured data” found in Reddit comments to detect suicidal ideation.[27] More centrally to health care, the Veterans Health Administration demonstrated that sentiment analysis of unstructured data in EHR clinic notes enhances suicidality prediction accuracy.[9] In previous work, we leveraged Concept Unique Identifier (CUI) counts from clinical notes to generate suicidality risk factor networks.[28] Natural language processing (NLP) techniques applied to clinical notes have been shown to improve the detection of suicidal thoughts in pregnant women, predict suicide risk after hospital discharge, and even perform better than mental health professionals in identifying suicide risk from notes.[11] [12] [18] The first of these is a clinically significant milestone because the perinatal period is a time of elevated suicide risk for this vulnerable population.[18] The lack of structured data has also been noted as a serious problem that limits utility in health care. Indeed, Boggs et al reported substantial gaps in follow-up assessments related to suicidal ideation due to the lack of structured EHR data on suicidality.[26] Together, these advancements underscore the potential of unstructured data in enhancing predictive models for suicide risk.

Researchers often avoid including the timing of events (temporality) in their models to keep the analysis simpler and to reduce the risk of the model becoming too tailored to the specific data (a problem known as overfitting).[29] However, recent research advocates for incorporating temporal elements in clinical models to enhance performance.[30] [31] [32] [33] [34] Indeed, the ideation-to-action framework suggests that only after interacting with acquired capability to inflict painful harm upon oneself (via lowered fear of death and increased pain tolerance) does suicidal ideation progress toward suicide attempts.[35] [36] [37] [38] [39] The order (i.e., relative temporal precedence) of these constructs is integral to the ideation-to-action framework. Integrating this theory into prediction models supports the temporalization of input data to reflect critical indicators along the ideation-to-action continuum.

Objective

The objective of this study was to investigate the impact of extending a validated structured-data-based suicide attempt risk prediction model with CUIs derived from clinical notes and to examine how temporal schemes, model types, and different prediction ranges affected predictive performance. Building upon previous research in suicide risk prediction, this research aimed to enhance our understanding of how clinical variable selection, transformation, and periodization could influence model outcomes.[9] [11] [12] [13] [18] [19] [40] [41] Specifically, we sought to leverage detailed clinical note data, enriched with temporal information, to develop a model that improves prediction of rare outcomes like suicide-related behaviors.[42] We compared the model with other baselines on area under the precision recall curve (AUPRC), sensitivity, positive predictive value (PPV), and risk stratification capability beyond prior models lacking these enriched features.

Methods

Study Setting

Vanderbilt University Medical Center (VUMC) operates a large regional health care network with over 1,000 beds across multiple facilities and manages over 1.5 million outpatient visits and 40,000 inpatient admissions annually. VUMC also includes a dedicated psychiatric facility to provide comprehensive mental health care services. The patient population includes urban and rural communities with diverse demographic and health care experiences, which is vital for research initiatives like suicide prevention. The Vanderbilt Suicide Attempt and Ideation Likelihood (VSAIL) model generates patient encounter suicide risk scores, using structured EHR data.[4] [5] It has been validated retrospectively, prospectively, and in the context of universal screening in high-acuity clinical settings.[4] [43] [44] A decision support module driven by VSAIL has been evaluated via randomized controlled trial and shown to increase face-to-face suicide risk assessment.[45]

Cohort, Clusters, and Outcomes

We analyzed adult outpatient and emergency department encounters at VUMC between January 1, 2010, and December 31, 2022. We grouped sequences of visits with gaps of 3 or fewer days into discrete clusters. Thus, patients could appear multiple times in the dataset if they visited VUMC care centers on separate occasions more than 3 days apart. We determined modeling targets (outcomes) by the presence or absence of at least one ICD diagnostic code for suicide attempt within 30, 90, or 365 days (prediction ranges) following a visit cluster. [Supplementary Material S1] (available in the online version) details the 1,526 ICD codes used to ascertain suicide attempts; these include only diagnoses that explicitly indicate suicidal intent or intent to inflict lethal self-harm/injury. In 2015, health systems converted billing diagnostic codes from the 9th revision of the International Classification of Diseases (ICD-9) to the 10th revision (ICD-10). Our previous work has shown ICD-10 diagnostic codes to have higher PPV (0.85)[6] for suicide attempt ascertainment than ICD-9 codes (PPV 0.58).[5] Early experimentation supported the inclusion of ICD-9 era data in the training set, while ensuring that only ICD-10 data were used for evaluation. We excluded visits occurring within 3 days of a suicide attempt to avoid inadvertently including predictions for visits initiated by a suicide event, i.e., visits in which prediction would not be necessary.
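
The clustering and labeling rules above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the function names are hypothetical, the outcome window is assumed to start at the last visit of a cluster, and the exclusion of visits within 3 days of an attempt is omitted.

```python
from datetime import date

MAX_GAP_DAYS = 3          # visits at most 3 days apart share a cluster
RANGES = (30, 90, 365)    # prediction ranges in days


def cluster_visits(visit_dates):
    """Group one patient's visit dates into (start, end) clusters."""
    clusters = []
    for d in sorted(visit_dates):
        if clusters and (d - clusters[-1][1]).days <= MAX_GAP_DAYS:
            clusters[-1] = (clusters[-1][0], d)   # extend the open cluster
        else:
            clusters.append((d, d))               # start a new cluster
    return clusters


def label_cluster(cluster_end, attempt_dates):
    """Return {range: bool}: was an attempt coded within `range` days after the cluster?

    Assumption: the window is measured from the cluster's last visit.
    """
    return {
        r: any(0 < (a - cluster_end).days <= r for a in attempt_dates)
        for r in RANGES
    }


visits = [date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 10)]
attempts = [date(2020, 3, 1)]
clusters = cluster_visits(visits)
labels = [label_cluster(end, attempts) for _, end in clusters]
```

In this toy example the first two visits merge into one cluster and the third forms its own; the attempt falls outside the 30-day range but inside the 90- and 365-day ranges for both clusters.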

Features and Measurements

We collected structured and unstructured data from a 5-year window preceding each episode. Structured data, based on the original VSAIL model, included medications (n = 693), diagnoses (n = 83), procedures (n = 46), and demographics (n = 3). We imputed missing values with constant zeros and missing labels. Features were scaled to a maximum absolute value of 1 without shifting or centering. Structured features were not temporally aggregated, as preliminary analysis showed low temporal feature variance, a finding also supported by prior work by Shortreed et al.[46]
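
As a rough illustration of the preprocessing described above, the following pure-Python sketch zero-fills missing values and divides each column by its maximum absolute value, without shifting or centering. The helper name and the toy matrix are hypothetical; scikit-learn's MaxAbsScaler implements the same scaling rule.

```python
def impute_and_scale(rows):
    """Zero-fill missing values (None), then scale each column by its max |value|."""
    filled = [[0.0 if v is None else float(v) for v in row] for row in rows]
    n_cols = len(filled[0])
    scales = []
    for j in range(n_cols):
        max_abs = max(abs(row[j]) for row in filled)
        scales.append(max_abs if max_abs > 0 else 1.0)  # all-zero columns stay zero
    return [[row[j] / scales[j] for j in range(n_cols)] for row in filled]


rows = [[2, None, 0], [4, -5, 0], [None, 10, 0]]
scaled = impute_and_scale(rows)
```

Because nothing is subtracted from the data, zeros stay zeros, which preserves the sparsity of count-style features.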

From unstructured clinical notes, we extracted note-level count aggregations of medical concepts using 13,997 CUIs from the Unified Medical Language System.[47] CUI extraction was performed with the VUMC Wordcloud Indexer, a negation-aware regular-expressions-based NLP tool.[48] We processed CUIs using two temporalization schemes: 20 fixed, 90-day windows (window scheme) and 20 flexible epochs grouping note activity with gaps of 30 or fewer days (epoch scheme). CUI counts were normalized using term frequency-inverse document frequency (TF-IDF)[49] transformation to emphasize terms that occur less frequently and then reduced with latent semantic analysis (LSA)[50] into 100 components of greater information density. The LSA transformer was first fitted on the 5-year aggregate of CUI counts before transforming individual windows and epochs, allowing for consistent comparison across temporal schemes.
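
To make the difference between the two temporalization schemes concrete, the sketch below assigns note dates to fixed windows and to flexible epochs. It is an illustration under stated assumptions: periods are counted backward from a hypothetical index date (0 is the most recent period), and the function names are invented for this example.

```python
from datetime import date


def window_index(note_date, index_date, width_days=90):
    """Fixed windows: 0 = most recent 90-day window before the index date."""
    return (index_date - note_date).days // width_days


def epoch_indices(note_dates, max_gap_days=30):
    """Flexible epochs: a gap longer than max_gap_days opens an older epoch."""
    ordered = sorted(note_dates, reverse=True)  # newest first
    epochs = {}
    current = 0
    for prev, d in zip([None] + ordered[:-1], ordered):
        if prev is not None and (prev - d).days > max_gap_days:
            current += 1
        epochs[d] = current
    return epochs


index_date = date(2022, 1, 1)
notes = [date(2021, 12, 20), date(2021, 12, 1), date(2021, 9, 15), date(2021, 3, 1)]
windows = [window_index(d, index_date) for d in notes]
epochs = epoch_indices(notes)
```

Here the four notes fall into windows 0, 0, 1, and 3, leaving window 2 empty, whereas the epoch scheme packs the same notes into three consecutive epochs with no empty periods.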

Experimental Overview

We trained random forests and hybrid long short-term memory (LSTM) neural networks to test the impact of input temporality on suicide attempt risk prediction. Models trained on nontemporal structured features were compared with models trained on nontemporal structured features plus temporally aggregated CUIs [Supplementary Material S2] (available in the online version). We performed experiments with three modeling groups: random forest structured only (control), random forest structured plus temporal CUIs, and LSTM structured plus temporal CUIs. We used two CUI temporalization schemes: windows and epochs, and three prediction ranges: 30, 90, and 365 days. Thus, our experimental variations included random-forest-control (VSAIL), random-forest-epoch, random-forest-window, LSTM-epoch, and LSTM-window models across each prediction range.

We divided encounters using a mixed temporal split design. The training set consisted of the earliest 80% of visit clusters (January 1, 2010–September 2, 2021), whereas the latest records were reserved for testing, ensuring that only ICD-10 era outcomes were included in the test set for evaluation with more recent data. We randomly assigned the remaining 20% of visit clusters to either development or testing in a 1:4 ratio, resulting in an 80/4/16 training/development/testing split ([Fig. 1]).
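
The split can be sketched as follows. This is a simplified illustration with a hypothetical function name and integer stand-ins for cluster dates; it does not handle any patient-level grouping.

```python
import random


def temporal_split(cluster_dates, seed=0):
    """Return index lists (train, dev, test) for visit clusters ordered by date."""
    order = sorted(range(len(cluster_dates)), key=lambda i: cluster_dates[i])
    n_train = len(order) * 4 // 5          # earliest 80% for training
    train, late = order[:n_train], order[n_train:]
    rng = random.Random(seed)
    rng.shuffle(late)                      # random assignment within the latest 20%
    n_dev = len(late) // 5                 # 1:4 dev:test, so dev gets one fifth
    return train, late[:n_dev], late[n_dev:]


dates = list(range(100))  # stand-in for cluster dates (any sortable values)
train, dev, test = temporal_split(dates)
```

With 100 clusters this yields 80 training, 4 development, and 16 testing records, and every training record precedes every development or testing record.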

Fig. 1 (A) depicts the training, development, and testing set split methodology. The training set is composed of the earliest 80% of visit clusters. The latest 20% is randomly split into development and testing sets, in a 1:4 ratio, yielding an 80/4/16 train/dev/test split. Diagnostic codes used to ascertain suicide attempts switch from ICD-9 to ICD-10 in October 2015. (B) depicts the window (W) and epoch (E) input data temporalization schemes. Windows split input data into fixed, 90-day periods. Epochs split input data into flexible periods defined by gaps of 30 days or greater in EHR activity. Both schemes use 20 total periods. EHR, electronic health record; ICD-9/-10, International Classification of Diseases, 9th revision/10th revision.

Model Implementation

We created random forest models using scikit-learn 1.2.0 for preprocessing and classification, with a custom pipeline for efficient hyperparameter selection.[51] These models were developed and evaluated in Python 3.8.0, managed with the built-in virtual environment venv. We created hybrid LSTM models using PyTorch 2.2.2 (CPU) with a custom neural network module to handle a mixture of temporal (CUI counts) and nontemporal (VSAIL) features.[52] The model has two input layers: an LSTM input layer for the temporal CUI count features and a dense linear input layer for the nontemporal VSAIL features, which are combined to feed into a single output prediction layer. [Fig. 2] depicts the hybrid LSTM model architecture in detail. [Supplementary Material S3] (available in the online version) provides additional modeling details.

Fig. 2 This diagram depicts the hybrid LSTM model architecture used to process a mixture of temporal and nontemporal features. Temporal features feed into an LSTM layer, and nontemporal features feed into a dense layer. A fully connected hidden layer connects both the LSTM layer and dense layer to an output layer with a single node. The diagram shows the LSTM, dense, and hidden layers with size two each—but we explore multiple configurations of layer sizes. The diagram also provides a simplified depiction of the inner workings of an LSTM cell. LSTM, Long Short-Term Memory.
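
A minimal PyTorch sketch of a two-branch network of this kind is shown below. It is not the authors' module: the class name, layer sizes, ReLU activations, use of the final LSTM hidden state, and toy tensor shapes are all assumptions made for illustration.

```python
import torch
from torch import nn


class HybridLSTM(nn.Module):
    """Two-branch network: LSTM for temporal inputs, dense layer for static inputs."""

    def __init__(self, n_temporal, n_static, lstm_size=8, dense_size=8, hidden_size=8):
        super().__init__()
        self.lstm = nn.LSTM(n_temporal, lstm_size, batch_first=True)
        self.dense = nn.Linear(n_static, dense_size)
        self.hidden = nn.Linear(lstm_size + dense_size, hidden_size)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x_temporal, x_static):
        # x_temporal: (batch, periods, n_temporal); x_static: (batch, n_static)
        _, (h_n, _) = self.lstm(x_temporal)
        temporal = h_n[-1]                        # final hidden state of the LSTM
        static = torch.relu(self.dense(x_static))
        merged = torch.cat([temporal, static], dim=1)
        hidden = torch.relu(self.hidden(merged))
        return self.out(hidden).squeeze(-1)       # one logit per visit cluster


# Toy shapes only: 4 visit clusters, 6 periods, 100 temporal and 12 static features.
model = HybridLSTM(n_temporal=100, n_static=12)
logits = model(torch.randn(4, 6, 100), torch.randn(4, 12))
```

The output is an unnormalized logit; applying a sigmoid would give a probability, which could then be recalibrated separately.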

Model Training

We used 10-fold stratified, grouped cross-validation for hyperparameter selection. Outcome stratification ensured each fold had nearly equal numbers of suicide attempts, while grouping prevented visit clusters from the same individual from being placed in different folds, reducing overfitting risk. We selected hyperparameters via a two-stage grid search. Initial ranges were geometrically distributed by a ratio of 2 (e.g., 4, 8, 16, 32). A second search used a linear distribution of values centered around the best-performing initial hyperparameters. We trained hybrid LSTM models with early stopping, continuing until cross-validation performance declined. We determined the best hyperparameters using mean cross-validation AUPRC. The best model was calibrated to the prevalence of suicide attempts in the development set using Platt's method.[53]
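
The two-stage search can be illustrated by how candidate values might be generated. In this sketch the helper names, the step size, and the number of fine-stage values are hypothetical choices, not details reported in the study.

```python
def geometric_grid(start, n_values, ratio=2):
    """Coarse stage: values spaced by a constant ratio, e.g. 4, 8, 16, 32."""
    return [start * ratio**i for i in range(n_values)]


def linear_grid(best, step, n_each_side=2):
    """Fine stage: evenly spaced positive values centered on the coarse-stage winner."""
    values = [best + k * step for k in range(-n_each_side, n_each_side + 1)]
    return [v for v in values if v > 0]


coarse = geometric_grid(4, 4)      # [4, 8, 16, 32]
fine = linear_grid(16, step=4)     # assumes 16 scored best in the coarse stage
```

The coarse stage covers orders of magnitude cheaply, and the fine stage spends the remaining budget near the most promising value.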

Evaluation

We assessed general performance on the final test set using AUPRC and area under the receiver operating characteristic (AUROC). We measured sensitivity and PPV at 95% specificity to characterize each model's potential cost-effectiveness as a screening tool, as described by Ross et al.[54] We used bootstrapping with 1,000 iterations to generate 95% confidence intervals for all metrics and applied one-sided Wilcoxon rank-sum tests to compare scores. We evaluated risk stratification by counting true positives within probability deciles, providing a visit-centered view of model improvements in identifying future suicide attempts. We measured model calibration [Supplementary Material S4] (available in the online version) on the development set before and after adjustment using Spiegelhalter's z-score.[55]
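
Sensitivity and PPV at a fixed specificity can be computed by choosing the score threshold from the negative class alone. The sketch below is one simple way to do this; the function name, the tie handling, and the toy data are assumptions for illustration.

```python
def sens_ppv_at_specificity(y_true, y_score, specificity=0.95):
    """Pick the threshold that keeps false positives within 1 - specificity of negatives."""
    negatives = sorted(s for y, s in zip(y_true, y_score) if y == 0)
    # Largest number of false positives allowed at the target specificity.
    max_fp = int(len(negatives) * (1 - specificity) + 1e-9)
    if max_fp < len(negatives):
        threshold = negatives[len(negatives) - max_fp - 1]
    else:
        threshold = float("-inf")
    tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s > threshold)
    fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s > threshold)
    sensitivity = tp / sum(y_true)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, ppv


# Toy data: 20 negatives with scores 0.01..0.20 and 2 positives.
y_true = [0] * 20 + [1, 1]
y_score = [i / 100 for i in range(1, 21)] + [0.5, 0.05]
sensitivity, ppv = sens_ppv_at_specificity(y_true, y_score)
```

In this toy case one of 20 negatives is flagged (95% specificity), one of the two positives is caught, and half of the flagged records are true positives.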

To quantify clinical note feature importances, we used random-forest-derived mean decrease in impurity (MDI). Given the transformation of temporal CUI counts with LSA, direct comparison of each temporalized CUI feature's importance was challenging. Therefore, we created an additional random forest model trained entirely on nontemporal CUI counts (13,997 features) that were only TF-IDF transformed. We averaged the MDI from 100 bootstrapped variations of this model to calculate approximate feature importances and 95% confidence intervals for our CUIs, repeated for each prediction range.[56]

Results

The training set contained 2,364,183 visit clusters and 2,009 30-day attempts. The development set contained 118,534 visit clusters and 117 30-day attempts. The testing set contained 471,936 visit clusters and 480 30-day attempts. The overall prevalence of 30-day attempts was approximately 1 in 1,000 (0.1%). [Table 1] describes the demographics of the entire study cohort. [Fig. 3] shows the distribution of health care utilization by demographics. The distribution of visit cluster counts per patient was similar among individuals without recorded suicide attempts across racial, ethnic, and gender groups, with Hispanic, Middle Eastern/North African, and Black individuals trending slightly higher than the rest. The distributions were higher among those with recorded suicide attempts, across all demographic groups. The highest distributions were among Hispanic, Middle Eastern/North African, American Indian/Native Alaskan, and Black individuals with recorded suicide attempts.

Table 1
Study cohort demographics (N = 1,281,001)
Demographics	Any attempt	30-d attempt	90-d attempt	365-d attempt	Total patients	Util.
Race
White	9,283 (1.01%)	1,528 (0.17%)	2,200 (0.24%)	3,241 (0.35%)	920,080	2.71
Black	2,087 (1.11%)	332 (0.18%)	500 (0.27%)	766 (0.41%)	88,130	3.39
Other	456 (0.31%)	131 (0.09%)	167 (0.11%)	198 (0.13%)	48,046	2.76
Asian	130 (0.53%)	26 (0.11%)	39 (0.16%)	62 (0.25%)	24,324	2.70
Hispanic	130 (0.69%)	27 (0.14%)	44 (0.23%)	64 (0.34%)	18,961	3.70
Pacific Islander	13 (0.28%)	3 (0.06%)	5 (0.11%)	7 (0.15%)	4,664	2.41
American Indian/Alaskan	50 (1.29%)	8 (0.21%)	16 (0.41%)	21 (0.54%)	3,868	3.28
Middle Eastern/North African	15 (0.57%)	4 (0.15%)	7 (0.27%)	11 (0.42%)	2,628	3.98
Ethnicity
Non-Hispanic	8,699 (1.12%)	1,360 (0.17%)	1,950 (0.25%)	2,908 (0.37%)	779,243	2.64
Hispanic/Latino	622 (0.64%)	132 (0.14%)	189 (0.20%)	272 (0.28%)	96,885	2.73
Unknown	306 (0.34%)	52 (0.06%)	68 (0.08%)	84 (0.09%)	89,488	1.32
Legal gender
Female	6,862 (1.03%)	1,092 (0.16%)	1,618 (0.24%)	2,448 (0.37%)	665,108	2.73
Male	5,036 (0.82%)	903 (0.15%)	1,265 (0.21%)	1,788 (0.29%)	612,749	2.64
Unknown/other	1 (0.04%)	0 (0.00%)	0 (0.00%)	0 (0.00%)	2,750	1

Notes: This table summarizes the overall cohort demographics of the study, including suicide attempt rates within each demographic. Suicide attempt rates are divided into four groups, indicating either (1) any suicide attempt or (2–4) only suicide attempts within a fixed number of days after a visit. The total column indicates the total number of patients within each demographic. The utilization column (Util.) indicates the mean number of health care visit clusters per patient within each demographic.

Fig. 3 This boxplot compares the distribution of health care visit clusters per patient, divided by demographic group.

The distributions of bootstrapped AUPRC, AUROC, sensitivity at 95% specificity, and PPV at 95% specificity evaluation metrics are shown in [Figs. 4] [5] [6] [7], respectively. [Table 2] provides the exact means and 95% confidence intervals for each evaluation metric. As the prediction range increased (i.e., from 30 to 90 to 365 days), AUPRC and PPV increased, whereas AUROC and sensitivity generally decreased. Window temporalization schemes outperformed epochs across all four metrics, except in the case of LSTM with a 90-day prediction range, where the epoch scheme resulted in higher AUROC. LSTMs performed better than random forests only in terms of AUPRC. Hybrid models (epochs and windows) outperformed controls in every metric except for PPV. In terms of our primary ranking metric (AUPRC) and primary use case (30-day prediction range), the highest performing model was the window-temporalized LSTM model (0.056 ± 0.016), followed by LSTM-epoch (0.041 ± 0.010), random-forest-window (0.036 ± 0.008), random-forest-epoch (0.028 ± 0.006), and control (0.015 ± 0.003). These rankings were confirmed by the Wilcoxon rank-sum tests (p < 0.001).

Table 2
Model evaluation summary
30 d	Metric (mean ± 95% CI)
Random Forest	AUPRC	AUROC	Sn. @ 95% Sp.	PPV @ 95% Sp.
Control	0.0152 ± 0.0033	0.9243 ± 0.0079	0.4557 ± 0.0386	0.0175 ± 0.0024
Epoch	0.0275 ± 0.0062	0.9455 ± 0.0087	0.6728 ± 0.0390	0.0136 ± 0.0015
Window	0.0363 ± 0.0079	0.9482 ± 0.0099	0.7481 ± 0.0437	0.0158 ± 0.0019
LSTM
Epoch	0.0407 ± 0.0097	0.9148 ± 0.0131	0.6332 ± 0.0439	0.0131 ± 0.0015
Window	0.0563 ± 0.0157	0.9256 ± 0.0127	0.6764 ± 0.0402	0.0140 ± 0.0015
90 d
Random Forest
Control	0.0316 ± 0.0062	0.9114 ± 0.0075	0.4497 ± 0.0283	0.0300 ± 0.0029
Epoch	0.0510 ± 0.0091	0.9251 ± 0.0079	0.6090 ± 0.0315	0.0231 ± 0.0021
Window	0.0697 ± 0.0129	0.9320 ± 0.0080	0.6699 ± 0.0294	0.0261 ± 0.0020
LSTM
Epoch	0.0624 ± 0.0104	0.9024 ± 0.0111	0.6505 ± 0.0327	0.0246 ± 0.0019
Window	0.0818 ± 0.0133	0.9012 ± 0.0113	0.6567 ± 0.0303	0.0256 ± 0.0020
365 d
Random Forest
Control	0.0610 ± 0.0072	0.8944 ± 0.0065	0.4809 ± 0.0195	0.0566 ± 0.0036
Epoch	0.0987 ± 0.0116	0.9021 ± 0.0066	0.6132 ± 0.0215	0.0490 ± 0.0027
Window	0.1153 ± 0.0128	0.9189 ± 0.0063	0.6681 ± 0.0213	0.0526 ± 0.0026
LSTM
Epoch	0.1288 ± 0.0143	0.8875 ± 0.0078	0.6177 ± 0.0216	0.0489 ± 0.0026
Window	0.1391 ± 0.0142	0.8968 ± 0.0078	0.6268 ± 0.0227	0.0505 ± 0.0027

Abbreviations: AUPRC, area under the precision recall curve; AUROC, area under the receiver operating characteristic; LSTM, long short-term memory.

Notes: This table summarizes the metric averages and 95% confidence intervals (CIs) for each model variation and prediction range across 1,000 bootstrapped evaluation iterations. CIs are calculated using the average of the 5th and 95th percentile score differences from the mean. Sensitivity (Sn.) and positive predictive value (PPV) are reported at 95% specificity (Sp.).
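
The symmetric "±" value described in this note can be read as the mean of the two distances between the bootstrap mean and the lower and upper percentiles. The sketch below follows that reading; the function names and the nearest-rank percentile rule are assumptions, and the scores are toy values rather than study results.

```python
def percentile(sorted_vals, p):
    """Nearest-rank percentile for an integer p in 1..100 (assumed rule)."""
    n = len(sorted_vals)
    rank = -(-p * n // 100)          # ceil(p * n / 100) using integer arithmetic
    return sorted_vals[max(rank - 1, 0)]


def mean_and_half_width(scores, low=5, high=95):
    """Return the bootstrap mean and the average distance to the two percentiles."""
    ordered = sorted(scores)
    mean = sum(ordered) / len(ordered)
    lower_gap = mean - percentile(ordered, low)
    upper_gap = percentile(ordered, high) - mean
    return mean, (lower_gap + upper_gap) / 2


scores = [i / 100 for i in range(1, 101)]   # toy bootstrap scores 0.01..1.00
mean, half_width = mean_and_half_width(scores)
```

Algebraically the mean cancels out, so the reported half-width equals half the distance between the two percentiles.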

Fig. 4 This boxplot compares AUPRC performance across 1,000 bootstrap iterations, separated by model-type, temporalization scheme, and prediction range. AUPRC, area under the precision recall curve.

Fig. 5 This boxplot compares AUROC performance across 1,000 bootstrap iterations, separated by model-type, temporalization scheme, and prediction range. AUROC, area under the receiver operating characteristic.

Fig. 6 This boxplot compares sensitivity at 95% specificity performance across 1,000 bootstrap iterations, separated by model-type, temporalization scheme, and prediction range.

Fig. 7 This boxplot compares PPV at 95% specificity performance across 1,000 bootstrap iterations, separated by model-type, temporalization scheme, and prediction range. PPV, positive predictive value.

[Fig. 8] depicts the stratification of suicide attempts within prediction deciles, organized by model type, temporalization scheme, and prediction range. Increasing prediction ranges increased the number of suicide attempts but decreased the fraction of suicide attempts captured in the 10th prediction decile of each model. In the 30-day prediction range 10th-decile stratification, the rankings were: random-forest-window (90.6%), random-forest-epoch (87.1%), LSTM-window (80.0%), LSTM-epoch (75.8%), and control (72.7%). The gap in stratification performance between LSTM-window and random-forest-window narrowed in the 90-day prediction range (74.3 vs. 80.0%) and the 365-day prediction range (71.0 vs. 77.8%).

Fig. 8 This bar chart depicts the stratification of suicide attempts within prediction deciles, organized by model type, temporalization scheme, and prediction range. The blue bars at the base of the plot correspond to each model's highest prediction decile. The number of attempts captured by the highest prediction decile is given for each model; further, the total number of attempts for each prediction range is given beside the y-axis.
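
The top-decile capture rate discussed above can be approximated with a short ranking routine. This sketch assumes deciles are equal-sized groups of predictions ranked by score, which may differ from the binning used in the study; the function name and toy data are hypothetical.

```python
def top_decile_capture(y_true, y_score):
    """Fraction of all positives whose score falls in the highest tenth of predictions."""
    ranked = sorted(zip(y_score, y_true), reverse=True)
    n_top = len(ranked) // 10
    captured = sum(y for _, y in ranked[:n_top])
    return captured / sum(y_true)


# Toy data: 20 records with increasing scores; positives at positions 5, 18, and 19.
y_score = [i / 20 for i in range(20)]
y_true = [1 if i in (5, 18, 19) else 0 for i in range(20)]
capture = top_decile_capture(y_true, y_score)
```

In this example the top decile holds two records, both positive, so two of the three attempts are captured.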

[Fig. 9] shows the top 20 feature importances by MDI across 100 bootstrap iterations for the 30-day prediction range. The top five most important features were suicide attempt, feeling hopeless, self-injurious behavior, active suicidal ideation, and impaired judgment. [Fig. 10] compares the relative scaled importances by MDI of the top 10 features for all three prediction ranges. The 365- and 90-day prediction ranges showed a tighter cluster of feature importances, whereas the 30-day prediction range showed greater variance in feature importances.

Fig. 9 This point plot shows the top 20 most important features by MDI, averaged across 100 bootstrap iterations, with 95% confidence intervals, for the 30-day prediction range. CUI, Concept Unique Identifier; MDI, mean decrease in impurity.

Fig. 10 This radar plot compares the scaled importances of the top 10 features from each prediction range (30, 90, and 365 days). Each feature is scaled between 0 and 1, indicating the relative importance of a single feature across prediction ranges.

Discussion

In this study, we demonstrated how EHR-derived clinical note features improve a deployed suicide attempt risk prediction model.[45] We also showed that both the clinical note data temporalization scheme and model type significantly impact model performance across various testing dimensions and prediction ranges. The importance of temporality echoes findings discussed by the ideation-to-action framework, wherein proximal risk factors influence the ability of suicidal ideation to progress toward or away from suicide attempts. That is, the order in which people experience the intensification or easing of risk factors for suicide ideation or attempts influences the presence or absence of suicide attempts. In clinical practice, this may appear as ambivalence regarding the desire for death or the intent to enact a suicide attempt. Although the model employed in the present project was not built to depend on or replicate theories of suicide, the fact that the model benefits from temporality offers support for theorists who in their own right are seeking to understand the causes of suicidal behavior.[38] [39]

Within the clinically preferred 30-day prediction range, window-temporalized LSTM models achieved the highest test AUPRC, whereas random forests achieved the highest test sensitivity and PPV at 95% specificity. AUPRC is more reliable than setpoint metrics for evaluation until the clinical burden of this model can be studied.[54] Models that included features from free-text clinical notes (such as specific medical terms related to suicidality and mental health) outperformed those trained purely on structured data, and the most impactful clinical terms were related to suicidality, mental health, depression, social stress, and drug use, complementing structured features.[57]

This work builds on the efforts of others by comparing different approaches to incorporating temporal features in suicide attempt risk prediction models and highlighting the effectiveness of using unstructured data from clinical notes. In comparing our methods and results to those of Shortreed et al, for example, several key differences and outcomes emerge.[46] While both studies explored the use of added temporal features for suicide attempt risk prediction, our study demonstrated improved performance with features derived from clinical notes, including TF-IDF and LSA-transformed clinical concepts. In contrast, Shortreed et al did not observe significant performance improvements with added temporal predictors engineered from clinical data. Unstructured free text might better capture temporal data relevant to suicide attempt risk prediction than structured data. Replication studies of temporal features for both structured and unstructured data are indicated.

Our findings align with the prior work of Tsui et al, who also found that incorporating unstructured data significantly enhanced their prediction model's accuracy compared with using only structured data.[13] Currently, we use a bag-of-words approach for suicide attempt risk prediction based on medical concept counts from clinical notes. Although not theory-driven, this method offers easy accessibility, fast implementation, and scalability. In contrast, Meerwijk et al advocated for a theory-driven approach using the three-step Theory of Suicide (3ST), which, while potentially more accurate, required extensive setup and manual annotation.[14] [36] Overall, our bag-of-words model remains a feasible and effective method until more refined strategies become practical.

In future work, we aim to enhance our risk prediction model using advanced NLP techniques, including vector embeddings like word2vec and cui2vec, which capture term similarities within the corpus.[21] [22] [58] Coppersmith et al and Ji highlighted the effectiveness of pretrained embeddings and large language models such as BERT, RoBERTa, MentalBERT, and MentalRoBERTa in fine-tuning suicide text classifiers.[10] [15] [58] We also suspect it may be prudent to retrain with additional NLP-derived features like lexical, syntactic, and sentimental elements, shown to improve outcomes in suicide note classification.[12] Levis et al suggested sentiment analysis of psychotherapy notes can improve prediction models, whereas Ji noted varying effectiveness depending on the data source.[9] [15] [16] [59] [60] [61] Given the reliance on clinician-entered ICD codes in the present study, which may under-identify suicide attempts, especially those presented at an outside facility,[7] future work may adopt a weakly supervised NLP approach as used by Bejan et al.[6] Importantly, our current study improved performance using a simple, understandable, fast, and more transportable method, underscoring the critical value of unstructured data in suicidality prediction. This highlights that even straightforward approaches can make significant contributions, paving the way for further enhancements with more advanced techniques.

In summary, the field of predictive modeling is steadily incorporating unstructured data and NLP methods to improve screening efforts. Our work, and the works of others discussed in this paper, support the integration of these complex data sources into risk prediction models. There are several interesting paths for future research, such as the use of vector embeddings, better ground-truth ascertainment methods, improved feature extraction techniques, additional sentiment analysis, and the use of theory-based approaches to inform model design. Differences in study outcomes could also stem from variations in datasets, model implementations, and population characteristics, which should be considered in future comparisons. These potential improvements show promise for better predicting suicide risk, which could lead to more effective interventions and fewer deaths from suicide. The challenge moving forward is finding ways to continually improve, fine-tune, and validate these models.

Clinical Relevance Statement

This study highlights the significant improvement in suicide attempt risk prediction models when incorporating temporal CUIs derived from clinical notes. By enhancing structured data with temporally organized unstructured data, particularly through window-temporalized LSTM models, predictive performance notably increased. These findings underscore the importance of utilizing both structured and unstructured EHR data in clinical risk assessments. Improved prediction models can lead to more accurate identification of high-risk individuals, potentially allowing for timely and targeted interventions. Future advancements in integrating sophisticated methods with clinical data hold promise for further enhancing predictive accuracy and ultimately improving patient outcomes.
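
One way to picture temporally organized concept counts is to bin each patient's concepts into fixed-width lookback windows and feed the resulting sequence of count vectors to a sequence model. The sketch below is only an illustration under assumed settings: the 30-day width, the three windows, and the function and variable names are hypothetical and are not the configuration used in this study.

```python
from datetime import date


def windowed_cui_counts(events, index_date, vocab, window_days=30, n_windows=3):
    """Return n_windows count vectors, ordered from oldest to most recent."""
    col = {cui: i for i, cui in enumerate(vocab)}
    seq = [[0] * len(vocab) for _ in range(n_windows)]
    for event_date, cui in events:
        age = (index_date - event_date).days
        if age < 0 or cui not in col:
            continue  # skip events after the index date or outside the vocabulary
        w = age // window_days  # 0 = most recent window
        if w >= n_windows:
            continue  # older than the lookback horizon
        seq[n_windows - 1 - w][col[cui]] += 1
    return seq


# Hypothetical toy events: (note date, concept label).
events = [
    (date(2020, 1, 1), "A"),
    (date(2020, 2, 25), "A"),
    (date(2020, 2, 28), "B"),
    (date(2019, 1, 1), "A"),  # falls outside the 90-day lookback
]
sequence = windowed_cui_counts(events, date(2020, 3, 1), ["A", "B"])
```

Each inner list is one time step, so the output has the (time steps, features) shape that recurrent models such as LSTMs typically expect for a single patient.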

Multiple-Choice Questions

Which of the following approaches to ascertaining suicide attempts in health records should have the highest PPV?
a. Analyzing ICD-9 codes
b. Analyzing ICD-10 codes
c. Employing weakly supervised NLP
d. Manual chart review by expert clinicians
Correct Answer: The correct answer is option d. ICD-10 codes have higher PPV for ascertaining suicide attempt than ICD-9, and novel weakly supervised NLP approaches show promise for further improving PPV. However, the gold standard against which all approaches are currently compared is clinical chart review.
Which of the following evaluation metrics is prioritized to address the rarity of outcomes in this study?
a. AUROC
b. Accuracy
c. AUPRC
d. PPV
Correct Answer: The correct answer is option c. AUROC, accuracy, and PPV can all be artificially inflated with rare outcomes.
Which of the following is a foundational framework for studying suicidality?
a. Ideation to Action (I2A)
b. Spontaneous Action (SA)
c. Hopelessness and Loneliness (HaL)
Correct Answer: The correct answer is option a. The leading theories on suicidality are collectively referred to as ideation to action frameworks.
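
To illustrate the reasoning behind the question on evaluation metrics, the sketch below shows how accuracy can look strong on a rare outcome even when a model identifies no cases at all. The cohort size and the 1% prevalence are invented for illustration and are not taken from this study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of true positives that the model flags."""
    positives = sum(y_true)
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / positives if positives else 0.0


# Hypothetical cohort: 10 events among 1,000 patients (1% prevalence).
y_true = [1] * 10 + [0] * 990
# A trivial "model" that never predicts an event.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
```

Here the trivial model reaches 99% accuracy while missing every event. Metrics built on precision and recall, such as AUPRC, are more sensitive to this kind of failure when outcomes are rare.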

Conflict of Interest

None declared.

Acknowledgments

The authors would like to express their sincere gratitude to Dario A. Giuse, the Director of Vanderbilt Health IT, for generously providing access to the Vanderbilt Wordcloud Indexer. This tool was invaluable in facilitating the research and contributing to the conclusions drawn in this work. The authors also acknowledge the support of their team, peers, and the wider research community, whose collaboration and input have enriched this study.

Protection of Human and Animal Subjects

No human subjects were involved in this project.

Note

The study was performed in compliance with the World Medical Association Declaration of Helsinki on Ethical Principles for Medical Research Involving Human Subjects and was reviewed by the VUMC Institutional Review Board.

Supplementary Material

Supplementary Material (PDF)

References
1 Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics System, Mortality 2018–2021 on CDC WONDER Online Database; 2021

2 Zalsman G, Hawton K, Wasserman D. et al. Suicide prevention strategies revisited: 10-year systematic review. Lancet Psychiatry 2016; 3 (07) 646-659

3 Mann JJ, Apter A, Bertolote J. et al. Suicide prevention strategies: a systematic review. JAMA 2005; 294 (16) 2064-2074

4 Walsh CG, Johnson KB, Ripperger M. et al. Prospective validation of an electronic health record-based, real-time suicide risk model. JAMA Netw Open 2021; 4 (03) e211428

5 Walsh CG, Ribeiro JD, Franklin JC. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci 2017; 5 (03) 457-469

6 Bejan CA, Ripperger M, Wilimitis D. et al. Improving ascertainment of suicidal ideation and suicide attempt with natural language processing. Sci Rep 2022; 12 (01) 15146

7 Young J, Bishop S, Humphrey C, Pavlacic JM. A review of natural language processing in the identification of suicidal behavior. J Affect Disord Rep 2023; 12: 100507

8 Cohen J, Wright-Berryman J, Rohlfs L, Trocinski D, Daniel L, Klatt TW. Integration and validation of a natural language processing machine learning suicide risk prediction model based on open-ended interview language in the emergency department. Front Digit Health 2022; 4: 818705

9 Levis M, Leonard Westgate C, Gui J, Watts BV, Shiner B. Natural language processing of clinical mental health notes may add predictive value to existing suicide risk models. Psychol Med 2021; 51 (08) 1382-1391

10 Coppersmith G, Leary R, Crutchley P, Fine A. Natural language processing of social media as screening for suicide risk. Biomed Inform Insights 2018; 10: 1178222618792860

11 McCoy Jr TH, Castro VM, Roberson AM, Snapper LA, Perlis RH. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 2016; 73 (10) 1064-1071

12 Pestian J, Nasrallah H, Matykiewicz P, Bennett A, Leenaars A. Suicide note classification using natural language processing: a content analysis. Biomed Inform Insights 2010; 2010 (03) 19-28

13 Tsui FR, Shi L, Ruiz V. et al. Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts. JAMIA Open 2021; 4 (01) ooab011

14 Meerwijk EL, Tamang SR, Finlay AK, Ilgen MA, Reeves RM, Harris AHS. Suicide theory-guided natural language processing of clinical progress notes to improve prediction of veteran suicide risk: protocol for a mixed-method study. BMJ Open 2022; 12 (08) e065088

15 Ji S. Towards intention understanding in suicidal risk assessment with natural language processing. In: Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics; 2022: 4028-4038. Accessed September 15, 2024 at: https://aclanthology.org/2022.findings-emnlp.297

16 Ji S, Yu CP, Fung S fu, Pan S, Long G. Supervised learning for suicidal ideation detection in online user content. Complexity 2018; 2018: 1-10

17 Arowosegbe A, Oyelade T. Application of natural language processing (NLP) in detecting and preventing suicide ideation: a systematic review. Int J Environ Res Public Health 2023; 20 (02) 1514

18 Zhong QY, Mittal LP, Nathan MD. et al. Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem. Eur J Epidemiol 2019; 34 (02) 153-162

19 Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020; 20 (01) 280

20 Thompson K. Programming techniques: regular expression search algorithm. Commun ACM 1968; 11 (06) 419-422

21 Beam AL, Kompa B, Schmaltz A. et al. Clinical concept embeddings learned from massive sources of multimodal medical data. Pac Symp Biocomput 2020; 25: 295-306

22 Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. arXiv:1301.3781. Accessed February 13, 2022 at: http://arxiv.org/abs/1301.3781

23 Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003; 3 (Jan): 993-1022

24 Dey L, Haque SKM. Opinion mining from noisy text data. In: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data - AND '08. ACM Press; 2008: 83-90

25 Turney PD. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv:cs/0212032. Accessed February 13, 2022 at: http://arxiv.org/abs/cs/0212032

26 Boggs JM, Quintana LM, Powers JD, Hochberg S, Beck A. Frequency of clinicians' assessments for access to lethal means in persons at risk for suicide. Arch Suicide Res 2022; 26 (01) 127-136

27 Yeskuatov E, Chua SL, Foo LK. Leveraging reddit for suicidal ideation detection: a review of machine learning and natural language processing techniques. Int J Environ Res Public Health 2022; 19 (16) 10347

28 Krause KJ, Shelley J, Becker A, Walsh C. Exploring risk factors in suicidal ideation and attempt concept cooccurrence networks. AMIA Annu Symp Proc 2023; 2022: 644-652

29 Montesinos López OA, Montesinos López A, Crossa J. Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Springer International Publishing; 2022: 109-139

30 Zhao J, Henriksson A. Learning temporal weights of clinical events using variable importance. BMC Med Inform Decis Mak 2016; 16 (Suppl. 02) 71

31 Zhao J, Henriksson A, Kvist M, Asker L, Boström H. Handling temporality of clinical events for drug safety surveillance. AMIA Annu Symp Proc 2015; 2015: 1371-1380

32 Singh A, Nadkarni G, Gottesman O, Ellis SB, Bottinger EP, Guttag JV. Incorporating temporal EHR data in predictive models for risk stratification of renal function deterioration. J Biomed Inform 2015; 53: 220-228

33 Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J. Doctor AI: predicting clinical events via recurrent neural networks. JMLR Workshop Conf Proc 2016; 56: 301-318

34 Che Z, Purushotham S, Cho K, Sontag D, Liu Y. Recurrent neural networks for multivariate time series with missing values. Sci Rep 2018; 8 (01) 6085

35 Joiner TE. Why People Die by Suicide. Harvard University Press; 2005

36 Klonsky ED, May AM. The three-step theory (3ST): a new theory of suicide rooted in the “ideation-to-action” framework. Int J Cogn Ther 2015; 8 (02) 114-129

37 Klonsky ED, May AM, Saffer BY. Suicide, suicide attempts, and suicidal ideation. Annu Rev Clin Psychol 2016; 12 (01) 307-330

38 Klonsky ED, Saffer BY, Bryan CJ. Ideation-to-action theories of suicide: a conceptual and empirical update. Curr Opin Psychol 2018; 22: 38-43

39 Van Orden KA, Witte TK, Cukrowicz KC, Braithwaite SR, Selby EA, Joiner Jr TE. The interpersonal theory of suicide. Psychol Rev 2010; 117 (02) 575-600

40 Schafer KM, Kennedy G, Gallyer A, Resnik P. A direct comparison of theory-driven and machine learning prediction of suicide: a meta-analysis. PLoS One 2021; 16 (04) e0249833

41 Walker RL, Shortreed SM, Ziebell RA. et al. Evaluation of electronic health record-based suicide risk prediction models on contemporary data. Appl Clin Inform 2021; 12 (04) 778-787

42 Carter G, Milner A, McGill K, Pirkis J, Kapur N, Spittal MJ. Predicting suicidal behaviours using clinical instruments: systematic review and meta-analysis of positive predictive values for risk scales. Br J Psychiatry 2017; 210 (06) 387-395

43 Wilimitis D, Turer RW, Ripperger M. et al. Integration of face-to-face screening with real-time machine learning to predict risk of suicide among adults. JAMA Netw Open 2022; 5 (05) e2212095

44 McKernan LC, Lenert MC, Crofford LJ, Walsh CG. Outpatient engagement and predicted risk of suicide attempts in fibromyalgia. Arthritis Care Res (Hoboken) 2019; 71 (09) 1255-1263

45 Walsh CG, Ripperger MA, Novak L. et al. Randomized controlled comparative effectiveness trial of risk model-guided clinical decision support for suicide screening. medRxiv 2024

46 Shortreed SM, Walker RL, Johnson E. et al. Complex modeling with detailed temporal predictors does not improve health records-based suicide risk prediction. NPJ Digit Med 2023; 6 (01) 47

47 Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004; 32 (Database issue): D267-D270

48 Mandani S, Giuse D, McLemore M, Weitkamp A. Augmenting NLP Results by Leveraging SNOMED CT Relationships for Identification of Implantable Cardiac Devices from Patient Notes. Presented at: SNOMED CT Expo 2019; October 31, 2019; Kuala Lumpur, Malaysia. Accessed September 15, 2024 at: https://confluence.ihtsdotools.org/display/FT/201905+Augmenting+NLP+results+by+leveraging+SNOMED+CT+relationships+for+identification+of+implantable+cardiac+devices+from+patient+notes?preview=/87042613/87043024/201905%20SCT%20Expo%202019%20-%20Madani.pdf

49 Sparck Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc 1972; 28 (01) 11-21

50 Landauer TK, Dumais ST. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev 1997; 104 (02) 211-240

51 Pedregosa F, Varoquaux G, Gramfort A. et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825-2830

52 Paszke A, Gross S, Massa F. et al. PyTorch: an imperative style, high-performance deep learning library. 2019; . Accessed September 15, 2024 at:

53 Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 2000: 10

54 Ross EL, Zuromski KL, Reis BY, Nock MK, Kessler RC, Smoller JW. Accuracy requirements for cost-effective suicide risk prediction among primary care patients in the US. JAMA Psychiatry 2021; 78 (06) 642-650

55 Spiegelhalter DJ. Probabilistic prediction in patient management and clinical trials. Stat Med 1986; 5 (05) 421-433

56 Scornet E. Trees, forests, and impurity-based variable importance. 2021. Accessed May 16, 2022 at: http://arxiv.org/abs/2001.04295

57 Boggs JM, Beck A, Hubley S. et al. General medical, mental health, and demographic risk factors associated with suicide by firearm compared with other means. Psychiatr Serv 2018; 69 (06) 677-684

58 Pennington J, Socher R, Manning C. GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics; 2014: 1532-1543

59 Sarsam SM, Al-Samarraie H, Alzahrani AI, Alnumay W, Smith AP. A lexicon-based approach to detecting suicide-related messages on Twitter. Biomed Signal Process Control 2021; 65: 102355

60 Gaur M, Aribandi V, Alambo A. et al. Characterization of time-variant and time-invariant assessment of suicidality on Reddit using C-SSRS. PLoS One 2021; 16 (05) e0250448

61 Cambria E, Li Y, Xing FZ, Poria S, Kwok K. SenticNet 6: ensemble application of symbolic and subsymbolic AI for sentiment analysis. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM; 2020: 105-114

62 Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321-357

63 Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res 2017; 18 (17) 1-5


Address for correspondence

Kevin J. Krause, BS, MS

Department of Biomedical Informatics, Vanderbilt University Medical Center

2525 West End Ave #1475, Nashville, TN 37203

United States

Email: Kevin.krause@vanderbilt.edu

Publication History

Received: 22 August 2023

Accepted: 05 September 2024

Accepted Manuscript online:
09 September 2024

Article published online:
18 December 2024

Georg Thieme Verlag KG
Oswald-Hesse-Straße 50, 70469 Stuttgart, Germany

References
1 Centers for Disease Control and Prevention, National Center for Health Statistics. National Vital Statistics System,. Mortality 2018–2021 on CDC WONDER Online Database; 2021

Download RIS citation
2 Zalsman G, Hawton K, Wasserman D. et al. Suicide prevention strategies revisited: 10-year systematic review. Lancet Psychiatry 2016; 3 (07) 646-659

Crossref PubMed Search in Google Scholar
Download RIS citation
3 Mann JJ, Apter A, Bertolote J. et al. Suicide prevention strategies: a systematic review. JAMA 2005; 294 (16) 2064-2074

Crossref PubMed Search in Google Scholar
Download RIS citation
4 Walsh CG, Johnson KB, Ripperger M. et al. Prospective validation of an electronic health record-based, real-time suicide risk model. JAMA Netw Open 2021; 4 (03) e211428

Crossref PubMed Search in Google Scholar
Download RIS citation
5 Walsh CG, Ribeiro JD, Franklin JC. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci 2017; 5 (03) 457-469

Crossref Search in Google Scholar
Download RIS citation
6 Bejan CA, Ripperger M, Wilimitis D. et al. Improving ascertainment of suicidal ideation and suicide attempt with natural language processing. Sci Rep 2022; 12 (01) 15146

Crossref PubMed Search in Google Scholar
Download RIS citation
7 Young J, Bishop S, Humphrey C, Pavlacic JM. A review of natural language processing in the identification of suicidal behavior. J Affect Disord Rep 2023; 12: 100507

Crossref Search in Google Scholar
Download RIS citation
8 Cohen J, Wright-Berryman J, Rohlfs L, Trocinski D, Daniel L, Klatt TW. Integration and validation of a natural language processing machine learning suicide risk prediction model based on open-ended interview language in the emergency department. Front Digit Health 2022; 4: 818705

Crossref PubMed Search in Google Scholar
Download RIS citation
9 Levis M, Leonard Westgate C, Gui J, Watts BV, Shiner B. Natural language processing of clinical mental health notes may add predictive value to existing suicide risk models. Psychol Med 2021; 51 (08) 1382-1391

Crossref PubMed Search in Google Scholar
Download RIS citation
10 Coppersmith G, Leary R, Crutchley P, Fine A. Natural language processing of social media as screening for suicide risk. Biomed Inform Insights 2018; 10: 11 78222618792860

Crossref PubMed Search in Google Scholar
Download RIS citation
11 McCoy Jr TH, Castro VM, Roberson AM, Snapper LA, Perlis RH. Improving prediction of suicide and accidental death after discharge from general hospitals with natural language processing. JAMA Psychiatry 2016; 73 (10) 1064-1071

Crossref PubMed Search in Google Scholar
Download RIS citation
12 Pestian J, Nasrallah H, Matykiewicz P, Bennett A, Leenaars A. Suicide note classification using natural language processing: a content analysis. Biomed Inform Insights 2010; 2010 (03) 19-28

PubMed Search in Google Scholar
Download RIS citation
13 Tsui FR, Shi L, Ruiz V. et al. Natural language processing and machine learning of electronic health records for prediction of first-time suicide attempts. JAMIA Open 2021; 4 (01) ooab011

Crossref PubMed Search in Google Scholar
Download RIS citation
14 Meerwijk EL, Tamang SR, Finlay AK, Ilgen MA, Reeves RM, Harris AHS. Suicide theory-guided natural language processing of clinical progress notes to improve prediction of veteran suicide risk: protocol for a mixed-method study. BMJ Open 2022; 12 (08) e065088

Crossref PubMed Search in Google Scholar
Download RIS citation
15 Ji S. Towards intention understanding in suicidal risk assessment with natural language processing. In: Findings of the Association for Computational Linguistics: EMNLP 2022. Association for Computational Linguistics; 2022 :4028–4038. Accessed September 15, 2024 at: https://aclanthology.org/2022.findings-emnlp.297

Search in Google Scholar
Download RIS citation
16 Ji S, Yu CP, Fung S fu, Pan S, Long G. Supervised learning for suicidal ideation detection in online user content. Complexity 2018; 2018: 1-10

Search in Google Scholar
Download RIS citation
17 Arowosegbe A, Oyelade T. Application of natural language processing (NLP) in detecting and preventing suicide ideation: a systematic review. Int J Environ Res Public Health 2023; 20 (02) 1514

Crossref PubMed Search in Google Scholar
Download RIS citation
18 Zhong QY, Mittal LP, Nathan MD. et al. Use of natural language processing in electronic medical records to identify pregnant women with suicidal behavior: towards a solution to the complex classification problem. Eur J Epidemiol 2019; 34 (02) 153-162

Crossref PubMed Search in Google Scholar
Download RIS citation
19 Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak 2020; 20 (01) 280

Crossref PubMed Search in Google Scholar
Download RIS citation
20 Thompson K. Programming techniques: regular expression search algorithm. Commun ACM 1968; 11 (06) 419-422

Crossref Search in Google Scholar
Download RIS citation
21 Beam AL, Kompa B, Schmaltz A. et al. Clinical concept embeddings learned from massive sources of multimodal medical data. Pac Symp Biocomput 2020; 25: 295-306

PubMed Search in Google Scholar
Download RIS citation
22 Mikolov T, Chen K, Corrado G, Dean J. Efficient estimation of word representations in vector space. ArXiv13013781. Accessed February 13, 2022 at: http://arxiv.org/abs/1301.3781

Download RIS citation
23 Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res 2003; 3 (Jan): 993-1022

Search in Google Scholar
Download RIS citation
24 Dey L, Haque SKM. Opinion mining from noisy text data. In: Proceedings of the Second Workshop on Analytics for Noisy Unstructured Text Data - AND '08. ACM Press; 2008: 83-90

Search in Google Scholar
Download RIS citation
25 Turney PD. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. arXiv:cs/0212032. Accessed February 13, 2022 at: http://arxiv.org/abs/cs/0212032

Download RIS citation
26 Boggs JM, Quintana LM, Powers JD, Hochberg S, Beck A. Frequency of clinicians' assessments for access to lethal means in persons at risk for suicide. Arch Suicide Res 2022; 26 (01) 127-136

Crossref PubMed Search in Google Scholar
Download RIS citation
27 Yeskuatov E, Chua SL, Foo LK. Leveraging reddit for suicidal ideation detection: a review of machine learning and natural language processing techniques. Int J Environ Res Public Health 2022; 19 (16) 10347

Crossref PubMed Search in Google Scholar
Download RIS citation
28 Krause KJ, Shelley J, Becker A, Walsh C. Exploring risk factors in suicidal ideation and attempt concept cooccurrence networks. AMIA Annu Symp Proc 2023; 2022: 644-652

PubMed Search in Google Scholar
Download RIS citation
29 Montesinos López OA, Montesinos López A, Crossa J. Overfitting, model tuning, and evaluation of prediction performance. In: Multivariate Statistical Machine Learning Methods for Genomic Prediction. Springer International Publishing;; 2022: 109-139

Search in Google Scholar
Download RIS citation
30 Zhao J, Henriksson A. Learning temporal weights of clinical events using variable importance. BMC Med Inform Decis Mak 2016; 16 (Suppl. 02) 71

Crossref PubMed Search in Google Scholar
Download RIS citation
31 Zhao J, Henriksson A, Kvist M, Asker L, Boström H. Handling temporality of clinical events for drug safety surveillance. AMIA Annu Symp Proc 2015; 2015: 1371-1380

PubMed Search in Google Scholar
Download RIS citation
32 Singh A, Nadkarni G, Gottesman O, Ellis SB, Bottinger EP, Guttag JV. Incorporating temporal EHR data in predictive models for risk stratification of renal function deterioration. J Biomed Inform 2015; 53: 220-228

Crossref PubMed Search in Google Scholar
Download RIS citation
33 Choi E, Bahadori MT, Schuetz A, Stewart WF, Sun J, Doctor AI. Doctor AI: predicting clinical events via recurrent neural networks. JMLR Workshop Conf Proc 2016; 56: 301-318

PubMed Search in Google Scholar
Download RIS citation
34 Che Z, Purushotham S, Cho K, Sontag D, Liu Y. Recurrent neural networks for multivariate time series with missing values. Sci Rep 2018; 8 (01) 6085

Crossref PubMed Search in Google Scholar
Download RIS citation
35 Joiner TE. Why People Die by Suicide. Harvard University Press;; 2005

Search in Google Scholar
Download RIS citation
36 Klonsky ED, May AM. The three-step theory (3ST): a new theory of suicide rooted in the “ideation-to-action” framework. Int J Cogn Ther 2015; 8 (02) 114-129

Crossref Search in Google Scholar
Download RIS citation
37 Klonsky ED, May AM, Saffer BY. Suicide, suicide attempts, and suicidal ideation. Annu Rev Clin Psychol 2016; 12 (01) 307-330

Crossref PubMed Search in Google Scholar
Download RIS citation
38 Klonsky ED, Saffer BY, Bryan CJ. Ideation-to-action theories of suicide: a conceptual and empirical update. Curr Opin Psychol 2018; 22: 38-43

Crossref PubMed Search in Google Scholar
Download RIS citation
39 Van Orden KA, Witte TK, Cukrowicz KC, Braithwaite SR, Selby EA, Joiner Jr TE. The interpersonal theory of suicide. Psychol Rev 2010; 117 (02) 575-600

Crossref PubMed Search in Google Scholar
Download RIS citation
40 Schafer KM, Kennedy G, Gallyer A, Resnik P. A direct comparison of theory-driven and machine learning prediction of suicide: a meta-analysis. PLoS One 2021; 16 (04) e0249833

Crossref PubMed Search in Google Scholar
Download RIS citation
41 Walker RL, Shortreed SM, Ziebell RA. et al. Evaluation of electronic health record-based suicide risk prediction models on contemporary data. Appl Clin Inform 2021; 12 (04) 778-787

Thieme Connect PubMed Search in Google Scholar
Download RIS citation
42 Carter G, Milner A, McGill K, Pirkis J, Kapur N, Spittal MJ. Predicting suicidal behaviours using clinical instruments: systematic review and meta-analysis of positive predictive values for risk scales. Br J Psychiatry 2017; 210 (06) 387-395

Crossref PubMed Search in Google Scholar
Download RIS citation
43 Wilimitis D, Turer RW, Ripperger M. et al. Integration of face-to-face screening with real-time machine learning to predict risk of suicide among adults. JAMA Netw Open 2022; 5 (05) e2212095

Crossref PubMed Search in Google Scholar
Download RIS citation
44 McKernan LC, Lenert MC, Crofford LJ, Walsh CG. Outpatient engagement and predicted risk of suicide attempts in fibromyalgia. Arthritis Care Res (Hoboken) 2019; 71 (09) 1255-1263

Crossref PubMed Search in Google Scholar
Download RIS citation
45 Walsh CG, Ripperger MA, Novak L. et al. Randomized controlled comparative effectiveness trial of risk model-guided clinical decision support for suicide screening. medRxiv 2024

Crossref Search in Google Scholar
Download RIS citation
46 Shortreed SM, Walker RL, Johnson E. et al. Complex modeling with detailed temporal predictors does not improve health records-based suicide risk prediction. NPJ Digit Med 2023; 6 (01) 47

Crossref PubMed Search in Google Scholar
Download RIS citation
47 Bodenreider O. The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res 2004; 32 (Database issue): D267-D270

Crossref PubMed Search in Google Scholar
Download RIS citation
48 Mandani S, Giuse D, McLemore M, Weitkamp A. Augmenting NLP Results by Leveraging SNOMED CT Relationships for Identification of Implantable Cardiac Devices from Patient Notes. Presented at: SNOMED CT Expo 2019; October 31, 2019; Kuala Lumpur, Malaysia. Accessed September 15, 2024 at: https://confluence.ihtsdotools.org/display/FT/201905+Augmenting+NLP+results+by+leveraging+SNOMED+CT+relationships+for+identification+of+implantable+cardiac+devices+from+patient+notes?preview=/87042613/87043024/201905%20SCT%20Expo%202019%20-%20Madani.pdf

Download RIS citation
49 Sparck Jones K. A statistical interpretation of term specificity and its application in retrieval. J Doc 1972; 28 (01) 11-21

Crossref Search in Google Scholar
Download RIS citation
50 Landauer TK, Dumais ST. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychol Rev 1997; 104 (02) 211-240

Crossref Search in Google Scholar
Download RIS citation
51 Pedregosa F, Varoquaux G, Gramfort A. et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011; 12: 2825-2830

Search in Google Scholar
Download RIS citation
52 Paszke A, Gross S, Massa F. et al. PyTorch: an imperative style, high-performance deep learning library. 2019; . Accessed September 15, 2024 at:

53 Platt J. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Adv Large Margin Classif 2000: 10

54 Ross EL, Zuromski KL, Reis BY, Nock MK, Kessler RC, Smoller JW. Accuracy requirements for cost-effective suicide risk prediction among primary care patients in the US. JAMA Psychiatry 2021; 78 (06) 642-650

55 Spiegelhalter DJ. Probabilistic prediction in patient management and clinical trials. Stat Med 1986; 5 (05) 421-433

56 Scornet E. Trees, forests, and impurity-based variable importance. 2021. Accessed May 16, 2022 at: http://arxiv.org/abs/2001.04295

57 Boggs JM, Beck A, Hubley S. et al. General medical, mental health, and demographic risk factors associated with suicide by firearm compared with other means. Psychiatr Serv 2018; 69 (06) 677-684

58 Pennington J, Socher R, Manning C. GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics; 2014: 1532-1543

59 Sarsam SM, Al-Samarraie H, Alzahrani AI, Alnumay W, Smith AP. A lexicon-based approach to detecting suicide-related messages on Twitter. Biomed Signal Process Control 2021; 65: 102355

60 Gaur M, Aribandi V, Alambo A. et al. Characterization of time-variant and time-invariant assessment of suicidality on Reddit using C-SSRS. PLoS One 2021; 16 (05) e0250448

61 Cambria E, Li Y, Xing FZ, Poria S, Kwok K. SenticNet 6: ensemble application of symbolic and subsymbolic AI for sentiment analysis. In: Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM; 2020: 105-114

62 Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321-357

63 Lemaître G, Nogueira F, Aridas CK. Imbalanced-learn: a Python toolbox to tackle the curse of imbalanced datasets in machine learning. J Mach Learn Res 2017; 18 (17) 1-5

