Appl Clin Inform 2022; 13(01): 161-179
DOI: 10.1055/s-0041-1742218
Review Article

Data Science Trends Relevant to Nursing Practice: A Rapid Review of the 2020 Literature

Brian J. Douthit
1   Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs; Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, United States
,
Rachel L. Walden
2   Annette and Irwin Eskind Family Biomedical Library, Vanderbilt University, Nashville, Tennessee, United States
,
Kenrick Cato
3   Department of Emergency Medicine, Columbia University School of Nursing, New York, New York, United States
,
Cynthia P. Coviak
4   Professor Emerita of Nursing, Grand Valley State University, Allendale, Michigan, United States
,
Christopher Cruz
5   Global Health Technology and Informatics, Chevron, San Ramon, California, United States
,
Fabio D'Agostino
6   Department of Medicine and Surgery, Saint Camillus International University of Health Sciences, Rome, Italy
,
Thompson Forbes
7   College of Nursing, East Carolina University, Greenville, North Carolina, United States
,
Grace Gao
8   Department of Nursing, St Catherine University, Saint Paul, Minnesota, United States
,
Theresa A. Kapetanovic
9   College of Nursing, East Carolina University, Greenville, North Carolina, United States
,
Mikyoung A. Lee
10   College of Nursing, Texas Woman's University, Denton, Texas, United States
,
Lisiane Pruinelli
11   School of Nursing, University of Minnesota, Minneapolis, Minnesota, United States
,
Mary A. Schultz
12   Department of Nursing, California State University, San Bernardino, California, United States
,
Ann Wieben
13   School of Nursing, University of Wisconsin-Madison, Wisconsin, United States
,
Alvin D. Jeffery
14   School of Nursing, Vanderbilt University; Tennessee Valley Healthcare System, U.S. Department of Veterans Affairs, Nashville, Tennessee, United States

Funding This study was funded by the Agency for Healthcare Research and Quality and the Patient-Centered Outcomes Research Institute (grant: K12 HS026395) as well as the resources and use of facilities at the Department of Veterans Affairs, Tennessee Valley Healthcare System.
 

Abstract

Background The term “data science” encompasses several methods, many of which are considered cutting edge and are being used to influence care processes across the world. Nursing is an applied science and a key discipline in health care systems in both clinical and administrative areas, making the profession increasingly influenced by the latest advances in data science. The greater informatics community should be aware of current trends regarding the intersection of nursing and data science, as developments in nursing practice have cross-professional implications.

Objectives This study aimed to summarize the latest (calendar year 2020) research and applications of nursing-relevant patient outcomes and clinical processes in the data science literature.

Methods We conducted a rapid review of the literature to identify relevant research published during the year 2020. We explored the following 16 topics: (1) artificial intelligence/machine learning credibility and acceptance, (2) burnout, (3) complex care (outpatient), (4) emergency department visits, (5) falls, (6) health care–acquired infections, (7) health care utilization and costs, (8) hospitalization, (9) in-hospital mortality, (10) length of stay, (11) pain, (12) patient safety, (13) pressure injuries, (14) readmissions, (15) staffing, and (16) unit culture.

Results Of 16,589 articles, 244 were included in the review. All topics were represented by literature published in 2020, ranging from 1 article to 59 articles. Numerous contemporary data science methods were represented in the literature including the use of machine learning, neural networks, and natural language processing.

Conclusion This review provides an overview of the data science trends that were relevant to nursing practice in 2020. Examinations of such literature are important to monitor the status of data science's influence in nursing practice.


Background and Significance

Data science has become a ubiquitous term in health care. As we have transitioned from the “big data problem” to embracing new opportunities to deploy large-scale analytics,[1] [2] [3] the use of data science methods has become an increasingly important part of all health professions. Nursing is no exception, as data science developments are creating new opportunities to leverage both clinician and patient-generated data to augment nursing practice. Nurses permeate health care across all specialties and clinical areas, from inpatient to community based, bedside to provider, and pediatrics to geriatrics. Given nurses have such varied roles, the influences of data science on nursing can be widespread and could have implications on how nurses make decisions, collaborate with other professions, and provide care to their patients. To assess the impacts of data science on nursing practice, data science trends in nursing-related topics should be periodically examined.

Although it is positive to see the proliferation of data science methods being used in the literature, it can be overwhelming for most to keep abreast of the latest evidence. The idea of conducting a “data science in nursing year in review” was conceived by the members of the Data Science Workgroup of the Nursing Knowledge: Big Data Science Conference[4] hosted annually by the University of Minnesota School of Nursing. To our knowledge, such a review is not available in the literature. By conducting a yearly review, we seek to establish a reliable summary of how data science trends impact and augment nursing practice. Through this effort, nurses and the informatics community can efficiently review relevant studies published in the last year, highlighting strengths as well as areas for improvement with further research.


Objectives

The goal of this work is to provide a succinct rapid review of the literature which reflects data science trends relevant to nursing from the past year (2020). By conducting this review, we aim to inform readers of how nursing is being influenced by data science methods and to reveal general trends of their use in selected topics from the past year.


Methods

To examine the intersections of nursing practice and data science in the past year, we opted to conduct a rapid review of the literature.[5] A rapid review differs from more classic scoping review methodology in that it prioritizes search and appraisal strategies that allow general trends in research areas to be identified without exhaustive literature searches. As dissemination of yearly contributions must be timely, we opted to provide highlights of data science in tandem with key topics related to nursing practice. While the appraisal methods and search strategies do not produce an exhaustive review of the literature, we believe we have retained enough key contributions to facilitate high-quality discussion and recommendations for future research.

First, we selected 16 topics for review. These topics were selected based on nursing-sensitive indicators identified in the literature, which included patient-, setting-, and nursing-related outcomes.[6] [7] The authors (consisting of nursing experts, leaders, and scientists) engaged in discussion regarding the coverage of these topics as they relate to nursing practice, resulting in the addition of the topics “Artificial Intelligence/Machine Learning Credibility and Acceptance,” “Complex Care (Outpatient),” and “Emergency Department Visits,” which further represent the diversity of nursing and its presence as a key stakeholder in these areas.

We conducted the review using PubMed and CINAHL databases in March of 2021 for studies published during the past year (2020) in the English language. Other databases were considered (including Embase), but through careful review, we noted that the inclusion of other databases complicated the search by introducing articles that were not relevant to the topics. As we employed a rapid review of the literature, we did not find that the addition of other databases yielded high-impact studies that were not otherwise captured in PubMed and CINAHL. The inclusion of CINAHL allowed us to check for articles that were not included in PubMed and also ensured that nursing-relevant articles were included.

Studies were limited to human studies. We crafted one main search strategy to find studies discussing the use of data science with a combination of keywords and subject headings. We used the following terms to create that strategy: data science, data analytics, artificial intelligence (AI), machine learning (ML), risk assessment, decision-support techniques, clinical prediction rule, natural language processing (NLP), computer-assisted image processing, along with analytic, forecast, prediction, risk, and statistical models. Terms related to “nursing” were not included, as the associated medical subject headings (MeSH) terms did not return additional results. In addition, when “nursing” was used as a title/abstract term, the corpus of articles increased significantly because of irrelevant articles related to breastfeeding.
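As a rough illustration of how such a strategy could be assembled, the sketch below joins the terms listed above into a single OR block and combines it with a topic-specific block. The field tags (`[tiab]`, `[dp]`, `[la]`, `[mh]`), the function names, and the example outcome terms are assumptions made for illustration; they are not the published strategy, which is given in the supplementary material.

```python
# Free-text terms taken from the Methods text. The field tags and the
# AND/OR layout are assumptions for illustration, not the published strategy.
DATA_SCIENCE_TERMS = [
    "data science",
    "data analytics",
    "artificial intelligence",
    "machine learning",
    "risk assessment",
    "decision-support techniques",
    "clinical prediction rule",
    "natural language processing",
    "computer-assisted image processing",
    "analytic model",
    "forecast model",
    "prediction model",
    "risk model",
    "statistical model",
]


def or_block(terms, field="tiab"):
    """Join terms into one parenthesized OR block with a field tag."""
    return "(" + " OR ".join(f'"{t}"[{field}]' for t in terms) + ")"


def build_query(outcome_terms, year=2020):
    """Combine the shared data science block with one topic's outcome block."""
    return " AND ".join([
        or_block(DATA_SCIENCE_TERMS),   # shared across all topics
        or_block(outcome_terms),        # topic-specific block
        f'"{year}"[dp]',                # publication year limit
        "english[la]",                  # English-language limit
        "humans[mh]",                   # human studies limit
    ])


# Hypothetical outcome terms for the falls topic (placeholders only).
print(build_query(["accidental falls", "fall risk"]))
```

Keeping the data science block fixed and swapping only the outcome block makes the 16 topic searches comparable with one another.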

We then combined an outcome-specific search strategy with the data science search terms for all 16 topics (full search strategies are presented in [Supplementary Appendix S1], available in the online version). Articles covering multiple topics were included in each topic summary if all reviewers agreed that the article represented each topic equally. We used the Rayyan web application[8] to perform both abstract and full-text screening. We developed an inclusion and exclusion review form via group consensus with the intention of providing a representative sample of data science publications rather than an exhaustive review of all publications ([Supplementary Appendix S2], available in the online version). We opted for more conservative methods for inclusion to further emphasize the exemplars of each topic. We did not require articles to include nurse authors or nurse participants. Rather, we focused on including articles of interest that used data science methods relevant to nursing practice. Such studies with nonnursing study populations are useful to highlight, as they could either be applied to nursing-specific practice or could be replicated in the nursing population in the future. Specific interpretations are noted in each topic subsection.

To begin the review process, one reviewer per topic conducted an initial title and abstract review to eliminate nonrelevant articles. Reviewers were selected based on their individual expertise with the topic. Next, the authors conducted a full-text review. If there were questions about whether an article should be included, the reviewer asked a second reviewer to verify. We included publications that were primary studies, systematic reviews, or meta-analyses. Studies were required to use data science methods, which we defined as ML, NLP, unsupervised learning, and image analysis and/or sensor analysis. Studies that primarily used regression were included if they were prediction-focused and used a novel data source or were used in conjunction with other more advanced methods. Studies that used only basic statistical tests (e.g., t-tests), evaluated psychometric properties, or were written as opinion pieces were not included.
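The eligibility rules described above can be read as a simple decision procedure. The following is a minimal sketch under stated assumptions: the `Study` record, its field names, and the method labels are hypothetical, and the regression rule is interpreted as "(prediction-focused and novel data source) or combined with a more advanced method," which is one reading of the text. The actual screening was done by reviewers using the form in the supplementary material, not by code.

```python
from dataclasses import dataclass, field

# Publication types and method families named in the Methods text.
ELIGIBLE_TYPES = {"primary study", "systematic review", "meta-analysis"}
ADVANCED_METHODS = {
    "ML", "NLP", "unsupervised learning", "image analysis", "sensor analysis",
}


@dataclass
class Study:
    """Hypothetical screening record; field names are illustrative only."""
    publication_type: str
    methods: set = field(default_factory=set)
    prediction_focused: bool = False
    novel_data_source: bool = False


def is_included(study: Study) -> bool:
    """Apply the inclusion rules as interpreted in the lead-in."""
    # Opinion pieces and other non-eligible publication types are excluded.
    if study.publication_type not in ELIGIBLE_TYPES:
        return False
    # Any advanced method qualifies, including regression used alongside one.
    if study.methods & ADVANCED_METHODS:
        return True
    # Regression alone qualifies only when prediction-focused on a novel source.
    if "regression" in study.methods:
        return study.prediction_focused and study.novel_data_source
    # Basic statistical tests or psychometric evaluation alone do not qualify.
    return False
```

For example, `is_included(Study("primary study", {"regression"}, True, False))` evaluates to `False` under this reading, because the regression model is prediction-focused but does not draw on a novel data source.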

After full-text screening, we extracted information related to each study's purpose, study design, data sources, samples, settings, populations, operational definitions of outcomes, predictor variables, and data science methods into a standardized form. We summarized this extracted information for each topic in the results.


Results

Search Results and Screening

Overall, we screened 16,589 abstracts and included 244 unique studies in this review, with 11 instances of study-topic overlap (see [Supplementary Appendix S3], available in the online version, for inclusion/exclusion numbers by outcome). The most represented topics were in-hospital mortality, pain, and length of stay. The least commonly represented topics included unit culture, burnout, and AI/ML credibility and acceptance. [Fig. 1] illustrates the most common predictor variables, which included demographics, diagnoses, and laboratory tests. Evaluating the most represented topics from year to year could help in making recommendations for future study areas.

Fig. 1 Twelve most frequent predictor variables among the data science literature relating to nursing practice.

Several data science methods were identified in the literature as outlined in [Fig. 2]. Generalized linear models were the most common and were only included in this review if they were used in tandem with more advanced methods or were used as prospective, clinical prediction models. More advanced data science methods were also used, including supervised ML (n = 102), unsupervised ML (n = 42), neural networks (n = 28), and NLP (n = 19).

Fig. 2 Use of methods among the data science literature relating to nursing practice.

The data extracted from each study are listed in [Supplementary Appendix S4] (available in the online version). The following results are presented by topic, summarizing study designs, data science methods, and implications for nursing practice.


Artificial Intelligence/Machine Learning Credibility and Acceptance

We identified three studies for the topic of AI/ML credibility and acceptance. All three studies used a retrospective cohort design.[9] [10] [11] The data sources used by the studies varied among an Italian national injury database,[11] nine public computer-aided diagnostic datasets,[10] and endoscopic images.[9] Study populations included adults[9] [10] and an adult/pediatric mixture.[11] One study was based in the United States,[10] one in Korea,[9] and one in Italy.[11] Sample sizes ranged from 13 to 76,911 observations.

Studies of AI/ML credibility and acceptance explored various outcomes. These included violent injury,[11] risk prediction, and colorectal cancer.[9] Studies also used a variety of predictor variables with two[9] [10] using demographic and diagnostic data, one using case reports,[11] and one using endoscopic scans.[9] Studies used several different data science methods to address explainability for these two primary outcomes. First, a hybrid model using semantic frames and long short-term memory (LSTM) was used in NLP to extract concepts related to violent injury from notes.[11] Second, an adaptive-weighted method was used with a gradient-boosted classifier (adaBoost) to better understand feature contribution in diagnosis classification.[10] Third, Choi et al[9] combined class activation and neural network learning to display concerning regions that their computer-aided colorectal cancer diagnostic approach identified from colonoscopy images.

AI/ML-based risk stratification tools can support clinicians in decision-making. This is especially true for nurses who are expected to be the last check for most treatments and interventions that patients receive.[12] [13] Increased explainability or interpretability may provide nurses with the necessary explanation that builds trust in AI/ML-based advice to support their expertise. However, only one of the studies included nurses as domain experts in the research.[10] The studies we found approached explainability through quantitative analysis rather than qualitative assessment. For example, these studies tried to improve explainability by increasing interpretation of contributing predictors[10] [11] and visually highlighting concerning anatomical areas for human confirmation.[9]


Burnout

In all, only two articles met inclusion criteria for the topic of burnout. One article used applied system dynamics modeling[14] and the other employed an open trial design.[15] Although only two studies were able to be included for this topic, the data sources were somewhat novel: one used synthetic data with an unreported sample size[14] and the other used mobile devices and sensors to collect primary data from 83 medical students. Medical students[14] and nurses[15] were the subjects of the burnout studies. One study[15] was based in Portugal, and the other[14] used synthetic data not tied to any country (but authors were based in Canada).

The two studies approached burnout from different perspectives: one focused on the stress level of nurses,[15] directly measuring physical responses to stress through a smartphone, while the other focused on burnout's effects on quality of care[14] using synthetic “clinician-generated” data. The methods were also very diverse. The system dynamics modeling study[14] followed a more established mathematical approach, while the study that collected sensor and smartphone data[15] opted to predict stress levels with an ML model.

Even with the introduction of the quadruple aim in health care,[16] clinician well-being does not seem to be a primary focus in the data science literature. It could be speculated that data relating to clinician burnout is not readily accessible. Many of the study designs in this review are noted to be retrospective, meaning that data had, at one point, already been collected. In the health care field, including nursing, we do not see regular data collection about the clinician. Although a limited sample, this literature shows that it is possible to analyze burnout in clinicians using data science methods, but one of the challenges remains in how to facilitate consistent data capture that is clinician centric. Multiple studies were noted to address electronic health record (EHR) burden in this review but did not use data science methods. Future use of data science may be helpful in furthering our understanding of how to address burnout in health care professionals.


Complex Care (Outpatient)

We identified nine studies for the topic of complex care. Of these studies, five used a retrospective cohort design.[17] [18] [19] [20] [21] Other study designs included a combination of a retrospective cohort and a prognostic design built on an ML model,[22] a prognostic design,[23] a longitudinal analysis of a continuously recruited national cohort,[24] and a comparative design with a retrospectively identified cohort which was then matched to a referent cohort from the general population.[25] A majority of studies used administrative databases.[17] [18] [19] [21] [22] Two studies used EHR data,[20] [23] while the remaining two studies used either data warehouse/registry data from the National Patient Register[25] or a questionnaire/survey.[24] Study populations included older adults with intellectual disability,[25] home health care,[19] [20] complex care needs,[17] [24] adults with cancer,[23] veterans with diabetes,[22] Medicare recipients with dementia,[18] and sepsis survivors.[21] Most studies were based in the United States, but other study locations included Sweden,[25] Canada,[17] and New Zealand.[24] Sample sizes were large, ranging from 7,936 to 275,190 observations.

Studies of complex care explored various outcomes. These included hospitalizations,[17] [18] [19] emergency department (ED) use,[17] [19] mortality,[21] [22] [23] hospice use,[21] health care utilization,[20] [25] and falls risk.[24] Studies used a wide range of predictor or explanatory variables, including home health care agency characteristics,[19] continuity of primary and specialty physician care,[17] prognostic indices based on patient demographics, comorbidities, procedure codes, laboratory values and anthropometric measurements, medication history, and previous health service utilization,[22] patients' demographic characteristics, comorbidities,[21] [25] clinical characteristics,[21] racial/ethnic disparities,[20] dementia severity,[18] and urinary and fecal incontinence.[24] Regression was a popular method, used in eight studies with a variety of approaches. Six studies used either multivariable or multilevel regression models to find predictors of home health care agency characteristics for hospitalization and emergency room visits,[19] predictors of patients' demographic characteristics, comorbidities, and clinical characteristics for mortality,[21] predictors of racial/ethnic disparities for health care utilization,[20] predictors of dementia severity for hospitalization,[18] associations of urinary and fecal incontinence with fall risks,[24] and associations of different diagnoses and specialist psychiatric health care utilization.[25] One study used a Cox regression model to explore continuity of primary and specialty physician care for hospitalization and emergency room visits.[17] A second study used a combination of regression and ML methods to select variables associated with mortality risk and create prognostic indices for 5- and 10-year mortality.[22] A third study used a gradient-boosted ML algorithm to predict 180-day mortality among outpatients with cancer.[23]

Outpatient complex care in the reviewed studies often occurred in home health care in the community setting as a continuation or transition of care from hospital settings. Data science methods relied heavily on administrative databases and sometimes on EHR data. Management of complex care requires comprehensive data sources and input from the whole health care team, which can obscure nursing-specific data and make them difficult to distinguish. The outcomes and measures used to build prediction models reflect the all-encompassing nature of complex care management, which involves the whole health care team. The reviewed studies demonstrated a lack of electronic data to represent nurses' presence and contributions in home health care. There also appeared to be a lack of method diversity in building predictive models or exploring associations of variables and outcomes related to outpatient complex care.


Emergency Department Visits

For the topic of ED visits, we identified 14 studies. These studies used a retrospective cohort design except for one[26] which used a prospective cohort design. A vast majority of studies used EHR data, while two studies used administrative and claims data as the primary dataset.[27] [28] Study populations included adults in the ED,[26] [27] [28] [29] [30] [31] [32] [33] [34] [35] [36] [37] home care patients,[38] and a mixture of adult and pediatric ED patients.[39] Most studies were based in the United States, but other study locations included Hong Kong,[27] Germany,[32] Italy,[39] Portugal,[37] and South Korea.[34] [35] Sample sizes ranged from 199 to 2,910,321 observations.

Outcomes addressed in studies included mortality,[29] [30] future posttraumatic stress disorder (PTSD) sequelae,[26] the novel coronavirus disease 2019 (COVID-19) infection status,[31] [35] [39] ED wait time,[27] intensive care unit (ICU) admission from the ED,[37] need for head computed tomography (CT),[32] cardiac arrest,[34] stroke severity,[28] and ED utilization.[30] [33] [36] [38] ML methods were used in several studies, including the use of logistic regression, generalized linear models, neural networks, and decision tree–based models to apply statistical learning to the prediction of deterioration,[30] [37] stroke severity,[28] COVID-19 diagnosis,[31] [35] [39] wait time,[27] and need for head CT,[32] while two other studies used autoregressive integrated moving average models to explore time-dependent patient flow.[33] [36] NLP was an alternative method used in four studies. Two used NLP to predict patient deterioration,[37] [38] one used NLP to extract concepts related to the need for a head CT,[32] and another identified stroke-related concepts from notes to aid in stroke scoring.[28]

The ED is a highly collaborative setting where the medical and nursing domains often overlap. Most of the ED-specific AI/ML studies were related to both nurses and physicians, except for one that predicted ICU admission,[37] which is in the physician domain. Many studies have the potential to impact nurses' future clinical practice. First, the study by Schultebraucks et al[26] prospectively creates risk prediction for the development of PTSD after ED visits. This work may influence the discharge teaching that patients at high risk for PTSD will receive from nurses. Second, three studies modeled ED utilization[33] [36] and wait time,[27] including the innovative use of weather as a predictive factor. These studies promise to solve the intractable problem of nurse surge (short-term) staffing where it is difficult to understand who will be entering the ED for care. Third, the study by Topaz et al[38] in the home care setting helps to risk stratify prehospital patients at high risk for ED visits. This study may help EDs to forecast home care patients that will be visiting the hospital. Fourth, nurses are increasingly being asked to collect patients' socioeconomic status (SES) data in the hospital. Schuler et al[30] used SES data in their modeling to improve health care utilization prediction. Finally, 2020 was the year of the COVID-19 pandemic, with EDs being impacted significantly. There were fewer ED COVID-19 papers than expected, possibly because ED clinicians have been too burdened by work demands to publish. However, three studies used data science methods to help answer ED COVID-19 clinical questions: whether computer vision could be used to aid in diagnosing COVID-19-related pneumonia,[35] whether EHR data could predict COVID-19 absent laboratory test confirmation,[39] and whether routine blood tests could predict COVID-19.[31]


Falls

For the topic of falls, we identified 24 studies that met inclusion criteria. Of these studies, eight used a retrospective cohort design[40] [41] [42] [43] [44] [45] [46] [47]; seven used a prospective cohort design[48] [49] [50] [51] [52] [53] [54]; six were secondary analyses of research data obtained from prospective, retrospective, and cross-sectional studies[55] [56] [57] [58] [59] [60]; one used mixed methods wherein data from a public dataset were used in conjunction with measurements collected from sensors[61]; and one was a meta-analysis of prospective cohort and observational studies.[62] Ten of the studies used health records as a source of data but in two of these studies,[44] [47] it was not clear whether the records were electronic when they were obtained. Several of the studies, including two of the secondary analyses, incorporated data from mobility and gait sensors.[48] [49] [51] [53] [55] [60] [61] Registries and administrative datasets were used in eight studies,[40] [41] [42] [43] [45] [46] [50] [56] while questionnaires or surveys were a source of data for four studies.[49] [51] [57] [60] With the exception of one study that employed sensor data from 17-year-old persons,[55] all study participants were community dwelling, inpatient, and outpatient adults. Adults with chronic diseases of all types were included, but three of the studies included adults with specific conditions. The conditions were postpolio syndrome,[51] neurology, neurosurgery, hematology, oncology,[52] and neurology.[53] Most studies were conducted in the United States, but studies were also completed in Italy,[49] England,[43] Japan,[44] [51] Poland,[59] and South Korea.[52] The 11 studies included in the meta-analysis[62] were conducted in seven countries. Sample sizes ranged from 42 to 275,940 observations.

The outcome of falls was defined and measured in a variety of ways. In several studies, the fall was self-reported,[49] [51] [57] [60] [63] but if it occurred while the participant was in an inpatient setting or being tracked in an outpatient setting, it was often documented in medical records, registries, or administrative databases used to track adverse events.[40] [41] [42] [43] [44] [45] [46] [47] [48] [50] [52] [56] These differences in measuring the outcome are important, in that predictive models may then be more or less accurate, merely because of the accuracy or inaccuracy of the outcome measurement. The types of predictors across studies were quite consistent, nevertheless. Age was a demographic predictor for all studies. Gender was tested but not always a significant predictor. Diagnoses and/or symptoms of the participants were tested as predictors in most of the studies.[40] [41] [42] [43] [44] [45] [46] [47] [49] [50] [51] [52] [54] [56] [57] [58] [59] [60] [62] [63] Several categories of predictors were noteworthy, including strength, balance, and gait test scores[40] [46] [47] [48] [49] [50] [51] [52] [53] [54] [55] [56] [57] [59] [60] [61] [62] [63] and nutritional status.[42] [56] [59] In 15 studies, prediction models were developed and evaluated with regression models.[40] [42] [43] [44] [45] [46] [47] [51] [52] [56] [57] [59] [60] [62] [63] Data science methods also leveraged supervised and unsupervised ML methods, including neural networks for developing risk prediction models, improving prediction of fall risk, and automating selection of data from electronic records for use in fall risk prediction algorithms.[41] [45] [46] [48] [49] [50] [52] [53] [54] [55] [58] [61]
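The point above about outcome measurement can be made concrete with a small worked calculation. The sketch below is illustrative only and does not come from any of the reviewed studies. It assumes a binary fall/no-fall outcome and assumes that recording errors in the outcome are independent of the model's own errors; the function and parameter names are hypothetical.

```python
def observed_accuracy(true_accuracy: float, label_error_rate: float) -> float:
    """Expected agreement between predictions and a noisily recorded outcome.

    Assumes a binary outcome and recording errors that occur independently
    of the model's errors (an illustrative simplification).
    """
    for name, value in (("true_accuracy", true_accuracy),
                        ("label_error_rate", label_error_rate)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Model is right and the outcome was recorded correctly.
    agree_when_right = true_accuracy * (1 - label_error_rate)
    # Model is wrong, but the outcome was misrecorded in the same direction.
    agree_when_wrong = (1 - true_accuracy) * label_error_rate
    return agree_when_right + agree_when_wrong
```

Under these assumptions, a model that is truly 90% accurate would appear about 82% accurate if 10% of falls were misrecorded, so two studies using the same model could report different performance purely because one relied on self-report and the other on adverse event records.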

Data science studies included in this review appeared to reveal a step forward in methods for predicting fall risk. Various activity monitors and robotics technology are capable of creating large datasets of time series tracings that can be examined for patterns suggesting motor movements or muscle weaknesses that predispose a person to falls.[48] [49] [53] [55] [60] [61] Preprocessing and analysis of such datasets present major challenges that are difficult to manage using traditional statistical techniques and programs, but it is now possible to use ML and other data science methods to determine the patterns in data that are associated with the devastating outcome of falls. From what is observed in the studies published in 2020, the use of devices and sensors is likely to increase in the future exploration of factors that predict falls.


Healthcare-associated Infections

We identified 11 studies for the topic of health care–associated infections (HAIs). Of these 11 studies, five used a retrospective cohort design,[64] [65] [66] [67] [68] three used an observational design,[69] [70] [71] two used a case-control design,[72] [73] and one used a prospective cohort design.[74] A vast majority of studies solely used EHR data, while two studies supplemented EHR data with breath sensor data[73] and National Database of Nursing Quality Indicator (NDNQI) and Catheter Associated Urinary Tract Infection (CAUTI) datasets.[71] One study used the National Institutes of Health (NIH) Gene Expression Omnibus data.[72] Most studies focused on adult inpatients, while three studies included adult surgical patients[64] [65] [67] and one study focused on pediatric cardiology surgery patients.[70] Most studies were based in the United States, but other study locations included Taiwan,[73] Italy,[66] China,[70] France, and Switzerland.[74] Patients were the unit of analysis for most studies, while one study analyzed ICU admissions,[71] one examined operative events,[64] and one focused on hospitalizations.[68] Sample sizes varied widely from study to study, ranging from 20 to 897,344 observations.

Studies explored various outcomes, including candidemia infection,[74] cardiac implantable device infections,[67] CAUTIs,[71] Clostridium difficile infections (CDIs),[69] urinary tract infections (UTIs),[66] [68] surgical site infections (SSIs),[64] [65] and ventilator-associated pneumonia (VAP).[72] [73] The majority of studies used some combination of demographic, diagnosis, laboratory, vital sign, and/or medication data as predictor variables. Several studies used additional unique predictors such as data on patient movement,[69] nurse staffing,[71] breath compounds,[73] and differentially expressed genes.[72] Several different data science methods were used. First, logistic regression was used to predict HAI outcomes in three studies.[67] [70] [74] Many studies compared the predictive performance of various supervised ML models including support vector machines,[66] [71] [73] neural networks,[66] [68] [73] decision trees,[68] [71] [73] and/or random forest models.[72] [73] Two studies used multilayer perceptrons,[65] [72] one study used naïve Bayes' classification,[73] and one study conducted social network analysis.[69] NLP was used in one study to extract data from clinical notes and operative reports for surveillance of SSIs,[64] and another study used text mining of clinical notes to inform model development and case identification.[67]

Of the studies using data science techniques to examine or predict HAIs, two specifically addressed nursing implications and included nurse authors.[68] [71] Park et al[71] demonstrated a knowledge discovery and data mining approach and aimed to describe techniques that could be used to further nursing practice and guide nursing professionals in the use of data science methods. Zachariah et al[68] described the benefit of risk stratification systems in relieving the burden on nurses to complete and document traditional risk assessment forms. While Mancini et al[66] do not specifically name nurses as a target audience, they do describe their data-science-as-a-service system as an online, user-friendly platform that can help domain experts, such as clinicians, validate simple predictive models. From these studies, nursing administrators may gain valuable insights on the role of intrahospital transfers in HAIs to inform patient-placement strategies[69] and on the use of predictive risk models in dressing type selection to prevent SSIs, along with the estimated cost savings.[65] For nurses interested in exemplars of data visualization techniques, the publication by Cai et al[72] showcases some impressive data visualizations.


Health Care Utilization and Costs

For the topic of health care utilization and costs, 24 articles were included in this review. Most were retrospective cohort studies,[75] [76] [77] [78] [79] [80] [81] [82] [83] [84] [85] [86] [87] while six used prospective cohort studies,[88] [89] [90] [91] [92] [93] four used a cross-sectional design,[94] [95] [96] [97] and one used a survey for primary data collection.[98] Most studies used the EHR and administrative databases to collect data but three used surveys,[87] [96] [98] two used public datasets,[78] [95] one used mobile phone data,[97] one used images,[90] and one used data from social media.[92] All studies were adult based with the exception of one study examining families.[98] Most studies were based in the United States, with the exception of three from Singapore,[83] [85] [93] one from China,[80] one from Brazil,[77] one from Italy,[88] one from Canada,[90] and one from the United Kingdom.[87] Sample sizes ranged from 190 to 780,295 observations.

While most studies focused on cost and included some form of cost analysis, several studies examined behaviors related to costs such as predicting health care utilization,[83] quantifying reliance on health care services,[84] verifying complete surgical removal of tumors,[90] and predicting no-shows.[93] As expected, many studies incorporated costs and insurance status as predictor variables, but patient-reported variables were also common. Most studies used supervised ML. Unsupervised learning and linear models were also common, and multiple models were often compared in search of the most accurate one. One study conducted geospatial analysis.[98]

As would be expected, many of the articles included in this review used data science methods to predict cost. Such information would be helpful to hospital administration, but it does not necessarily pertain directly to nursing practice. Instead, nursing may focus its efforts on developing interventions to increase adherence to care. One example may be following up with patients who are predicted to have a high risk of missing an important magnetic resonance imaging (MRI) appointment. While this information would be important for executives to know and could help avoid loss of revenue, nursing can use it as an opportunity to support continuity of care.


Hospitalization

We identified 21 studies for the topic of hospitalization. Of these studies, 13 used a retrospective cohort design,[68] [86] [99] [100] [101] [102] [103] [104] [105] [106] [107] [108] [109] 1 used an observational design,[110] 1 used a cross-sectional design,[111] 2 adopted a prognostic approach,[30] [112] 2 performed a longitudinal analysis,[113] [114] and 2 used survey data.[115] [116] A vast majority of studies used EHR data, while the remaining eight studies used administrative databases[100] [106] [107] [109] [111] [114] or surveys as the primary collection tool.[115] [116] Study populations included adults[30] with chronic illnesses,[100] [105] [107] [114] pediatrics,[86] [106] veterans,[116] COVID-19 patients,[99] [101] [102] [103] [108] [110] [113] hospice patients,[104] inpatients,[68] [112] and Medicare recipients.[109] [111] [115] All studies were based in the United States. Sample sizes ranged from 207 to 3,100,000 observations.

Studies of hospitalization explored various areas. These included hospitalization,[101] [107] [108] hospital readmissions,[110] hospitalization rates,[30] [105] [106] [114] hospitalization risks,[103] [113] [116] health care utilization,[86] [109] level-of-care requirements,[102] medication orders,[112] risk of urinary tract infections (UTI) during hospitalization,[68] risk for critical COVID-19,[99] risk of live discharge,[104] ischemic strokes,[111] recovery of function following hospitalization,[115] and the Functional Independence Measure (FIM) instrument score.[100] Interestingly, most studies used a cluster of various characteristics as predictor variables including demographics,[30] [100] [103] [105] [106] [108] [110] [111] [113] [116] sociodemographic,[106] [110] [113] or neighborhood SES[30] or neighborhood level characteristics,[104] patient level characteristics,[104] clinical data,[30] [68] [99] [100] [101] [102] [103] [105] [108] [110] [113] [116] medication data,[112] social determinants of health (SDH) data,[86] [111] administrative data,[100] claims data,[30] patient reported outcomes,[108] [116] geriatric syndrome risk factors,[109] air quality,[106] and cost trajectories.[115] Two studies used a single variable, either body mass index[107] or food swamp severity,[114] as their predictor variable. Regression was applied in most of the studies. The remaining studies used ML,[30] [68] [86] [99] [102] [112] NLP,[102] and geospatial coding.[110] [114] Studies used ML to build a prediction model of clinical data for risk for critical COVID-19,[99] level-of-care requirements,[102] and risk of UTI during hospitalization,[68] SDH data for health care utilization,[86] and EHR data for medication orders.[112]

The reviewed studies demonstrated a broad range of foci, from unique patient populations and conditions to health care management and utilization. Data science methods employed in these studies incorporated mostly EHR data sources in addition to administrative databases and occasionally survey data. Study outcomes and variables were often a cluster of characteristics that branched into the administrative and clinical domains and occasionally into neighborhood- and community-level characteristics. Nursing-specific data were embedded and not easily distinguished. There appears to be a continued need for nursing-specific considerations in studies related to hospitalization using data science methods. However, many outcomes and variables have great implications for nursing care because nursing plays a critical part in health care teams.


In-Hospital Mortality

We identified 59 studies for the topic of in-hospital mortality.[117] [118] [119] [120] [121] [122] [123] [124] [125] [126] [127] [128] [129] [130] [131] [132] [133] [134] [135] [136] [137] [138] [139] [140] [141] [142] [143] [144] [145] [146] [147] [148] [149] [150] [151] [152] [153] [154] [155] [156] [157] [158] [159] [160] [161] [162] [163] [164] [165] [166] [167] [168] [169] [170] [171] [172] [173] [174] [175] While the majority of studies used a retrospective cohort design, 11 used a prospective approach,[120] [122] [129] [140] [143] [166] [169] [170] [172] [174] [175] and two were meta-analysis studies.[154] [157] The majority of studies used EHR data, while 16 studies used some kind of registry data,[117] [118] [121] [127] [128] [131] [132] [138] [140] [141] [142] [150] [153] [155] [156] [161] [174] four studies used questionnaires/surveys,[166] [169] [170] [172] and two studies used administrative data.[134] [167] With the emergence in 2020 of public databases containing COVID-19 data, many studies drew on these databases.[120] [123] [128] [131] [140] [142] [143] [146] [169] Study populations primarily comprised adults (sometimes limited to subpopulations such as those with chronic illnesses [e.g., Takada et al[164] and Sukmark et al[163]]), but three studies included pediatric populations,[138] [154] [161] one study included newborns,[150] and two studies included older patients aged over 65 years.[121] [155] Sample sizes ranged from 15 to 9,000,000 observations.

To predict in-hospital mortality, studies used several methodological approaches. The majority of studies used a regression model including Cox's proportional hazards[131] [167] and mixed effect models.[141] [158] More contemporary techniques included neural networks,[117] [118] [125] [126] [127] [134] [139] [142] [165] [171] [173] random forests,[124] [126] [133] [135] [139] [142] [144] gradient boosting,[127] [128] [129] [130] [134] [135] [139] [140] [144] [168] and NLP.[127] Four studies[137] [149] [159] [173] leveraged unsupervised methods, with or without supervised methods. Almost all studies performed some level of validation, such as bootstrapping, cross-validation, or a hold-out approach. A variety of predictors were used as input for these models. Almost all studies included demographics and medical diagnosis. The majority of the studies also used medications and some sort of diagnostic techniques (e.g., laboratory values, images, vital signs, or surgical data). Some studies used COVID-19-specific data.[120] [123] [128] [131] [140] [142] [143] [146] [169] Some studies used clinical notes,[127] [145] [154] [170] [173] and two studies used socioeconomic data.[126] [167] The inclusion of a variety of variables is possible as a result of a large sample size for the majority of the studies.
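The three validation strategies named above (hold-out, cross-validation, and bootstrapping) differ mainly in how observations are divided between model fitting and model evaluation. The following is a minimal sketch of the index-splitting step only, written with the Python standard library; the function names, the default test fraction, and the seeds are illustrative assumptions and do not reproduce any reviewed study's procedure.

```python
import random


def hold_out_split(n_samples, test_fraction=0.2, seed=0):
    """Reserve a single random subset of indices for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_samples * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def k_fold_indices(n_samples, k, seed=0):
    """Return k (train, test) pairs; each index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal shuffled indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        splits.append((train, test))
    return splits


def bootstrap_sample(n_samples, seed=0):
    """Resample indices with replacement; unsampled indices are out-of-bag."""
    rng = random.Random(seed)
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, out_of_bag


train_idx, test_idx = hold_out_split(10)
cv_splits = k_fold_indices(10, k=3)
in_bag, out_of_bag = bootstrap_sample(10)
```

In each case a model would be fitted on the training (or in-bag) indices and evaluated on the test (or out-of-bag) indices, so that reported performance is not measured on the same observations used for fitting.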

Notable aspects of the in-hospital mortality literature include the use of frailty as a predictor in two studies, either as a way of predicting mortality or as a better clinical measure for symptom representation,[160] [162] as well as the use of portable lung ultrasound findings as predictors.[122] Although many studies included vital signs which are often collected by nurses, there were no studies evaluating how other aspects of nursing care delivery can predict in-hospital mortality. The use of publicly available datasets (e.g., Awad et al,[124] Baxter et al,[127] and Kong et al[144]) facilitates reproducibility and allows future investigators to explore additional data science methods, including the use of novel predictors, such as innovative features generated from text data. Notably, there were limited pediatric/neonatal population studies and limited inclusion of socioeconomic predictors which could be opportunities for future research.


Length of Stay

We identified 26 studies regarding the prediction of the hospital length of stay that used data science methods. Twenty-three studies used a retrospective cohort design,[126] [133] [159] [173] [176] [177] [178] [179] [180] [181] [182] [183] [184] [185] [186] [187] [188] [189] [190] [191] [192] [193] [194] while three were prospective cohort studies.[129] [195] [196] Data sources mostly used administrative databases[126] [133] [179] [180] [182] [186] [191] [192] [194] and EHRs,[129] [133] [176] [183] [184] [188] [190] [192] [196] while other studies used publicly available datasets,[159] [173] [178] [187] [189] data warehouses and registries,[133] [177] [180] [195] paper clinical notes,[193] paper patient records,[185] research electronic data capture systems,[188] trial datasets,[181] questionnaires,[196] and routine bedside monitors.[176] Sample sizes ranged from 143 to 2,997,249 patients. Study populations included surgical patients,[133] [159] [177] [179] [181] [182] [183] [195] [196] ICU patients,[173] [176] [178] [187] [189] [190] medical-surgical patients,[126] [129] [180] [191] patients presenting to the ED,[184] [188] [193] [194] and psychiatric patients.[185] [186] [192] Most studies were conducted using U.S. patient data,[129] [133] [159] [173] [176] [177] [178] [181] [182] [183] [187] [189] [191] while other studies used patient data from Australia,[126] [179] [193] [194] Brazil,[186] [188] Canada,[195] [196] China,[180] England,[190] Germany,[192] Switzerland,[185] and Taiwan.[184]

Studies about length of stay also investigated other outcomes, such as mortality,[126] [129] [133] [159] [173] [178] [181] [187] [188] [196] clinical and functional complications (e.g., surgical, respiratory complications, or disability),[126] [133] [159] [183] [192] [196] readmission,[129] [173] [182] discharge destination,[126] [183] [193] patient-reported outcome measures,[182] patient phenotyping,[178] and hospital admission.[188] Demographic data were used as a predictor variable in all studies, while another common predictor was medical diagnosis.[126] [133] [159] [177] [178] [181] [182] [183] [186] [188] [190] [191] [192] [195] Other predictors used clinical data,[126] [176] [177] [178] [180] [181] [184] [185] [187] [190] [194] [195] laboratory tests,[126] [129] [133] [173] [178] [181] [182] [187] [189] [190] vital signs,[126] [129] [133] [173] [176] [178] [184] [187] [189] hospitalization data (e.g., admission/discharge data and hospital characteristics),[126] [129] [133] [159] [186] [191] [192] [194] surgery data,[133] [177] [179] [181] [182] [195] [196] anthropometric data,[159] [177] [178] [181] [184] [187] [195] scales/instruments,[126] [180] [188] [192] [196] social data,[126] [181] [185] [186] [195] medications,[129] [133] [177] [183] insurance status/type,[133] [179] [180] [191] clinical notes,[173] [184] [193] services used,[183] [186] [194] and data collected by nurses using the International Classification of Functioning, Disability and Health (ICF).[180] Studies used supervised ML algorithms,[126] [129] [133] [159] [176] [177] [181] [183] [185] [187] [189] [192] [193] [194] generalized linear models,[178] [180] [182] [184] [186] [188] [190] [191] [192] [195] [196] deep learning models,[126] [173] [178] [179] [181] [187] [189] [193] as well as unsupervised ML algorithms,[176] [186] [187] and NLP[184] [193]. 
Among the supervised ML methods, random forest was one of the most used classification algorithms.[126] [133] [159] [176] [177] [181] [183] [189] [193] Deep learning architectures, such as neural networks, were applied in studies with a large amount of data. Unsupervised algorithms used clustering methods to mine datasets and find patient data features to be used for predicting length of stay. NLP was used to extract data from clinical notes for predicting length of stay and discharge destinations. Interestingly, more than one data science method was used in some studies. For example, in one study,[187] supervised, unsupervised, and deep learning algorithms were applied to develop a predictive model for determining length of stay. In another study,[193] supervised, deep learning algorithms, and NLP were used to predict length of stay and discharge destination.

Future prospective studies are needed for external validation of the models developed. Unstructured data (e.g., clinical notes) and structured data (e.g., administrative data) have commonly been used in the studies. However, we did not find any study that used a combination of both. Further studies are required to incorporate these two types of data in the same prediction model because patient information is typically found in unstructured and structured data. Nursing-generated data were mentioned only in two studies with nursing notes and assessment data using a nonmedical classification (i.e., the ICF). Nurses represent the largest health care profession worldwide and the profession that generates the most data about the patient condition; therefore, failing to use these nursing-generated data could become a significant issue. Further studies should use both unstructured and structured nursing-generated data (e.g., standard nursing terminologies) jointly with the commonly used predictors to build prediction models.


Pain

Of the 27 studies identified for the topic of pain, 14 used a prospective cohort design,[197] [198] [199] [200] [201] [202] [203] [204] [205] [206] [207] [208] [209] [210] 11 used an observational design,[200] [204] [205] [207] [210] [211] [212] [213] [214] [215] [216] 6 used a retrospective cohort design,[211] [214] [215] [217] [218] [219] 4 used a randomized controlled trial,[201] [212] [220] [221] 1 used a cross-sectional design,[222] and 1 used mixed methods.[223] Most studies used questionnaire/survey data, but eight used administrative databases,[206] [207] [208] [210] [212] [220] [221] [222] seven used mobile devices/sensors,[200] [203] [204] [205] [210] [216] [220] and four used a data warehouse or registry.[198] [203] [208] [214] Most studies involved adults in the outpatient setting, but four involved inpatients[197] [201] [211] [223] and one involved a pediatric population.[205] Although many studies were conducted in the United States, others included China,[213] [214] [215] [222] Australia,[207] Canada,[202] the Netherlands,[199] [212] Germany,[208] [210] [211] Norway,[201] Finland,[204] South Korea,[203] Argentina,[219] Portugal,[197] Japan,[209] and Spain.[206] Sample sizes ranged from 10 to 6,316 observations.

Studies explored various outcomes including surgical applications such as determination of postsurgical measures based on residual pain,[197] predicting patellofemoral pain 1 year after intervention,[201] predicting neuropathic pain,[202] predicting chronic pain 7 to 10 years into the future,[217] predicting complex regional pain syndrome,[207] predicting pain relief for knee osteoarthritis patients,[209] detection of pain,[210] [214] [216] [222] and pain intensity estimation/classification.[205] [213] [215] [220] Other outcomes focused on pain as a predictor of anxiety and depression, coronary heart disease,[199] health status,[218] noncancer pain as a predictor of brain aging,[208] and length of stay.[211] For patients with low back pain, societal cost[212] and clinical and sociodemographic predictors of increased disability[221] have been studied. Some outcomes focused on the data science method as a clinical tool such as NLP of pain context from clinical notes.[219] [223] There were several novel data sources included, such as the use of physiologic signals from electroencephalograms (EEG),[213] electromyography,[204] [220] spectrogram,[205] electrodermal activity,[216] sensor data,[200] MRIs,[198] [208] [214] [222] kinematics/motion data,[203] [210] [220] and medical images.[197] [198] [208] [210] [214] [215] [222]

Many of these studies have a significant impact on nursing, most notably in situations where pain cannot be feasibly assessed (e.g., patients who are nonverbal). The ability to use data science methods for analyzing facial expressions, medical images, vital signs, and other biomechanical data could augment existing conventional methods in classifying and quantifying pain experience. Using EEG and electromyography data has high potential for improving pain assessment. Leveraging ML on geospatial and kinematic data can provide benefits not just for nursing assessment but also in other medical/health disciplines.


Patient Safety

We identified seven studies exploring patient safety. The majority of studies were retrospective cohort designs.[52] [224] [225] [226] [227] Two used cross-sectional designs.[228] [229] Four studies used patient safety or incident reports as primary data for analysis,[224] [227] [228] [229] two used EHR data,[52] [226] and one used a publicly available dataset.[225] Study populations primarily consisted of adult inpatients who had an event or safety report submitted during their inpatient stay.[52] [226] [227] [228] [229] Studies were based in the United States,[224] [225] [226] [229] China,[227] Korea,[52] and the United Kingdom.[228] Sample sizes ranged from 348 to 1,740,770 observations.

Studies explored various outcomes, including predicting allergic reactions,[229] classifying medication incidents,[227] identifying falls incidents from event reports,[52] [226] identifying drug-to-drug interactions,[225] and classifying the contents of safety reports.[224] [228] Data science methods included NLP,[226] deep neural networks,[227] [229] support vector machines,[225] [228] logistic regression,[52] and naïve Bayes' classification.[224]

Maintaining patient safety in the inpatient setting requires a high level of diligence and oversight by members of the health care team and primarily rests with nurses who provide the majority of care while patients are hospitalized. Patient safety studies using data science methods could advance the health care team's ability to intervene before events occur or improve the efficiency and accuracy in the classification of patient safety events, so that improvement activities are more focused. While studies of patient safety and the reporting of patient safety events are directly related to the daily work of nurses and their diligence at the bedside, only one study was led by a nurse.[52] Two other studies included one nurse in the study team.[224] [226]


Pressure Injuries

We identified 13 studies for the topic of pressure injuries (PIs). Of these 13 studies, 7 studies used a retrospective cohort design,[230] [231] [232] [233] [234] [235] [236] 3 used a prospective cohort design,[237] [238] [239] 1 used a clinical trial,[240] 1 used a cross-sectional design,[241] and 1 used secondary data analysis.[242] A variety of data sources were used for the studies, including EHR data,[233] [234] [238] [239] data warehouses,[230] [231] [235] [236] a publicly available dataset,[232] sensor data,[240] [242] and surveys as the primary collection tool.[237] The samples across studies were adult patients admitted in hospitals,[230] [231] [232] [234] [236] [238] [239] adults in residential hospices,[237] elderly patients in nursing homes (NHs),[241] Medicare beneficiaries,[235] and adults (unspecified).[240] [242] Six studies were based in the United States,[230] [231] [232] [234] [235] [236] with other study locations including Brazil,[239] Canada,[240] France,[242] Indonesia,[241] Italy,[237] South Korea,[238] and Taiwan.[233] Sample sizes ranged from 12 to 2,091,058 observations.

Most studies used the incidence rate of PIs as the outcome, except for one study[234] that projected PI closure and two studies[240] [242] that explored PI images. Various data science methods were used to detect or predict PIs including logistic regression,[230] [231] [232] [234] [239] generalized estimating equations,[237] multiple regression,[235] path analysis,[241] supervised ML,[233] [236] [238] and image processing.[240] [242] The predominant predictor variables used across studies included demographics and diagnoses,[231] [232] [233] [234] [235] [236] [237] [239] followed by clinical assessment data,[231] [233] [235] [236] [237] [238] [239] [241] Braden's scale,[231] [232] [236] [237] [239] laboratory tests,[233] [234] [236] [238] [239] and medications.[232] [236] [239] Two studies used organizational factors such as nursing unit characteristics, nurse job satisfaction, facility types, or rural/urban hospital location.[230] [235]

The prevention and management of PIs remain a challenge. The prediction models developed in these studies can help nurses screen high-risk groups and manage risk factors for PIs. The predictive models could support a monitoring system that provides real-time warnings of PI onset or a worsening trajectory to nurses and other health care providers and prompts them to personalize PI prevention interventions. Bed sheet sensors combined with PI classification or prediction modeling could support an automated feedback system that uses body pressure mapping to prompt posture changes or pressure redistribution and would allow remote monitoring.[240] Repositioning in bed could be rescheduled or individualized according to patient conditions. Also, the study by Baernholdt et al[230] on the predictive impact of organizational factors on PI rates suggests that hospitals should focus on organizational structures to improve nurses' work environments and workflow, so that nurses can enhance PI interventions. Although these predictive models are promising, their generalizability and the possibility of overfitting need to be carefully considered due to the high heterogeneity of samples across studies and the small sample sizes in some studies. Further validation studies of such risk prediction models are needed.


Readmissions

We identified nine studies for the topic of readmissions. Of these nine studies, seven used a retrospective cohort design[243] [244] [245] [246] [247] [248] [249] and two used a prospective cohort design.[244] [246] Seven studies primarily used EHR data[243] [244] [245] [246] [247] [248] [250] stored in a data warehouse of the affiliated facility,[243] [246] [248] [249] [250] with one in combination with other data sources that included mobile device sensor data[244] and one with a governmental administrative database (Medicare).[245] Study populations included adults in hospital intensive care,[173] [250] those hospitalized with medical conditions,[243] those having had cardiac surgery in a progressive care unit (PCU),[244] and those who utilized Medicare services.[245] Two studies focused on Medicare patient data,[245] [249] and one study of Medicare patients included encounter information from a nonhospital setting (i.e., inpatient rehabilitation, skilled nursing, and home health services).[245] Sample sizes ranged from 100 patients[244] to over 1 million patients.[249] Data in each study were collected from health care facilities in the United States.

Risk prediction outcomes in each study included acute care readmissions occurring within 7, 30, or 90 days of hospital discharge. One study looked at readmission back to the ICU.[250] In addition to acute care readmissions, some data were used to predict length of stay of postoperative cardiac patients,[244] hospital or 180-day mortality,[246] and elective surgery mortality at 30 and 90 days.[249] Studies generally included predictor variables comprising length of stay, gender, number of recent admissions, age, surgical procedure, admission location (e.g., ED, clinic, and physician referral), insurance type, diagnosis, procedures, medication, vital signs, and comorbidities. Methods used to predict readmissions included ML,[173] [243] [244] [246] [247] [248] [249] [250] NLP,[173] [246] [248] [249] [250] general linear regression,[173] [243] [247] [248] a combination of statistical modeling and ML,[245] and a neural network combining structured and unstructured data.[173] Interestingly, Saleh et al[248] used an existing 30-day prediction model to compare strengths of predictors in 7-day readmissions. Only one study focused on social determinants of health that may be predictors of readmissions.[247]
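Because readmission outcomes are defined by a time window after discharge, the same discharge and readmission dates yield different outcome labels at 7, 30, and 90 days. The following is a minimal sketch of how such labels might be derived; the function name is hypothetical, and treating the window as inclusive of its final day is an assumption, since the reviewed studies may define the boundary differently.

```python
from datetime import date


def readmission_flags(discharge, next_admission, windows=(7, 30, 90)):
    """Label whether a readmission falls within each window (in days).

    Assumptions: next_admission is None when no readmission occurred,
    and a readmission on day w counts as within the w-day window.
    """
    if next_admission is None:
        return {w: False for w in windows}
    gap_days = (next_admission - discharge).days
    return {w: 0 <= gap_days <= w for w in windows}


# Hypothetical patient discharged on 1 January and readmitted 20 days later.
flags = readmission_flags(date(2020, 1, 1), date(2020, 1, 21))
no_readmission = readmission_flags(date(2020, 1, 1), None)
```

In this example the patient counts as a 30-day and 90-day readmission but not a 7-day readmission, which is one reason predictors can differ in strength across windows.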

Hospital readmission within a 30-day (or shorter) interval is viewed as a quality metric by the Medicare program and other insurers. Reimbursement changes are occurring in government programs that reward hospitals for quality and penalize hospitals if quality metrics are not maintained. Nurses have a role in the assessment, planning, and implementation of an accurate discharge plan that can help identify patients most at risk for readmission due to health condition, comorbidities, or other risk factors. ML, NLP, and predictive modeling with EHR data can provide valuable information to assist in risk identification of importance to nursing care and discharge planning. As structured and unstructured data in the health record can be combined through the design of multimodal architecture to support understanding of risk reduction, nurses can use these data in the care of at-risk populations.


Staffing

We identified four studies for the topic of staffing. One study used a prospective cohort design[251] and the remaining three used a retrospective cohort design.[252] [253] [254] Three of these studies were conducted in the United States[252] [253] [254] with one study[251] using data from a single ICU in an Italian medical center. All studies used a combination of EHR data plus administrative or systems data.[251] [252] [253] [254] Study populations varied: two studies used an adult medical-surgical population,[251] [253] one used a NH population,[252] and one used a pediatric population.[254] Scheduling or workload studies were not discovered in the search. Sample sizes ranged from 148 to 30,679 observations.

Operational outcomes comprised (pediatric) readmissions,[254] the prediction of adverse events,[251] leaving (ED) without being seen,[253] and infection risk.[252] Unadjusted logistic regression was used to evaluate each response on the tool (insurance type, home medical equipment, home nursing, home therapy, and others) with weighted scores assigned to each category. To determine whether a tool (the Patient Acuity and Complexity Score) developed for their study could predict adverse events, the Sanson research group (2020) sought to discriminate between patients who had and had not experienced a serious event in the unit to which they were discharged after intensive care. In a study of NH quality,[252] tree-based gradient-boosting algorithms were used to evaluate the risk of COVID-19 infection (the presence of at least one confirmed COVID-19 resident in the NH). A logistic regression model and two-layer feedforward neural network were also developed using the identified stable predictors (including the number of care personnel/1,000 feet) to serve as benchmark predictive models for comparison.

Interestingly, only one study reported a traditional measure of staffing,[252] the number of patients per nurse. A new variable, leaving without being seen,[253] could spark further interest in the layered relationships of systems/administrative data when coupled with what is traditionally termed “clinical data,” particularly when clear administrative implications emerge as is the case in this study (administrative actions on ED process variables, e.g., wait times). The collection of data in 1-hour increments[253] could also prove a necessary improvement in studies with administrative variables (e.g., door-to-provider time), yet will demand further methodological scrutiny if the variability of certain hourly measures (number of persons in waiting room) outdistances that of nurse or other provider variables known to impact outcomes.


Unit Culture

Only one study explored a unit cultural element using data science methods. This study used the Hospital Survey on Patient Safety Culture to predict whether a patient safety event would be voluntarily reported.[255] This study was conducted in the United States with a sample size of 526,645 survey responses.

The study used regression techniques to validate that many elements of patient safety culture influence the likelihood that a patient safety event would be voluntarily reported. Some examples of these elements include communication openness, teamwork, staffing, and hand-offs and transitions. Outcomes explored in this study included frequency of events reported, near-miss events, no potential for harm events, and potential for harm events.

The study included in this review explored how a culture of patient safety influenced voluntary reporting of patient safety events. While an argument could be made that this article may be better suited in the patient safety category, we included this as a unit cultural element due to the impact unit level dynamics have on creating a patient safety culture. More exploration is needed on unit culture using data science methods that could help explain and explore those behaviors from leaders and nurses that promote positive cultures on patient units.



Discussion

Applications of data science have a profound impact on nursing practice because our ability to meaningfully use data is expanding. One such area that is apparent in this review is the use of predictive modeling and forecasting. As nursing shortages persist,[256] a global pandemic introduces complexities to care, and patient populations age with increasing rates of comorbidities,[257] the need to accurately target interventions, resources, and clinician time is at an all-time high. While the ability of data science to augment nursing practice is not yet fully realized, this review has helped to highlight some specific use cases and has identified some areas for future development. As noted in the results, the most represented topics were in-hospital mortality, pain, and length of stay, and the least commonly represented topics included unit culture, burnout, and AI/ML credibility and acceptance. Currently, it is not clear why some topics were more represented in the literature, aside from the possibility of data availability. For instance, unit culture most likely did not have the quantity of data readily available to analyze in comparison to in-hospital mortality. However, as a single year cannot determine the coverage of science, future iterations of this review should note year-to-year trends.

This rapid review highlights the important intersections between nursing practice and data science. We were able to examine both patient-centered topics (such as pain) and clinician-centered topics (such as burnout), calling attention to the multifaceted ways in which data science can support and study nursing practice. In this review, we opted to include studies that were conducted by nonnurse authors and studies that were conducted on nonnurse populations. While nursing is a unique discipline defined by a diverse set of roles in a variety of health care settings, the inclusion of works that are relevant to nursing practice but studied among nonnursing populations provides useful information. Foremost, this informs us of data science trends that are transforming health care, giving us insight into how nursing may change, or how nursing may support the latest practice recommendations. Additionally, this provides the informatics community insights into work that needs to be validated in nurse populations. We have identified several areas for future work in the discussion, including gaps in nurse-focused research. The necessary inclusion of these multidisciplinary studies in a nursing-focused review also raises the question of whether the informatics community is utilizing nurses and nursing data frequently enough, especially when considering data science methods. As nurses make up the largest portion of clinicians in health care, the volume of data and availability of patients are not barriers. Continual assessments of nursing presence among studies that examine nurse-relevant topics are important to ensure representation of nursing in data science.

Finally, the year 2020 presented unique challenges, as the COVID-19 pandemic impacted research and hospital operations,[258] publication times, and research foci. For this reason, we expected a large amount of literature to focus on addressing the pandemic. Somewhat surprisingly, only 16 of the 244 unique articles included in this review addressed COVID-19 (although we did not specifically search for COVID-19-related articles). There was also significant attention on racial justice in 2020, so we examined whether any articles addressed racial bias. Only one study[235] included an emphasis on race-based health inequities. These low counts could be due to publishing times and should be examined in reviews of the literature for 2021.


Limitations

This review had three notable limitations. First, the rapid search strategy and screening process were designed to be nonexhaustive. While the purpose of this review was to provide a high-level overview of nursing-relevant literature using data science methods, it is possible that not all of the relevant literature was captured for each topic. Second, the abstract review and literature inclusion process was conducted by single authors (i.e., validation of inclusion was not done by a second reviewer). While this was necessary to help expedite this rapid review, validation by a second reviewer may have yielded different results, and its omission may have introduced bias. Third, not all topics related to nursing practice were included in this review. While including every topic is not possible, every effort was made to include a representative sample of topics as determined by the literature and in-depth discussion among the nurse authors.


Conclusion

The intersection of nursing and data science provides new opportunities to improve health care by augmenting care processes. This rapid literature review has revealed several areas that have been widely studied in the past year, and some that could benefit from more research. In particular, effort should be made to improve the availability of nursing-generated data in an interoperable form. It is in the best interest of the informatics community to monitor the most current trends in data science across different disciplines, as the latest methods are only helpful if they can be applied to real-world practice. Nursing is especially rife with opportunity, as it permeates inpatient, outpatient, and community settings, with nurses generating data at exponential rates.


Clinical Relevance Statement

An understanding of how data science methods influence research regarding nursing-relevant patient outcomes and clinical processes is important for nurses and the health informatics community. Data science is shaping how nursing can be practiced and how care can be delivered, as is evidenced by the literature highlighted in this review. Examining such literature is crucial to monitoring the uptake of research into real-world practice.


Multiple Choice Questions

  1. Of all the studies included in this review, which topics had the most representation?

    a. In-hospital mortality, pain, and length of stay

    b. Unit culture, burnout, and AI/ML credibility and acceptance

    c. Health care cost and utilization, complex care, and falls

    d. Pressure injuries, unit culture, and falls

    Correct Answer: The correct answer is option a. While the other topics had representative literature, in-hospital mortality, pain, and length of stay were most frequently represented. By understanding what topics are most frequently represented (and those that are less frequently represented) in the literature, we may make informed decisions regarding our approach to future research.

  2. What method was used to identify the literature in this review?

    a. Natural language processing

    b. Deep learning model

    c. Rapid literature review

    d. Systematic literature review

    Correct Answer: The correct answer is option c. This review followed a rapid literature review protocol. We examined data science methods such as natural language processing and deep learning, but these methods were not used to conduct this review. A systematic review follows a more robust protocol, but we opted to follow a rapid review protocol to expedite the dissemination of this 2020 review.



Conflict of Interest

None declared.

Protection of Human and Animal Subjects

This research does not involve human subjects.


Address for correspondence

Alvin D. Jeffery, PhD, RN-BC, CCRN-K, FNP-BC
461 21st Avenue South, Nashville, TN 37240
United States   

Publication History

Received: 24 June 2021

Accepted: 12 December 2021

Article published online: 09 February 2022

© 2022. Thieme. All rights reserved.

Georg Thieme Verlag KG
Rüdigerstraße 14, 70469 Stuttgart, Germany


Fig. 1 Twelve most frequent predictor variables among the data science literature relating to nursing practice.
Fig. 2 Use of methods among the data science literature relating to nursing practice.