Keywords
Social determinants of health - public health informatics - electronic health records - exposome
1 Introduction
In the past decade, a rapidly growing body of literature has argued and demonstrated the important role of social determinants of health (SDoH) in shaping human health and well-being [[1]]. According to the World Health Organization (WHO), SDoH are the conditions in which “people are born, grow, live, work, and age” [[2]]. These non-medical factors include social, societal, and environmental conditions such as income, education, employment, insurance, social relationships, physical environments, and more. Prior research has demonstrated that SDoH are major drivers of health outcomes and, more importantly, the main contributors to widespread health inequities. It was estimated that, in the United States, SDoH could be responsible for up to 40% of all preventable deaths, significantly higher than the 10-15% for which better medical care can account [[3], [4], [5]]. Public health interventions that target SDoH are instrumental for improving health and reducing long-standing health inequities.
Recognizing the importance of SDoH on health, various professional societies and organizations, including the WHO [[2]], Healthy People 2030 [[6]], and the National Academy of Medicine (NAM) [[7]], have published frameworks that define SDoH and advocated for the collection of SDoH data. In particular, the NAM organized a committee on “Capturing Social and Behavioral Domains and Measures in Electronic Health Records”, which identified 12 SDoH measures to be included in patients' electronic health records (EHRs) to inform the meaningful use of EHRs [[8]]. Internationally, the WHO European Health Equity Status Report initiative (HESRi) is being developed to promote policy making for health equity and well-being for the European Region [[9], [10]]. Canadian researchers also developed a rural-specific SDoH framework called “Rural Community Health and Well-being Framework”, which includes 13 categories of SDoH that are pertinent to rural residents [[11]]. SDoH influence health and well-being through a complex interplay between individual- and contextual-level factors. Individual-level SDoH are factors measured from an individual, such as education, occupation, and health behaviors, while contextual-level SDoH are factors measured from an individual's surroundings, including both social and physical environments, such as built environment, healthcare quality, and community environment. At the individual level, collecting SDoH for a patient provides clinicians with a complete social context of a patient's health status, facilitating shared decision-making and individualized treatment planning. At the community and societal levels, a successful public health intervention should simultaneously target individual- and contextual-level SDoH considering the powerful role and interacting nature of SDoH.
In recent years, an increasing number of studies have harnessed real-world data (RWD) with SDoH to support public and population health, with a particular emphasis on EHRs, claims and billing data, public health survey data (e.g., the National Health and Nutrition Examination Survey [[12]]), and other data such as exposome data. In particular, to support precision prevention and treatment of diseases, SDoH are being incorporated into models for disease screening and prediction to identify social risks. Efforts such as social prescribing link patients with non-medical sources of support, such as community services, social services, and local organizations, to address the underlying social and lifestyle factors that contribute to poor health and well-being and to promote positive health outcomes [[13]]. Importantly, SDoH-enriched RWD can also support public health interventions by identifying populations at higher risk for certain health problems, allowing for targeted and more effective interventions for early prevention. By better understanding the impact of social and environmental factors on health and health care access, health systems and healthcare providers can work with communities and public health organizations to address these factors, reduce health disparities, and improve health equity, as SDoH are often the root causes of disparities and account for 80% of modifiable factors [[14], [15]]. Nonetheless, it is challenging to identify SDoH information and appropriately link it to clinical and public health data to enable these applications. Informatics approaches such as natural language processing, ontologies, and spatiotemporal data integration offer promising solutions to tackle these challenges.
Given the prominent role of EHRs in disease prevention and treatment, and the increasing popularity of integrating SDoH into EHRs for health outcomes and health equity, this article reviews recent informatics approaches covering a wide range of methods and applications, including: (1) the collection of SDoH from both structured and unstructured EHR data, (2) the linkage of public surveys and environmental data to EHRs for measuring contextual-level SDoH, (3) the standardization of SDoH with ontologies, and (4) the utility of SDoH-enriched EHRs in public and population health applications including public health intervention, risk stratification, and prediction of unmet social needs.
We invited leading experts who have published extensively in these areas [[16], [17], [18], [19], [20], [21], [22], [23], [24], [25], [26], [27], [28], [29]] to conceptualize this review article and co-author sections corresponding to their expertise. [Figure 1] shows our conceptual framework, adopting the social-ecological model [[30]] and the National Institute on Minority Health and Health Disparities (NIMHD) Research Framework [[31]], for integrating SDoH data with EHR data to support various health applications at the individual, family, community, and societal levels. The rest of this opinion review is organized as follows:
- In Section 2, we review the techniques for SDoH data capture and data engineering, as well as use cases;
- In Section 3, we review the techniques for creating SDoH ontologies;
- In Section 4, we review the public and population health applications using RWD enriched with SDoH;
- In Section 5, we summarize the challenges and promising pathways towards successful public and population health applications, leveraging RWD and SDoH.
Fig. 1 A conceptual framework for integrating SDoH data with EHR data to support public and population health applications.
2 SDoH Data Engineering
In this section, we first review the standards for SDoH screening and discuss the challenges and opportunities for capturing SDoH information in structured EHR data (Section 2.1). Next, we review natural language processing approaches for extracting SDoH information from clinical notes in EHRs and point out the low documentation rate of certain categories of SDoH in clinical notes (Section 2.2). Finally, we review recent efforts, challenges, and techniques for linking contextual SDoH data to EHR data (Section 2.3).
2.1 Structured Data and Tools in EHR Systems for Capturing SDoH
Given the increasing recognition of the importance of SDoH for patient care and population health, EHR vendors have started to implement structured SDoH fields to collect this information directly from patients during the course of care [[32]]. These structured fields typically cover commonly recognized SDoH domains including healthcare access, child care, financial strain, housing, transportation, food insecurity, education, and employment, among others [[33]].
While the implementation of structured SDoH fields is a positive first step towards interoperability, there is still a lack of standardization among EHR vendors and health care systems regarding how and from whom SDoH information should be collected. This results in inconsistencies in the data collected and presents a challenge for the use and exchange of SDoH data across different systems. As identified by Arons et al. [[33]], the six most popular SDoH instruments and screening tools implemented in EHR systems are the NAM Recommended Social and Behavior Domains and Measures report [[8]]; the National Association of Community Health Centers (NACHC)'s Protocol for Responding to and Assessing Patients' Assets, Risks, and Experiences (PRAPARE) survey [[34]]; the Centers for Medicare and Medicaid Services (CMS)'s Accountable Health Communities (AHC) survey [[35]]; the Health Leads questionnaire [[36]]; the Safe Environment for Every Kid (SEEK) questionnaire [[37]]; and the WE CARE survey instrument [[38]]. There is considerable variation among these tools and instruments in the questions that are asked and the SDoH domains that are covered. This variation is compounded by the fact that many health care systems make additional customizations upon implementation, further limiting the opportunities for interoperability and standardization. In interviews, some top EHR vendors described the built-in flexibility of their SDoH data collection modules as a feature, noting that patient populations and reporting requirements vary from health system to health system. Those same vendors, however, also noted the disadvantages of this flexibility in terms of data sharing, interoperability, data aggregation, and analytics [[39]].
In the absence of a uniform standard for SDoH screening, mapping structured SDoH fields in EHRs to existing standard clinical terminologies, such as the International Classification of Diseases-Tenth Revision (ICD-10), Logical Observation Identifiers Names and Codes (LOINC), and the Systematized Nomenclature of Medicine (SNOMED), is a step toward greater interoperability. Both questions and answers have the potential to be mapped to these standard terminologies, as shown in [Table 1].
Table 1 Mapping a SDoH question and the associated answer set to its LOINC equivalents [[40]].
EHR vendors have attempted to support this type of mapping with varying degrees of success. Unfortunately, many SDoH question/answer sets (particularly those designed to collect detailed information, such as “In a typical week, how many times do you talk on the phone with family, friends, or neighbors?”), do not have good matches within the standard terminologies [[33],[39]]. In the absence of structured fields to collect SDoH data, the ICD-10 Z55-Z65 codes can be used to capture some SDoH in a standardized way in the EHR's Problem List (e.g., Z56, “Problems related to employment and unemployment”). However, while these codes have been available since 2016, a lack of clear guidelines for use, training, or incentives has led to slow and inconsistent uptake [[41],[42]]. As of 2019, only 1.6% of Medicare beneficiaries had any Z-code in their records [[43]]. In a 2020 study [[17]], Guo et al. assessed the documentation of Z-codes in EHRs using data from a large clinical research network (the OneFlorida+ Clinical Research Consortium, covering ~15 million Floridians), and also found a low utilization rate (270.61 per 100,000 at the encounter level and 2.03% at the patient level), although the utilization rate increased slightly from 255.62 to 292.79 per 100,000 since 2018.
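The utilization figures above are reported at two levels (per 100,000 encounters and as a percentage of patients). The following is a minimal sketch of how such rates could be computed, not the method used in the cited studies. The input layout (tuples of patient ID, encounter ID, and a list of diagnosis codes) and the function names are assumptions made for illustration.

```python
import re

# ICD-10 categories Z55 through Z65 are the SDoH-related Z-codes discussed in the text.
# Matching on the three-character category prefix covers dotted and undotted codes.
Z_CODE_PATTERN = re.compile(r"^Z(?:5[5-9]|6[0-5])")


def is_sdoh_z_code(code: str) -> bool:
    """Return True if an ICD-10 code falls in the Z55-Z65 range."""
    return bool(Z_CODE_PATTERN.match(code.strip().upper()))


def z_code_utilization(encounters):
    """Compute encounter-level and patient-level Z-code utilization.

    `encounters` is an iterable of (patient_id, encounter_id, codes) tuples;
    this layout is a hypothetical simplification of an EHR extract.
    """
    total_encounters = 0
    flagged_encounters = 0
    patients = set()
    flagged_patients = set()
    for patient_id, _encounter_id, codes in encounters:
        total_encounters += 1
        patients.add(patient_id)
        if any(is_sdoh_z_code(code) for code in codes):
            flagged_encounters += 1
            flagged_patients.add(patient_id)
    if total_encounters == 0:
        raise ValueError("no encounters supplied")
    return {
        "encounter_rate_per_100k": 100_000 * flagged_encounters / total_encounters,
        "patient_percent": 100 * len(flagged_patients) / len(patients),
    }


# Toy data, invented for illustration only.
sample_encounters = [
    ("p1", "e1", ["E11.9", "Z59.0"]),
    ("p1", "e2", ["I10"]),
    ("p2", "e3", ["Z00.00"]),
    ("p3", "e4", ["J45.909"]),
]
utilization = z_code_utilization(sample_encounters)
```

The patient-level figure counts a patient once no matter how many encounters carry a Z-code, which is why the two levels can diverge substantially.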
Based on their finding of uneven use of structured SDoH fields at University of California San Francisco Health, Wang et al. suggest that it is not enough to simply add SDoH fields to EHRs and expect them to be used. Rather, those fields must be made an integral part of clinical workflow [[44]] and SDoH documentation must be incentivized with institutional policies and procedures [[32]]. Moreover, clinicians should be specifically trained to establish the empathy and trust necessary to collect this sensitive information from their patients [[45]]. Screening for social needs probes potentially stigmatizing aspects of individuals' lives (e.g., poverty and racism), leading to potential harms through trauma, discrimination, or legal consequences. This concern is especially pronounced in existing survey-based SDoH screening without adequate face-to-face discussions [[35]]. Once the information is obtained, clinicians also lack training to use the SDoH information in clinical decision-making and to formulate care plans accordingly [[41]]. Almost all existing SDoH screening tools were developed for universal screening but were not validated to predict specific outcomes, and there are often no actionable next steps even if certain SDoH issues are identified in clinical settings. In other words, clinicians often do not know whether addressing the identified social risks would lead to any specific health outcome improvements for the patients at hand, nor do they necessarily have meaningful ways to address those identified social risks. Together, these two issues make clinicians less inclined to adopt SDoH screening tools in their routine care.
Incentives, policies, and training are non-technical gaps in the current methods of collecting structured SDoH information in EHRs, but must be addressed in order to set up the technical solutions for success. Nevertheless, as informaticians, we must also develop these technologies tailored to the clinician and patient needs.
2.2 SDoH Extraction from Clinical Narratives
Clinical notes and other free text fields offer a flexible and intuitive way for clinicians to document SDoH information. The informal nature of a clinical note allows for recording in-depth personal information such as a patient's unstable housing situation or struggles with food insecurity. However, the lack of standardization for free text makes it challenging to analyze the data for both operational and research purposes and does not lend itself to interoperability. To better utilize SDoH information embedded in clinical notes, natural language processing (NLP) methods and tools have been developed to extract SDoH from clinical narratives.
Prior research has developed NLP systems to extract individual-level SDoH critical for public health studies, such as substance use [[46]], homelessness and housing insecurity [[47],[48]], employment status [[49]], and suicide attempt or ideation [[18],[50]]. Both rule-based and machine learning-based methods have been applied. However, these systems can only extract a single SDoH at a time, and there is a lack of comprehensive NLP systems to extract multiple common SDoH from clinical narratives. Recent studies have developed clinical corpora with multiple common SDoH categories and applied more advanced deep learning-based NLP models. Feller et al. [[51]] developed a corpus of five SDoH categories and approached SDoH detection as a classification task using machine learning models. Lybarger et al. [[52]] developed a corpus containing 12 SDoH categories using clinical notes from the Medical Information Mart for Intensive Care (MIMIC)-III dataset and an existing dataset from the University of Washington and Harborview Medical Center. Han et al. [[53]] developed a corpus of 13 SDoH categories using MIMIC-III and approached SDoH detection as a classification task but used deep learning models. Similarly, Stemerman et al. [[54]] applied machine learning methods to detect 6 categories of SDoH through classification tasks. Yu et al. [[19]] developed a corpus of 19 SDoH categories using clinical notes from cancer patients at University of Florida Health and applied state-of-the-art transformer-based methods for SDoH extraction. [Table 2] summarizes the detailed SDoH categories and data sources used in these studies. More recently, the well-known 2022 n2c2 NLP challenge organized an open challenge with a shared task focusing on SDoH [[55]].
Table 2 SDoH category and data sources in recently published NLP systems for SDoH extraction from clinical notes.
While NLP methods based on the transformer models have shown promising results in extracting SDoH captured in clinical narratives, challenges remain. First, there is not an off-the-shelf comprehensive package for SDoH extraction, and the adoption of an NLP pipeline trained on one corpus often requires extensive fine-tuning when applied on a different corpus or at another institution. The accuracy of NLP methods depends on several factors, including the quality and consistency of the data, the choice of the NLP models, and the development and training of the NLP algorithms. Furthermore, the complex and nuanced nature of SDoH information, as well as the challenges in standardizing free text, can make it difficult to extract and accurately categorize this information. Despite these challenges, NLP methods have the potential to greatly improve the analysis and utilization of SDoH information recorded in clinical narratives.
Second, the documentation of certain SDoH categories is poor in clinical notes. NLP is only useful when the SDoH are prevalent in clinical narratives. Yu et al. [[56]] reported that in the training corpus of 640 clinical notes from cancer patients, only 19 out of the 38 SDoH categories (based on a review of SDoH definitions from WHO, Healthy People 2030, and CDC) were observed. When the authors applied the trained NLP pipeline on a corpus of breast (n=7,971), lung (n=11,804), and colorectal cancer (n=6,240) patients from the University of Florida Health, among the 19 SDoH categories, 10 had an extraction rate of over 70%, including gender, race, tobacco use, alcohol use, drug use, education, living supply, marital status, occupation, and sexual activity. The other 9 categories had a fairly low extraction rate, including abuse (physical and mental), ethnicity, financial constraint, language, living condition, physical activity, social cohesion, transportation, and ICD-10 Z codes of SDoH.
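As a toy illustration of the rule-based end of the spectrum described in this section, the sketch below flags SDoH categories in free text with keyword patterns and then computes a per-category extraction rate. The keyword lists are hypothetical, negation (e.g., "denies tobacco use") is not handled, and the extraction rate is assumed here to mean the share of patients with at least one extracted mention, which the text does not define precisely. The transformer-based systems cited above work quite differently.

```python
import re
from collections import defaultdict

# Hypothetical keyword rules; a real lexicon would be curated and validated
# by domain experts and would need negation and context handling.
SDOH_RULES = {
    "housing": re.compile(r"\b(homeless|unstable housing|shelter)\b", re.IGNORECASE),
    "tobacco_use": re.compile(r"\b(smok(?:es|er|ing)|tobacco)\b", re.IGNORECASE),
    "food_insecurity": re.compile(
        r"\b(food insecurity|food stamps|skips meals)\b", re.IGNORECASE
    ),
}


def extract_sdoh(note: str) -> set[str]:
    """Return the SDoH categories whose keywords appear in a clinical note."""
    return {category for category, pattern in SDOH_RULES.items() if pattern.search(note)}


def extraction_rates(notes_by_patient: dict[str, list[str]]) -> dict[str, float]:
    """Percent of patients with at least one note mentioning each category."""
    if not notes_by_patient:
        raise ValueError("no patients supplied")
    counts = defaultdict(int)
    for notes in notes_by_patient.values():
        found = set()
        for note in notes:
            found |= extract_sdoh(note)
        for category in found:
            counts[category] += 1
    n_patients = len(notes_by_patient)
    return {category: 100 * counts[category] / n_patients for category in SDOH_RULES}


# Invented notes, for illustration only.
sample_notes = {
    "p1": ["Patient is currently homeless and smokes daily."],
    "p2": ["Lives with spouse. Former smoker."],
    "p3": ["No acute distress."],
    "p4": ["Reports food insecurity; uses food stamps."],
}
rates = extraction_rates(sample_notes)
```

A category that is rarely documented yields a low rate regardless of how accurate the extractor is, which is the limitation the paragraph above highlights.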
2.3 Identification of Contextual SDoH through Novel Data Linkage
Contextual SDoH are increasingly recognized as playing critical roles in not only population health but also disparities and structural inequities [[57],[58]]. In environmental epidemiology, the exposome concept was coined to draw attention to a more comprehensive assessment of environmental exposures [[59]], where the internal exposome refers to “exposures that impact the internal environment of the body” such as metabolic factors and microbiota, while the external exposome refers to the “social, cultural and ecological contexts in which the person lives their life” such as climate factors and social capital, as well as “the specific external agents to which one is exposed” such as specific contaminants, poor diet and lack of exercise [[20]]. In the United States, these external exposome data (or contextual-level SDoH) can be obtained from numerous publicly accessible data sources such as the American Community Survey (ACS) [[60]], County Health Rankings (CHR) [[61]], and Food Environment Atlas (FEA) [[62]]. Researchers have constructed comprehensive external exposome databases that include multiple domains of contextual-level SDoH. For example, Hu et al. have integrated external exposome data from multiple well-validated sources into a comprehensive set of variables of different spatial and temporal resolutions [[63]]. These contextual-level SDoH can be spatiotemporally linked using residential histories documented in EHR data to study a wide range of population health issues. Previous studies have documented that contextual SDoH have significant associations with health care access and various health outcomes. 
These associations can be uncovered by analyzing EHR data linked with exposome data [[63],[64]], as well as through the Exposome-Wide Association Study (ExWAS) approach (similar in concept to Genome-Wide Association Study (GWAS) analyses), which enables systematic screening of the associations between thousands of contextual SDoH/environmental exposures and health outcomes using an agnostic, untargeted, and hypothesis-generating approach [[65]]. A recently published Social and Environmental Determinants Address Enhancement toolkit (SEnDAE) [[66]] includes optional components for geocoding addresses that can extend the OMOP common data model. As OMOP operates on a global scale, this and similar initiatives should be seen as an essential step in the process of internationalizing the digitization of SDoH.
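To make the idea of spatiotemporal linkage through residential histories concrete, the sketch below computes a time-weighted average of an area-level measure over an exposure window. It is a simplified illustration, not the procedure of any cited study: the residence tuples, the area identifiers, and the assumption of one time-invariant value per area are all hypothetical.

```python
from datetime import date


def overlap_days(start_a: date, end_a: date, start_b: date, end_b: date) -> int:
    """Days shared by two half-open date intervals [start, end)."""
    latest_start = max(start_a, start_b)
    earliest_end = min(end_a, end_b)
    return max((earliest_end - latest_start).days, 0)


def time_weighted_exposure(residences, exposure_by_area, window_start, window_end):
    """Average an area-level measure over a window, weighted by days lived in each area.

    `residences` is a list of (move_in, move_out, area_id) tuples (hypothetical layout);
    `exposure_by_area` maps area_id to a single value (assumed constant over time).
    Returns None when no part of the window can be linked.
    """
    weighted_sum = 0.0
    covered_days = 0
    for move_in, move_out, area_id in residences:
        days = overlap_days(move_in, move_out, window_start, window_end)
        if days == 0 or area_id not in exposure_by_area:
            continue  # residence outside the window, or area missing from the source
        weighted_sum += days * exposure_by_area[area_id]
        covered_days += days
    if covered_days == 0:
        return None
    return weighted_sum / covered_days


# Invented example: one move during the exposure window.
residential_history = [
    (date(2019, 6, 1), date(2020, 7, 1), "tract_A"),
    (date(2020, 7, 1), date(2022, 1, 1), "tract_B"),
]
poverty_rate_by_tract = {"tract_A": 10.0, "tract_B": 20.0}
linked_value = time_weighted_exposure(
    residential_history, poverty_rate_by_tract, date(2020, 1, 1), date(2021, 1, 1)
)
```

In practice, addresses must first be geocoded to area identifiers, and many sources vary by year, so the lookup would be keyed by both area and time period.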
Nevertheless, challenges remain in using contextual SDoH data. External exposome data sources are heterogeneous and lack semantic standards [[20]]. Such heterogeneity also leads to methodological challenges with data engineering (e.g., data source identification, variable selection, and data harmonization), spatiotemporal linkage (e.g., geocoding of patient addresses and spatiotemporal aggregation), and analyses and interpretation (e.g., ExWAS, prediction, and causation [[67]]). A very important caveat of ExWASs is the well-known adage that “association is not causation”, particularly given the ecological fallacy. Further, the old “so what” question still exists: what do we do with these statistically significant contextual SDoH (even if causality were established)?
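The agnostic screening logic of an ExWAS can be sketched as a loop over exposures followed by a multiple-testing correction. The example below is a deliberately simplified stand-in: it uses a Pearson correlation with a Fisher z normal approximation instead of covariate-adjusted regression, and Benjamini-Hochberg control of the false discovery rate is an assumed choice, not one stated in the text. The exposure names and values are invented. The result is a list of associations to follow up, not causal findings.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z_p_value(r, n):
    """Two-sided p-value for r via the Fisher z-transform (normal approximation)."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite for perfect correlations
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values, alpha=0.05):
    """Return the names whose p-values pass the Benjamini-Hochberg step-up rule."""
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    cutoff = 0
    for rank, (_name, p) in enumerate(ranked, start=1):
        if p <= alpha * rank / m:
            cutoff = rank  # largest rank that satisfies the rule
    return {name for name, _p in ranked[:cutoff]}


def exwas_screen(exposures, outcome, alpha=0.05):
    """Screen every exposure against one outcome; return p-values and flagged names."""
    n = len(outcome)
    p_values = {
        name: fisher_z_p_value(pearson_r(values, outcome), n)
        for name, values in exposures.items()
    }
    return p_values, benjamini_hochberg(p_values, alpha)


# Invented data: 20 individuals, binary outcome, three hypothetical exposures.
outcome = [0] * 10 + [1] * 10
exposures = {
    "pm25": list(range(1, 21)),
    "green_space": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4],
    "noise": [0, 1] * 10,
}
p_values, flagged = exwas_screen(exposures, outcome)
```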
3 Semantic Standards of SDoH via Ontologies
Standardization of the measurement and management of SDoH for individuals, households, and communities, as well as the linkage of SDoH information to EHR data, is of utmost importance. Ontologies, usually defined as formal representations of a specific domain, can facilitate semantic interoperability across systems with formal definitions of concepts and their relationships. A few ontologies or terminologies exist that cover certain aspects of SDoH. For example, at the individual level, the Ontology of Medically Related Social Entities (OMRSE) focuses on health-related social roles [[68]], while the Semantic Mining of Activity, Social, and Health (SMASH) data system ontology focuses on the interrelations of health, social activities, and daily physical activities [[69]]. Additionally, there are ontologies at the contextual level such as the Environment Ontology (ENVO) [[70]], the Human Health Exposure Analysis Resource (HHEAR) ontology [[71]], the Child Health Exposure Analysis Resource (CHEAR) ontology [[72]], and the Environment Conditions, Treatments, and Exposures Ontology (ECTO) [[73]]. For ontologies related to contextual SDoH (or external exposome data), a previous review and assessment of existing semantic standards for external exposome data has detailed the current landscape, challenges, and future opportunities [[20]]. Despite the availability of ontologies that cover certain aspects of SDoH, they do not provide a comprehensive representation of SDoH, nor were they designed with the intention to link SDoH information to EHR data, among a number of other limitations [[20]].
In the past two years, Rousseau et al. [[74]] developed an ontology-driven information model to integrate SDoH data with EHRs for pediatric asthma. To achieve this, they identified a list of important environmental measures for pediatric asthma and then assessed existing SDoH frameworks, assessment tools, and terminologies to identify representative data standards for these measures. They found that even though there are LOINC and SNOMED CT concepts relevant to indoor and outdoor air quality measures, these terminologies do not align well with environmental exposure measurements, and the concepts in these terminologies often lack specificity with regard to the data elements from the air quality measurements and questionnaire. Kollapally et al. [[75]] prototyped the Social Determinant of Health Ontology (SOHO), aiming to cover terms related to negative societal phenomena that affect clinical outcomes. After a manual review of relevant publications, Healthy People 2030, and County Ranking models, the prototype of SOHO was developed with 189 classes, among which 40% are covered by SNOMED CT, ICD-10-CM, or the National Cancer Institute (NCI) Thesaurus with inconsistent coverage. SOHO only has IS-A relations and may not have the desired level of granularity for all SDoH applications. In a more recent work, Dang et al. [[21]] developed a more comprehensive SDoH ontology called SDoHO, whose categories and topics were defined by incorporating mainstream sources including WHO, CDC, Healthy People 2020 & 2030, Kaiser Family Foundation, and NAM. Among these, SDoHO is a more formally defined ontology with 706 classes, 105 object properties, and 20 data properties, with 1,542 logical axioms and 966 declaration axioms. Its top-level classes include elements relevant to behavior and lifestyle, social and community context, health care, economic stability, neighborhood, food, and measures/indices/scores. 
SDoHO is aligned with standard terminologies including LOINC, SNOMED CT, and broadly the UMLS. [Table 3] summarizes the recent SDoH ontologies.
Table 3 Recent SDoH ontologies
The challenges of developing and adopting SDoH ontologies for public and population health applications are multi-faceted. First, there is a lack of consensus on the information models and dimensions for SDoH. Leading organizations and initiatives such as the WHO, NAM, CDC, and Healthy People 2020 & 2030 have all developed their own frameworks for SDoH. It is challenging to consolidate the concepts and measures in these frameworks and define relationships between SDoH concepts. Second, even though existing standard terminologies such as ICD-10, SNOMED CT, and LOINC have added certain concepts for SDoH, their coverage is limited, the actual use of these codes in EHRs is low [[17]], and the semantic alignment between these coding schemes is challenging [[76]]. Researchers often reported a lack of granularity in these terminologies. Small application ontologies that cover certain narrow aspects of SDoH or focus on a specific clinical domain continue to exist, and the categorization of the SDoH factors is not uniform across studies. Related to public and population health, there is a lack of alignment between exposome measures (such as air quality and water quality measures) and the representation of these measures in standard terminologies.
So far, we have not found studies that demonstrate the use of ontologies for linking SDoH to EHR data. To make effective use of SDoH ontologies for public health and epidemiology, a few questions remain to be answered:
- What level of formalism is required for SDoH ontologies?
- How can ontologies be used to standardize the measures of SDoH information?
- How granular should these ontologies be?
- How can ontologies be used effectively to standardize SDoH data so that they can be integrated with EHR data at the patient, neighborhood, and regional levels to model factors that have an impact on health, such as disease burden?
- How should SDoH ontologies be integrated with other ontologies to facilitate downstream use cases? For example, many SDoH-targeted interventions are closely related to behavioral changes, so SDoH ontologies need to be linked to ontologies such as the Behavior Change Intervention Ontology (BCIO) [[77]] to guide intervention development.
In addition, besides recommending SDoH data elements, regulatory agencies should also recommend ontologies and provide a guideline on the standardization and integration of SDoH information.
4 Applications of SDoH
4.1 Incorporating SDoH in Disease Screening and Social Risk Prediction
To help guide precision prevention and treatment of diseases, it is critical to consider social risks and incorporate SDoH when developing disease screening and prediction models. Although there is overlap, individual-level and contextual-level SDoH approaches for assessing patient social risks are not equivalent [[48]], and it is important to consider both individual-level and contextual-level SDoH when developing prediction models. In fact, several models for predicting social risks have been published, such as the polysocial risk score [[78]], polyexposure risk score [[79]], and polyexposomic risk score [[80]]. Taking the polysocial risk score as an example, it could help predict the individual-level social risk of a disease or a particular health outcome with different combinations of social conditions without knowing the precise contribution of each social factor [[78]]. The social factors considered by polysocial risk scores include both individual SDoH factors such as income, education, religion, sex, and race-ethnicity, as well as contextual social, community, and physical environmental factors such as quality of housing, local crime level, and air and water quality. Although not explicitly stated as an approach for developing the polysocial risk score, Guo et al. examined both individual-level (extracted via NLP over clinical notes) and contextual-level (via spatiotemporal linkage) SDoH linked with EHRs and found novel SDoH associated with lower initiation of cardioprotective drugs in patients with type 2 diabetes (T2D), with varying effects across racial groups [[81]]. The polyexposure risk score, which combines multiple correlated nongenetic exposure and lifestyle factors, has been shown to provide a modest incremental improvement in accuracy for predicting T2D over established clinical risk factors [[79]]. 
The polyexposomic risk score was initially developed for hypertensive disorders of pregnancy using external exposome-wide data consisting of 5,510 factors characterizing women's surrounding natural, built, and social environment during pregnancy [[80]]. The study found that neighborhood socioeconomic status, housing characteristics, meteorology factors, and air pollutants are predictive of hypertensive disorders of pregnancy.
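One simple way to picture a composite social risk score is as an additive combination of social factors, by analogy with polygenic risk scores. The sketch below is an assumption-laden illustration and not the published polysocial, polyexposure, or polyexposomic methods: the factor names, the log-odds weights, and the intercept are all hypothetical and would normally be estimated from data.

```python
import math

# Hypothetical log-odds weights for binary social risk indicators.
# In a real score these would be estimated from data, not hand-picked.
WEIGHTS = {
    "low_income": 0.5,
    "no_high_school_diploma": 0.3,
    "housing_instability": 0.4,
    "high_area_deprivation": 0.2,  # contextual-level factor
}
INTERCEPT = -2.0  # hypothetical baseline log-odds


def composite_social_score(factors: dict[str, int]) -> float:
    """Weighted sum of the social risk indicators present for one individual."""
    return sum(weight * factors.get(name, 0) for name, weight in WEIGHTS.items())


def predicted_risk(factors: dict[str, int]) -> float:
    """Map the additive score to a probability with a logistic link."""
    log_odds = INTERCEPT + composite_social_score(factors)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Invented individual with one individual-level and one housing-related risk.
example_person = {"low_income": 1, "housing_instability": 1}
example_score = composite_social_score(example_person)
example_risk = predicted_risk(example_person)
```

Mixing individual-level and contextual-level indicators in one score mirrors the point above that the two levels are not equivalent and both should be considered.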
SDoH data may also play a critical role in developing predictive models for critical public health crises such as the opioid use crisis. Gao et al. found that Medicaid enrollees with a documented SDoH vulnerability had 26% higher odds of having an opioid use disorder than those without. However, the authors noted a high level of SDoH missingness in their data, suggesting that more consistent and thorough SDoH documentation may have major implications for such predictive models in the future [[82]]. In their study of factors leading to non-fatal overdose that resulted in intensive care unit admission, Mitra et al. addressed this missingness with NLP of clinical notes to fill gaps left by structured SDoH documentation and ultimately captured >99% of their SDoH variables from the free text [[83]].
4.2 Development of SDoH-related Interventions
To develop effective public health interventions that target populations at higher risk for certain health problems and improve health equity, it is critical to identify effective social risk management strategies, particularly for marginalized groups. Advances in artificial intelligence combined with the increasing availability of RWD offer a unique opportunity to develop innovative approaches that improve both health outcomes and health equity by addressing SDoH. However, key data and methodologic barriers exist, some of which are extensively discussed above, such as the fact that RWD are not well integrated with either contextual or individual-level SDoH data, although factors from both levels are associated with T2D, with complex interplay among them.
Furthermore, from a modeling methodology perspective, although associations of multiple SDoH with health care and health outcomes are well documented [[6]], predictors may not be causally associated with outcomes. Therefore, there are critical gaps in understanding who may benefit from a given SDoH-targeted intervention (e.g., food pharmacy, transportation support for medical needs [[84]]). Machine learning (ML) has led to success in various RWD analysis tasks. However, RWD are observational in nature; thus, the causal inference framework needs to be incorporated with ML approaches (i.e., causal-principled ML models such as causal forest) to account for inherent biases (e.g., confounding and selection biases) when providing cause-and-effect estimates of potential SDoH interventions in RWD [[85]]. For example, Tang et al. used a causal ML method (i.e., doubly robust learning) to estimate the conditional average treatment effects and found a heterogeneous effect of SDoH on the risk of dementia [[86]]. Through causal-principled ML models, researchers can fill critical gaps in understanding the causal effects of key actionable SDoH on healthcare and outcomes. Establishing causal effects of individual SDoH on the health outcomes of interest is critical as clinical practices are built on causality (e.g., via randomized controlled trials), so that we know exactly the potential benefits and harms of prescribing an intervention, regardless of whether it is a medical treatment or an SDoH intervention.
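The doubly robust idea mentioned above can be illustrated with the augmented inverse probability weighting (AIPW) estimator, which combines an outcome model with a propensity model. The sketch below is a toy version under strong assumptions and is not the implementation used by Tang et al.: the only confounder is a single discrete stratum, the nuisance models are plain within-stratum means, and the record layout and data are invented. Practical causal ML would replace these with flexible learners and cross-fitting.

```python
from collections import defaultdict


def aipw_effects(records):
    """Estimate ATE and per-stratum CATE with AIPW pseudo-outcomes.

    `records` is a list of (stratum, treated, outcome) tuples with treated in {0, 1}
    (hypothetical layout). Nuisance models are within-stratum means:
    e(x) = P(T=1 | x), mu1(x) = E[Y | T=1, x], mu0(x) = E[Y | T=0, x].
    """
    by_stratum = defaultdict(list)
    for stratum, treated, outcome in records:
        by_stratum[stratum].append((treated, outcome))

    pseudo = defaultdict(list)
    for stratum, rows in by_stratum.items():
        y1 = [y for t, y in rows if t == 1]
        y0 = [y for t, y in rows if t == 0]
        if not y1 or not y0:
            # Positivity violated: no comparison is possible in this stratum.
            raise ValueError(f"stratum {stratum!r} lacks treated or untreated units")
        propensity = len(y1) / len(rows)
        mu1 = sum(y1) / len(y1)
        mu0 = sum(y0) / len(y0)
        for t, y in rows:
            # Outcome-model contrast plus inverse-probability-weighted residuals.
            score = (
                mu1 - mu0
                + t * (y - mu1) / propensity
                - (1 - t) * (y - mu0) / (1 - propensity)
            )
            pseudo[stratum].append(score)

    cate = {stratum: sum(scores) / len(scores) for stratum, scores in pseudo.items()}
    all_scores = [score for scores in pseudo.values() for score in scores]
    ate = sum(all_scores) / len(all_scores)
    return ate, cate


# Invented data: a hypothetical SDoH-targeted intervention and a binary outcome.
sample_records = (
    [("rural", 1, y) for y in (1, 1, 1, 0)]
    + [("rural", 0, y) for y in (0, 0, 1, 0)]
    + [("urban", 1, y) for y in (1, 0)]
    + [("urban", 0, y) for y in (1, 0, 0, 0)]
)
ate_estimate, cate_estimates = aipw_effects(sample_records)
```

The per-stratum estimates differ in this toy data, which is the kind of heterogeneity that conditional average treatment effects are meant to surface.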
Nevertheless, knowing the causal effects of SDoH on health is not sufficient, as a number of challenges remain beyond data and methods. One is the lack of a social risk management tool in EHRs that can leverage existing rich data sources and consider the totality of both contextual- and individual-level SDoH to semi-automatically identify individuals at high social risk (e.g., social risk screening via polysocial risk scores) while limiting documentation burden. The field also lacks tools that not only provide critical decision support information (e.g., prioritized key actionable SDoH and causal effect estimates) but also guide the next steps to address individual patients' unmet social needs (e.g., referral to community-based organizations for the specific SDoH identified). From an informatics perspective, the usability (ease of use) and acceptability (perceived usefulness) of such tools, and how they are integrated into existing clinical workflows (considering the limited time providers already have with each patient), are critical. Addressing SDoH and unmet social needs is not necessarily the job of clinicians or even nurses, but a community effort involving multiple stakeholders, ranging from patients and caregivers to providers and health systems to community organizations and government agencies. Informaticians play a critical role in providing novel technologies to support these activities, including data integration across heterogeneous sources, modeling with causal-principled methods, tool development via user-centered design that considers human factors, and EHR integration and implementation science within a learning health system framework.
5 Conclusions and Future Directions
SDoH factors affect people's health at the individual, family, community, and society levels. There is increasing interest in using RWD to examine the role of SDoH in public and population health, as well as in health disparities. In this opinion review, we summarized data resources and recent informatics approaches to screen and harmonize different levels of SDoH from heterogeneous data sources and to utilize SDoH with RWD. We also identified challenges and barriers that contribute to the low documentation rate of SDoH in EHR systems [[87]], including lack of integration into clinical workflows, lack of incentives for SDoH data collection, and lack of training and tools for clinicians to derive actionable insights for decision making. The informatics community has made strides in developing NLP methods to extract SDoH from clinical narratives, linking EHRs with public surveys and environmental data, creating SDoH ontologies for standardization, and developing SDoH-based social risk scores. To better leverage SDoH, future work should establish incentives, policies [[88]], quality measures, and training [[42]] to improve the collection and use of SDoH. Technical solutions such as social risk management tools should follow user-centered design and be integrated into real-world clinical workflows to identify social risks and address unmet social needs. Note that although the majority of recent studies on methods and applications for linking EHRs with SDoH to improve public health were conducted in the United States, approaches for collecting SDoH data also exist in the global context. A recent paper by Cossio [[89]] reviewed different approaches for digitally collecting or predicting SDoH such as quality of public transportation (Lisbon [[90]], Brazilian cities [[91]]), air quality (a city in Turkey [[92]]), and education (Sweden [[93]]).
We believe that integrating SDoH into health care can improve public health, reduce healthcare disparities, and help inform public policies for effective interventions.