CC BY-NC-ND 4.0 · Yearb Med Inform 2020; 29(01): 226-230
DOI: 10.1055/s-0040-1701989
Section 11: Public Health and Epidemiology Informatics
Survey
Georg Thieme Verlag KG Stuttgart

Precision, Equity, and Public Health and Epidemiology Informatics – A Scoping Review

David L. Buckeridge
1   McGill University, Montreal, Canada

Correspondence to

David L Buckeridge, MD PhD
Professor, School of Population and Global Health, McGill University
Canada Research Chair, Health Informatics and Data Science, Montreal, Quebec
Canada   
Phone: +1 (514) 398 8355   

Publication History

Publication Date:
21 August 2020 (online)

 

Summary

Objectives: This scoping review synthesizes the recent literature on precision public health and the influence of predictive models on health equity with the intent to highlight central concepts for each topic and identify research opportunities for the biomedical informatics community.

Methods: Searches were conducted using PubMed for publications between 2017-01-01 and 2019-12-31.

Results: Precision public health is defined as the use of data and evidence to tailor interventions to the characteristics of a single population. It differs from precision medicine in terms of its focus on populations and the limited role of human genomics. High-resolution spatial analysis in a global health context and application of genomics to infectious organisms are areas of progress. Opportunities for informatics research include (i) the development of frameworks for measuring non-clinical concepts, such as social position, (ii) the development of methods for learning from similar populations, and (iii) the evaluation of precision public health implementations. Just as the effects of interventions can differ across populations, predictive models can perform systematically differently across subpopulations due to information bias, sampling bias, random error, and the choice of the output. Algorithm developers, professional societies, and governments can take steps to prevent and mitigate these biases. However, even if the steps to avoid bias are clear in theory, they can be very challenging to accomplish in practice.

Conclusions: Both precision public health and predictive modeling require careful consideration of how subpopulations are defined, and access to data on subpopulations can be challenging. While the theory for both topics has advanced considerably, much work remains to understand how to implement and evaluate these approaches in practice.



Introduction

Precision public health and the influence of predictive models on health equity are two topics that have received considerable attention recently. Common drivers for both topics include the increasing amount of data available and advances in statistical and machine learning methods. This scoping review synthesizes the recent literature on these two topics with the intent to highlight central concepts for each topic and identify research opportunities for the biomedical informatics community.



Methods

Searches were conducted using PubMed with a single query for each topic and publication dates between 2017-01-01 and 2019-12-31, the date of the final search. Each title, and the abstract if necessary, was reviewed to determine whether the article addressed the topic; more specifically, articles were sought that explicitly considered and commented on the topic. Potentially relevant articles were retrieved, and their references were reviewed to identify other relevant articles. The search for “precision public health” returned 77 articles, of which 20 were retained; 4 articles matched this query before January 1, 2017. The search for “equity and (algorithm or prediction)” returned 156 articles, of which 15 were retained after review; 204 articles matched this query before January 1, 2017.



Results

Precision Public Health

The most generally accepted definition of precision public health (PPH) is the use of data and evidence to tailor interventions to the characteristics of a single population [1]. Achieving ‘precision’ requires high-resolution surveillance data to set priorities that are tailored to a specific population and the ability to select an evidence-based public health intervention best matched to the characteristics of a population [2] [3]. In this section, the author reviews recent publications about PPH to characterize the issue, to examine how the concept has been applied, and to identify opportunities for informatics research.

Although precision medicine [4] laid the foundation for precision public health, there are two important distinctions between ‘precision’ in medicine and in public health. One fundamental difference is the unit of interest, namely a population as compared to a single patient [5]. Although some have argued that the aggregate effects of precision medicine can improve population health one patient at a time [6], there is broad support in the public health community for considering populations explicitly [7] so that equity across subpopulations can be assessed and inequities addressed. The second distinction is that the role of genomic information is currently much more limited in PPH. However, the concept of PPH has been applied to population health genomics [8], for example pharmacogenomics [9] [10], and some have proposed that polygenic risk scores may have application within a PPH context [11].

While a consensus appears to be emerging around the concept of PPH, the topic is not without controversy, which has led some junior investigators to argue for the importance of continuing research in this area [12]. Most notably, some have suggested that PPH could divert attention away from the broader determinants of health towards clinical concepts, which tend to be measured with greater accuracy [13]. For example, the social determinants of health, such as social status, are generally not measured well, so they may play a limited role in characterizing populations and identifying optimal interventions [7] [14]. Concerns have also been raised about attempts to use genomic data in PPH, due to the limited availability of such data for many subpopulations and the generally small effect sizes at a population scale compared to other determinants of health [7]. Even if genomic data are not used, some have argued that PPH is simply a new term for what public health has always done [1] [15], although others have countered that PPH highlights the role of ‘Big Data’ and computational methods in targeting public health interventions to improve population health [3].

Research in PPH has tended to address the ‘diagnostic’ aspects (i.e., measurement and priority setting) more than the ‘therapeutic’ aspects (i.e., selection and management of evidence-based public health interventions). In terms of measurement, advances in methods such as spatial statistics have allowed highly accurate sub-regional mapping of population characteristics in global health research. For example, one method estimates HIV prevalence [16] and the prevalence of exclusive breastfeeding until 6 months of age [17] at a resolution of 5 km by 5 km cells across the African continent. In another study, the authors created similar high-resolution estimates of educational attainment across low- and middle-income countries [18]. Such high-resolution measurement of important health indicators allows intervention strategies to be considered with greater precision than was previously possible [19] [20]. This application of PPH in a global health context has been called ‘precision global health’ [21], and it draws on a range of technologies and methods to improve measurement in global health [20] [21].

From a therapeutic or public health intervention perspective, some of the most promising applications of PPH have been in infectious disease control. In this context, which some have called ‘precision epidemiology’ [22], the genomics of the infectious organism plays a central role [22] [23] [24]. Rapid sequencing and analysis of the genomes of infectious organisms can be used to identify transmission networks [25] and optimal antimicrobial therapy [26], and thus the best strategies for preventing transmission and treating disease. However, some have noted that genomic data on organisms are rarely sufficient to understand the mechanisms of disease transmission, and that data for ‘deep phenotyping’, or describing precisely the characteristics of disease, are also needed [22].

While the concept of PPH is gaining traction and initial research results in global health and infectious diseases are promising, there remain many opportunities for informatics research in this area. From a measurement or diagnostic perspective, widely adopted frameworks for measuring non-clinical concepts, such as social position [14], are needed to classify populations consistently so that interventions can be identified precisely [27]. These advances in measurement, especially through the use of new and large data sources, overlap to some extent with the concept of digital epidemiology [28]. In terms of therapy, or the identification of interventions best matched to a population, most efforts in PPH have so far relied on experts to interpret the data and identify interventions. However, as in precision medicine, there is considerable opportunity to develop methods for learning from similar populations [29] and for using causal reasoning to integrate evidence from different sources to estimate the effect of an intervention in a specific population [30]. For example, if public health agencies were to systematically record and share information about implemented interventions, along with the characteristics of the populations and the effects of the interventions, it would create a foundation for a “learning public health system”. Finally, there are many research opportunities in the implementation and evaluation of PPH strategies, although some have noted that the digitization of public health practice is a prerequisite for implementing PPH [31].



Prediction and Equity

The ethical and equitable distribution of healthcare resources and health outcomes, a focus of this yearbook, has long been an explicit goal of modern health systems and is central to global sustainable development goals [32]. In the context of the recent renaissance of machine learning [33], and particularly deep neural networks and reinforcement learning, the potential for prediction models in clinical medicine and public health to exacerbate health inequities is increasingly recognized [34] [35]. In this section, we review recent publications on this topic to characterize the issue, examine how model biases may have inequitable effects, and identify what can be done to prevent and mitigate these biases.

At a high level, prediction models in healthcare, whether statistical or machine-learning in nature, take inputs in the form of patient data and produce an output, usually in the form of a probability or a predicted class. From a population or public health perspective, if the validity of the outputs differs systematically across subpopulations, then using the model to guide decisions in practice can exacerbate health inequalities. For example, in a justice context, a model predicting recidivism to guide decisions about granting parole could increase sex-based inequalities if it systematically overpredicts recidivism in women [36]. In a healthcare context, a model that systematically underpredicts the resources needed by black patients could increase racial inequalities if it is used to direct proportionally more resources to white patients [37].
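The kind of systematic difference described above can be checked, in a simplified way, by comparing a model's mean predicted probability with the observed event rate within each subpopulation. The Python sketch below assumes hypothetical records of the form (subgroup, predicted probability, observed outcome); the function name and the toy data are illustrative only and are not drawn from the studies cited. It is a coarse calibration-in-the-large check, not a full fairness audit.

```python
from collections import defaultdict


def calibration_by_group(records):
    """Return {group: (mean_predicted, observed_rate)} for (group, prob, outcome) records.

    A mean predicted probability well above the observed rate in one group,
    but not in others, suggests systematic overprediction for that group.
    """
    # group -> [sum of predicted probabilities, number of events, number of people]
    sums = defaultdict(lambda: [0.0, 0, 0])
    for group, prob, outcome in records:
        s = sums[group]
        s[0] += prob
        s[1] += outcome
        s[2] += 1
    return {g: (s[0] / s[2], s[1] / s[2]) for g, s in sums.items()}


# Hypothetical toy data: (subgroup, predicted probability, observed outcome 0/1)
records = [
    ("A", 0.2, 0), ("A", 0.4, 1), ("A", 0.3, 0), ("A", 0.1, 0),
    ("B", 0.6, 0), ("B", 0.7, 1), ("B", 0.5, 0), ("B", 0.6, 0),
]

for group, (mean_pred, observed) in sorted(calibration_by_group(records).items()):
    print(f"{group}: mean predicted={mean_pred:.2f}, observed={observed:.2f}")
```

In this toy data the mean prediction matches the observed rate in subgroup A but exceeds it in subgroup B, which is the pattern of systematic overprediction described in the recidivism example.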

Models can perform systematically differently across subpopulations due to information bias, sampling bias, random error, and the choice of the output [34] [38] [39] [40]. Information bias can occur when the quality or amount of data differs systematically between subpopulations. For example, people from areas with lower socioeconomic status tend to visit more clinicians and have fewer tests ordered, which could produce systematic differences in the data held in electronic medical records [38] [41]. Sampling bias can arise when the proportion sampled differs systematically across subpopulations. For example, an algorithm trained to predict depression from language used on Facebook [42] may not work well when applied to text from teenagers, who are less likely to use that platform [43]. Random error is a concern even if sampling is uniform across subpopulations: for smaller subpopulations, the number of individuals available for training a model may be too low to achieve acceptable precision when making predictions [38]. The choice of the output for the model to predict can also be a source of bias if the output is not aligned with what the model is expected to accomplish [39]. For example, using healthcare cost as an output, as opposed to a composite of cost and health status, can reinforce existing racial inequalities in the allocation of healthcare resources [37] [44].
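The point about small subpopulations can be illustrated with the standard (Wald) approximation to the precision of an estimated proportion, whose 95% confidence-interval half-width is about 1.96·√(p(1−p)/n). The sketch below is a simplification that assumes a hypothetical event rate of 10% and two hypothetical subgroup sizes; it concerns the precision of an estimated rate rather than of any particular model, but it shows how quickly precision degrades as n shrinks.

```python
import math


def wald_half_width(p, n, z=1.96):
    """Approximate 95% confidence-interval half-width for a proportion p estimated from n people."""
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical subgroup sizes sharing the same assumed event rate of 10%
for label, n in [("large subgroup", 5000), ("small subgroup", 50)]:
    print(f"{label}: n={n}, 95% CI half-width ~ {wald_half_width(0.10, n):.3f}")
```

Because the half-width scales with 1/√n, a subgroup 100 times smaller has an interval roughly 10 times wider (about ±0.008 versus ±0.083 here).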

Algorithm developers, professional societies, and governments can take steps to prevent and mitigate the biases described. Many papers have suggested steps that model developers can take to address biases that may lead to inequalities [34] [38] [39] [40] [45]. Suggested actions have been identified at the stages of model conception, model training and testing, and deployment and monitoring [34]. Mistaking the objective [39] is a potential problem at the conception stage that may be prevented by consulting with diverse groups, considering the ethical implications of using the model, and ensuring that its outputs are aligned with its intended use in the health system [34] [38]. For model training, authors have identified pitfalls [39] and challenges [45], and have suggested building and testing algorithms in diverse socioeconomic health systems [38] and measuring important metrics [46] and allocation across subpopulations [34]. Once a model is implemented and in routine use, its outputs and associated outcomes can be monitored using automated alerts and through feedback from patients and clinicians [34].
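Measuring performance across subpopulations and raising automated alerts, as suggested for the deployment stage, might look like the following minimal sketch. It assumes hypothetical monitoring records of (subgroup, predicted label, true label), uses sensitivity as the metric, and flags any subgroup trailing the best-performing one by more than an arbitrary threshold of 0.10; the metric, threshold, and function names are illustrative assumptions, not recommendations from the cited papers.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Sensitivity (true-positive rate) per subgroup from (group, predicted, true) label records."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, pred, truth in records:
        if truth == 1:
            positives[group] += 1
            if pred == 1:
                true_pos[group] += 1
    # Subgroups with no observed positives are omitted (sensitivity undefined)
    return {g: true_pos[g] / positives[g] for g in positives}


def fairness_alerts(records, max_gap=0.10):
    """Return subgroups whose sensitivity trails the best subgroup by more than max_gap (hypothetical rule)."""
    sens = sensitivity_by_group(records)
    best = max(sens.values())
    return sorted(g for g, s in sens.items() if best - s > max_gap)


# Hypothetical monitoring data: (subgroup, predicted label, true label)
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 1), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 0, 1), ("B", 0, 1), ("B", 0, 1), ("B", 1, 0),
]

print(sensitivity_by_group(records))
print(fairness_alerts(records))
```

In practice, the choice of metric (sensitivity, calibration, positive predictive value, or the allocation of resources) would need to follow from the model's intended use, which is the alignment concern raised at the conception stage.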

Although the steps to avoid biases may be clear in theory, some have noted that taking these steps in practice can be very challenging [47]. In particular, access to data across diverse settings has been noted as a challenge [48]. One potential approach, proposed in New Zealand, is to develop a national data resource that model developers can use to generate predictive models of cardiovascular disease [49]. Recommendations to avoid biases in machine learning were also made in a report on artificial intelligence in medicine from the National Academies in the US [50] and in a similar report commissioned by the National Health Service in the UK [51].



Discussion

The definition of subpopulations or population strata is central to both of the topics explored in this survey. In precision public health, a subpopulation must be identified in order to characterize its needs and identify the interventions that best match those needs. In assessing the potential impact of a model on health equity, subpopulations must also be identified, so that the distribution of model outcomes across subpopulations can be assessed. In other words, PPH tends to focus within a subpopulation to optimize interventions for that subpopulation, while the assessment of equity tends to look across subpopulations to distribute resources fairly among them. The approach to defining subpopulations tends to differ, however. In PPH, especially in global health settings, subpopulations are generally defined spatially by geographical boundaries, whereas for prediction models they are usually defined by individual characteristics, such as sex and ethnicity. Interestingly, although the definition of subpopulations is central to both topics, there appears to be little explicit consideration in the literature of how subpopulations should be defined. In both cases, though, the most important characteristics of subpopulations (e.g., sex, social status) are those that may modify or influence the inference or prediction. If both PPH and prediction modeling are to be applied in an equitable and effective manner, then renewed attention must be given to ensuring that data are available to measure and model the most important characteristics of subpopulations.

Unfortunately, accessing data to characterize subpopulations is a challenge central to both topics. The measurement aspect of PPH and the assessment of equity in prediction models both tend to rely on the secondary analysis of data originally collected for other purposes. However, such data tend not to be uniformly available across subpopulations, which can be problematic, especially if the non-uniform coverage is not acknowledged explicitly. The implication for PPH is that needs will be assessed with greater precision in some subpopulations, which may lead to more effective interventions in those subpopulations, potentially increasing inequalities. A similar situation exists for the therapeutic aspect of PPH. If interventions are less likely to be evaluated in some subpopulations, then the evidence about interventions in those subpopulations will be limited, making it difficult to identify optimal interventions. For prediction models, as discussed above, non-uniform data across subpopulations can result in differing model performance across subpopulations, which can exacerbate inequalities.

In addition to improvements in data, advances in training are necessary if the potential benefits of PPH and prediction modeling are to be realized. Both topics, like biomedical informatics, are at the intersection of multiple disciplines and draw on a range of methods. While trainees in some programs in biomedical informatics may be exposed to aspects of prediction modeling and PPH, these topics may not be addressed directly. In other fields, such as epidemiology, biostatistics, and computer science, training tends to address some, but not all, of the underlying methods needed to successfully develop, implement, and evaluate PPH approaches and prediction models. Education programs in biomedical informatics and related disciplines could benefit from a more direct and holistic consideration of both PPH and prediction modeling as examples of how multiple methods and perspectives are relevant to advancing public health.

Finally, both topics examined in this survey are somewhat abstract, in that they define overarching frameworks which are meant to be helpful in advancing public health, including health equity. However, the practical implementation of these frameworks can be challenging. In PPH, there has been considerably more focus on increasing precision in measurement than on how to use this improved precision to identify optimal interventions for subpopulations. Similarly, for mitigating biases in prediction models that may exacerbate inequities, the strategies proposed, such as building and testing algorithms in diverse socioeconomic health systems, are likely to be challenging in practice.



Conclusion

The theory underlying precision public health, and that underlying the prevention and mitigation of biases in prediction models to advance health equity, have advanced considerably. Driven by the increasing availability of data and advances in statistics and machine learning, researchers and practitioners are increasingly applying and evaluating these frameworks. However, much work remains to understand how to implement and evaluate these concepts in practice. Most notably, there is a need to clarify how subpopulations are defined, to ensure that data are available to measure important characteristics of subpopulations, and to adequately train researchers and practitioners in these frameworks and the underlying methods upon which they depend.

