Keywords
Public health - epidemiology - surveillance - medical informatics - International Medical Informatics Association - artificial intelligence
Introduction
Compared with the 2017 literature analyzed in the Public Health and Epidemiology Informatics section of the International Medical Informatics Association (IMIA) Yearbook [1], new terms appeared in 2018 alongside Precision Public Health and Digital Epidemiology: infodemiology and infoveillance [2], [3], [4]. A large share of the papers published in public health informatics concerns epidemiological surveillance based on the new data generated in the current digital era. These papers analyze massive data from social media (leading to a so-called social sensor) or from electronic health records (EHRs). The availability of these data has created new opportunities for passive surveillance. However, the data must be organized within an architecture that makes them valuable.
The use of web-based data requires Natural Language Processing (NLP) approaches to extract the information. Electronic health records may also benefit from NLP, but the key element is most often the integration of a large volume of structured and unstructured clinical data in a data warehouse. Several solutions, such as i2b2 [5] or LabKey [6], are now widely used to construct a data warehouse. Once the architecture for collecting data is ready, signals may be detected by machine learning approaches ranging from standard statistical methods to neural networks. At this stage, the work is far from finished: a crucial step is the evaluation of the system proposed to perform surveillance. This evaluation is not straightforward because it requires a good reference (i.e., a gold standard), which is rarely perfect, and because multiple sources of information are often used. Furthermore, the algorithm is most often dynamic because the system learns from the new data collected in real time, which may require repeated evaluation. Hence, as underlined in a systematic review [7], the transfer from research to practice is not obvious, especially because of these challenges. Public health surveillance is therefore a natural application for artificial intelligence techniques, but more work remains to be done by epidemiological scientists to evaluate digital surveillance systems.
Paper Selection
A comprehensive literature search was performed using two bibliographic databases: PubMed/MEDLINE (from NCBI, the National Center for Biotechnology Information) and Web of Science® (from Thomson Reuters). The search targeted public health and epidemiology papers that involve computer science or the massive amount of web-generated data. References addressing topics of other sections of the Yearbook, such as those related to interoperability between data providers, were excluded from our search. The study was performed at the beginning of January 2019, and the search over the year 2018 returned a total of 805 references.
Articles were reviewed separately by the two section editors and were first classified into three categories: keep, discard, or leave pending. The two lists of references were then merged, yielding 74 references that were retained by at least one reviewer or classified as “pending” by both of them. The two section editors jointly reviewed the 74 references and drafted an agreed-upon list of 15 candidate best papers. All 15 pre-selected papers were then peer-reviewed by both section editors and external reviewers (at least four reviewers per paper). Three papers [8], [9], [10] were finally selected as best papers (see [Table 1]). A content summary of these selected papers can be found in the appendix of this synopsis. The whole selection process has been described by Lamy et al. [11].
Table 1
Best paper selection of articles for the IMIA Yearbook of Medical Informatics 2019 in the section ‘Public Health and Epidemiology Informatics’. The articles are listed in alphabetical order of the first author’s surname.
Section: Public Health and Epidemiology Informatics
▪ Arsevska E, Valentin S, Rabatel J, de Goër de Hervé J, Falala S, Lancelot R, Roche M. Web monitoring of emerging animal infectious diseases integrated in the French Animal Health Epidemic Intelligence System. PLoS One 2018 Aug 3;13(8):e0199960.
▪ Effland T, Lawson A, Balter S, Devinney K, Reddy V, Waechter H, Gravano L, Hsu D. Discovering foodborne illness in online restaurant reviews. J Am Med Inform Assoc 2018 Dec 1;25(12):1586-92.
▪ Wakamiya S, Kawai Y, Aramaki E. Twitter-Based Influenza Detection After Flu Peak via Tweets With Indirect Information: Text Mining Study. JMIR Public Health Surveill 2018 Sep 25;4(3):e65.
Outlook and Conclusion
As in 2017, the papers published in 2018 and selected by the review mainly addressed public health surveillance using EHRs and social media, chiefly Twitter. Hospital databases are clearly a source that is increasingly considered for surveillance [12], [13], [14]. A review [15] describes pilot studies of public health surveillance of chronic diseases and risk factors performed in several states of the United States. The topics were type 2 diabetes (based on hemoglobin A1c), pediatric asthma, amyotrophic lateral sclerosis, obesity, and smoking. All these studies constituted a valuable proof of concept, but challenges remain in the definition of the algorithms and their standardization across states and countries. Several papers were about surveillance of seasonal influenza using hospital databases [12], [13], [16]. When several sources were evaluated, coupling standard surveillance systems with EHRs appeared to be the best approach, at least for the surveillance of influenza in the United States [16]. It performed better than influenza-related search engine data and Twitter flu activity data [16]. In France, even the EHR source alone gave results closer to the reference surveillance system (the “Sentinelles” network) than Google data at the national and regional scales [12].
The spectrum of outcomes that can be followed using hospital databases is broad. Other published applications of EHRs were the surveillance of antibiotic consumption [17] and of hospital-acquired infections [14]. Surveillance systems are also proposed to alert health care professionals in real time. The impact of these surveillance techniques should be evaluated along several dimensions. How informative is the system? How valid is the alert? How does it change the practices of health care professionals? How does it improve the patient’s condition? For instance, the evaluation of a system used in an emergency department did not show any significant impact on the final clinical outcome, the incidence of death [18].
Social media constitute a source of information for the real-time surveillance of various public health outcomes such as influenza [9], foodborne illnesses [10], or heat alerts [19], which may allow investigation or action in case of alerts, for instance in the context of mass gatherings [19]. The exploitation of these data needs specific approaches to define the outcomes of interest according to the information available [9], [10] and to extract the signal using various machine learning techniques [12]. The approaches are then evaluated by comparison with reference surveillance systems, which are often imperfect gold standards. The metrics used for these comparisons are usually correlation coefficients [4], [9], [12], [13]. Other metrics such as sensitivity, specificity, accuracy, precision, and recall could be used to advantage in this context to provide a better evaluation of the validity of the surveillance tools [8].
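To make the distinction between these metrics concrete, the following minimal sketch computes a Pearson correlation coefficient alongside sensitivity, specificity, precision, and accuracy for a digital surveillance signal compared with a reference system. It is an illustration only and is not taken from any of the cited papers: the weekly counts, the alert threshold, and the function names `pearson` and `alert_metrics` are hypothetical, and applying the same threshold to both series is a simplifying assumption.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def _ratio(num, den):
    # Return NaN rather than raising when a metric is undefined.
    return num / den if den else float("nan")


def alert_metrics(reference, predicted):
    """Confusion-matrix metrics for binary alerts (True = alert week)."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r and p)
    fp = sum(1 for r, p in pairs if not r and p)
    fn = sum(1 for r, p in pairs if r and not p)
    tn = sum(1 for r, p in pairs if not r and not p)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # also called recall
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


# Hypothetical weekly case counts (illustrative values, not real data).
reference_counts = [12, 15, 40, 85, 120, 90, 35, 14]  # reference system
digital_counts = [10, 22, 35, 70, 130, 60, 50, 12]    # digital signal

# Hypothetical epidemic threshold, applied to both series for simplicity.
THRESHOLD = 50
reference_alerts = [c >= THRESHOLD for c in reference_counts]
digital_alerts = [c >= THRESHOLD for c in digital_counts]

r = pearson(reference_counts, digital_counts)
metrics = alert_metrics(reference_alerts, digital_alerts)

print(f"Pearson r = {r:.3f}")
for name, value in metrics.items():
    print(f"{name} = {value:.3f}")
```

In this toy example the two series are strongly correlated, yet the digital signal raises one alert that the reference system does not. The correlation coefficient alone would hide that false alert, whereas specificity and precision expose it.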
In conclusion, although great opportunities are expected from these new information sources, much work remains to be done to exploit and validate them.