1 Introduction
The emergence of the global pandemic of coronavirus disease 2019 (COVID-19) dominated much of the
health informatics and medical research landscape during 2020. Hence it is appropriate
that this end-of-year review of recent developments in medical knowledge management
focuses on the pandemic.
The pandemic has highlighted the clear need for informatics to support management
and synthesis of health information at the global scale and pace in the face of a
rapidly spreading infection. However, it has also highlighted the presence of severe
limitations in our ability to share, integrate, and analyse data at this scale.
To address those limitations, we propose that the model of a “global learning health
system” (gLHS) can be deployed. The concept is that of a learning health system (LHS) [[1] [2]],
expanded to a global scale but with a singular focus on the viral disease. Indeed,
the international effort to quickly gather and share knowledge for clinical diagnosis,
management, and treatment of COVID-19 can be seen as an exemplar of a gLHS, albeit
not yet fully realised or effective.
The key elements of an LHS, including the core information cycles that characterise
it, were observed throughout the interactions of the global scientific community.
Information flowed from practice (what was being done on the ground to manage COVID-19
patients) to data (what was captured about those patients and their clinical characteristics
or response to interventions) to knowledge (about disease characteristics and trends,
what care approaches worked, and what did not, based on analysis and modelling of the
data) and, through rapid implementation, back to practice again. Furthermore, as required by an LHS,
information technology infrastructure played a critical role in enabling these information
flows.
In the wake of the previous SARS and Ebola virus epidemics, it was already argued
that unified frameworks supporting clinical and biological data integration were critical
to support evidence generation in a pandemic [[3]] and that information technology was needed for knowledge management [[4]]. Broader adoption of electronic health records has facilitated evidence generation,
including in observational studies and in traditional randomised controlled trials,
through their use to identify eligible patients, support data collection, and monitor
outcomes [[5]]. Electronic sharing of patient-level data from trials facilitates re-analysis of
outcomes and fosters reproducibility and trust in findings [[6]]. However, our public health and clinical responses during COVID-19 demanded much more
sophisticated strategies for rapid information synthesis and knowledge management
than were available, strategies that we argue an effective gLHS would facilitate.
The rapid spread of COVID-19 internationally and its immediate impact on the global
economy have led to much more widespread appreciation of the need to coordinate pandemic
research, including substantially increased scientific globalism [[7]] – international research collaboration – and sharing of patient-level clinical
data [[6]]. However, despite a few shining examples of rapid deployment of multi-site clinical
trials [[8] [9] [10] [11]], clinical research for COVID-19 has been highly fragmented [[12]], with a particular dearth of meaningful evidence in the area of non-drug interventions
with public policy impacts [[13]].
This survey will show that although there is still work to be done, the pandemic has
illustrated that many of the elements required for a global learning health system
are in place, critically including the human motivation to achieve it.
The framework of the gLHS that we present is not only a useful way of characterising
how knowledge evolved during the pandemic under the strong impetus to support knowledge-informed
clinical care and public health response to COVID-19, but also provides an architecture
for a system that we can invest in to ensure robust knowledge evolution for ongoing
global health needs into the future.
2 A Global Learning COVID-19 System
In this section, we introduce the framework for ‘learning’ that we will adopt in our
analysis of the evolution of knowledge during the first year of the COVID-19 crisis
(Section 2.1). We then present a model of the collective activity toward learning
about COVID-19, viewed as a global-scale learning health system (Section 2.2), and
illustrating the elements of the gLHS that emerged as observed from the literature.
2.1 Framework for Conceptualising Learning in a Crisis
To frame the evolution of knowledge in the COVID-19 crisis, we follow a recent proposal
by Tovstiga and Tovstiga (2020) to adopt a classical four-quadrant ‘conscious-competence’
conceptual framework from learning theory [[14]]. The model is presented in [Figure 1], illustrating the stages of a learning trajectory, from being unaware of a
lack of knowledge to holding deeply embedded knowledge.
- Quadrant 1: Zone of uncertainty, including lack of clarity about a topic;
- Quadrant 2: Zone of learning, where the value of knowledge on the topic is recognised and sought. Questions play a crucial role in this zone;
- Quadrant 3: Zone of actionable knowledge, where learning is consolidated and integrated with existing knowledge;
- Quadrant 4: Zone of embedded understanding, enabling intuitive action. Knowledge in this zone is often not fully recognised.
Fig. 1 The ‘conscious-competence’ matrix of learning, based on a model originally attributed
to Broadwell (1969) [[83]]; adapted from [[14]].
While typically applied in the context of an individual learner, the Tovstigas argue
that the framework effectively reflects the general knowledge evolution process in
the context of the COVID-19 crisis. They further suggest that it is useful for structuring
and understanding the learning trajectory with respect to the crisis, demonstrating
the various phases through an analysis of information about COVID-19 communicated
through news reports.
Their analysis identifies data analytics and scientific knowledge sharing as key drivers
of the learning trajectory in COVID-19, citing efforts such as the World Health Organization's
creation of a global COVID-19 clinical information platform based on a standard case
report form[1] requesting data on specific detailed clinical and demographic parameters on COVID-19
positive patients. This underscores the important role of data standards in supporting
the learning needed to manage the pandemic. Data analytics, knowledge sharing, and data standards are also key
elements of the gLHS, which highlights the relevance of the model.
2.2 Modelling the COVID-19 Knowledge Ecosystem Through the Learning Health System
Building on this framework for conceptualising learning, we propose that the core
model of an LHS [[1] [2]] can be applied to characterise the rapid evolution of knowledge that has occurred
during the COVID-19 pandemic. In this model, analytics over clinical practice
data from patient care drive the learning of new knowledge that can be implemented to
improve clinical practice, leading to continuous improvement. A number of critical
knowledge management and information technology elements can be identified as key
enablers of this learning process, supporting the significant human efforts that catalysed
and provided the appropriate socio-technical conditions for the learning cycle.
Our proposed model for the COVID-19 gLHS is shown in [Figure 2]. The impetus for the learning process in COVID-19 arose from a knowledge gap, the
gap between purposeful action grounded in knowledge (Q3) that exists for routine clinical
care and the uncertainty surrounding diagnosis and management of the novel virus in
affected patients (Q1). This then triggered a broad effort to gather data to fill
the gap, primarily taking advantage of data collected through electronic health record
systems. Data collection and integration, facilitated through electronic data sharing,
then enabled learning (Q2), actioned through observational analysis, clinical trials,
predictive modelling, and other research leveraging data, with the objective of turning
data into information. Publications summarising these studies served as the key vehicle
for sharing research results, but challenges in finding and interpreting papers –
particularly in the face of a flurry of research activity – resulted in remaining
uncertainty in the knowledge that existed (Q4). Translation of new knowledge into
practice (Q3) required evidence synthesis approaches such as systematic reviews, critically
involving searching (information retrieval), screening, appraisal, and meta-analysis
of research publications. Key conclusions were rapidly shared via actively maintained,
living guidelines [[15]] and platforms [[16]] or tools [[17]] for making available clinical decision support knowledge artifacts.
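The meta-analysis step of evidence synthesis can be illustrated with a minimal sketch of fixed-effect, inverse-variance pooling. This is one common pooling technique and is shown here only for illustration; it is not necessarily the method used by any of the reviews cited. The function name `fixed_effect_pool` and the three effect estimates are hypothetical, not drawn from any COVID-19 study.

```python
import math


def fixed_effect_pool(estimates, std_errors):
    """Pool per-study effect estimates using inverse-variance weights.

    Returns the pooled estimate, its standard error, and an approximate
    95% confidence interval (normal approximation, z = 1.96).
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need exactly one standard error per estimate")
    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / (se ** 2) for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)
    return pooled, pooled_se, ci


# Hypothetical log risk ratios and standard errors from three imaginary studies.
log_rr = [-0.20, -0.05, -0.30]
se = [0.10, 0.20, 0.10]
pooled, pooled_se, ci = fixed_effect_pool(log_rr, se)
print(f"pooled log RR = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

More precise studies (smaller standard errors) dominate the pooled estimate. A random-effects model would be the usual alternative where studies are expected to be heterogeneous.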
Fig. 2 Abstraction of the structure of the global learning system in place for COVID-19.
Knowledge management activities are overlaid onto the Learning Health System model
(core LHS cycle figure adapted from [[84]]), and related to the learning framework of ‘conscious-competence’ presented in
[Figure 1].
2.2.1 Electronic Data Capture and Data Sharing
Electronic health records (EHRs) are a key resource in the learning health system,
as they provide the data that is used to drive learning from practice. For COVID-19,
EHRs were analysed to characterise early cases of the infection in Wuhan, China [[18] [19]],
to provide important information related to the efficacy of symptom-based screening
[[20]], and to collect data on patients prospectively after enrolment in a trial [[21]]. In the UK, the OpenSAFELY platform[2] was used to identify factors associated with COVID-19 deaths through analysis of
the primary care records of over 17 million patients [[22]], facilitated through the use of a single EHR system (TPP SystmOne) by general
practice surgeries covering approximately 40% of the UK population. Similarly, a study
of risk factors associated with death due to COVID-19 [[23]] was made possible by the use of a single integrated EHR system across many sites,
and the Quick COVID-19 Severity Index was developed with data from a single health
system with nine Emergency Departments [[24]]. A highly cited study demonstrating lack of effectiveness of hydroxychloroquine
treatment was conducted using data extracted directly from the New York-Presbyterian
/ Columbia University Irving Medical Center EHR [[25]].
Leveraging distributed EHRs across national and international boundaries through collaborative
consortia and clinical data networks [[26]], several large-scale studies were undertaken to characterise COVID-19 patients
in relation to similar disease groups [[27]], to understand the trajectory of the disease [[28]], and to study interaction of the disease with patient medications [[29]]. Such studies were enabled through the adoption of common data models to harmonise
data, prominently the Observational Health Data Sciences and Informatics (OHDSI) Observational
Medical Outcomes Partnership (OMOP) Common Data Model (CDM) [[30]]; partners in the OHDSI network contribute statistical results based on federated
querying of local data sources represented with the CDM.
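The federated pattern described here, in which each partner runs the analysis locally and contributes only statistical results, can be sketched as follows. This is a simplified illustration under stated assumptions, not the OHDSI tooling itself: the names `SiteSummary`, `summarise_locally`, and `pool`, and the toy records, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SiteSummary:
    """Aggregate results that leave a site; no patient-level rows are included."""
    site: str
    n_patients: int
    n_events: int


def summarise_locally(site, records):
    """Runs inside a site: reduce patient-level records to two counts."""
    n_events = sum(1 for record in records if record["event"])
    return SiteSummary(site=site, n_patients=len(records), n_events=n_events)


def pool(summaries):
    """Runs at the coordinating centre: combine site-level counts only."""
    total_patients = sum(s.n_patients for s in summaries)
    total_events = sum(s.n_events for s in summaries)
    return total_events / total_patients if total_patients else float("nan")


# Toy data standing in for two sites whose local data share a common structure.
site_a = [{"event": True}, {"event": False}, {"event": False}, {"event": True}]
site_b = [{"event": False}, {"event": True}]

summaries = [summarise_locally("A", site_a), summarise_locally("B", site_b)]
pooled_rate = pool(summaries)
print(pooled_rate)
```

The design relies on every site representing its data in the same way, which is the role the common data model plays: the same query can then be executed unchanged at each site.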
The Columbia Open Health Data for COVID-19 Research (COHD-COVID) data set [[31]] takes a different approach, also based on the OMOP CDM. Its authors make publicly available
prevalence data for conditions, drugs, procedures and their co-occurrences, calculated
from their EHR, thereby sharing aggregate counts rather than patient-level data. Coupled
with slight perturbation of counts and suppression of rare concepts, this mitigates privacy
concerns while still enabling comparative data analysis.
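A minimal sketch of this style of aggregate release is given below. The suppression threshold, the uniform integer jitter, and the function name `shareable_counts` are assumptions chosen for illustration; they are not taken from the COHD-COVID data set, whose actual perturbation scheme may differ.

```python
import random
from collections import Counter


def shareable_counts(patient_concepts, min_count=10, jitter=2, seed=0):
    """Turn per-patient concept lists into perturbed, suppressed counts.

    min_count and jitter are hypothetical parameters: concepts observed in
    min_count or fewer patients are dropped, and the remaining counts are
    shifted by a small random integer in [-jitter, +jitter].
    """
    rng = random.Random(seed)
    counts = Counter()
    for concepts in patient_concepts:
        counts.update(set(concepts))  # count each patient once per concept
    released = {}
    for concept, n in counts.items():
        if n <= min_count:
            continue  # suppress rare concepts entirely
        released[concept] = max(0, n + rng.randint(-jitter, jitter))
    return released


# Toy cohort: 12 patients with two common concepts, 3 with a rare one.
patients = [["fever", "cough"] for _ in range(12)]
patients += [["rare_condition"] for _ in range(3)]

released = shareable_counts(patients)
print(released)
```

Only the released dictionary would be published. The rare concept does not appear in it at all, and the common concepts appear with counts close to, but not necessarily equal to, the true values.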
Absent adoption of common data models and data sharing platforms for EHR data, research
across multiple sites required manual extraction of defined clinical data elements
from EHRs, which were then submitted to a central study team. This was done in the UK for a paediatric
study related to COVID-19 [[32]] and for a study of factors associated with coronavirus death in the US [[33]]. In China, targeted national data reporting to a central research team supported
observational studies [[34] [35]]. Data harmonisation and cleaning in these cases relied on expert review of data
submissions. For example, to support the development of a machine learning-based predictive
model for COVID-19 associated mortality based on EHR data from five New York hospitals,
expert mappings and data harmonisation were performed by a multi-disciplinary team
of clinicians [[36]].
Other approaches to collecting electronic clinical data were also utilised, including
deployment of clinical natural language processing to rapidly identify and characterise
patients relevant to COVID clinical questions [[37]]. Registries were established to collect disease-specific data items utilising case
report forms (CRFs). The most sophisticated of these used electronic CRFs mapped to
common data standards and submitted to a central database, such as the VIRUS-COVID-19
registry[3] [[38] [39]] based on REDCap [[40]], which coordinated data entered at over twenty sites. Electronic surveys to gather
data directly from patients[4] were introduced to allow for collection of data from non-hospital settings [[41]].
Despite these successful demonstrations of the use of EHR and registry data to study
COVID-19, a number of challenges remain, summarised effectively in Madhavan et al. [[26]]. EHRs are not primarily designed to support coordinated research and public health
response, and their use in this context placed substantial strain on informatics and
data science teams at hospitals. Even basic conversion of spreadsheet-based systems
for research data collection to formal database structures can prove a challenge where
local differences exist in how fields are interpreted or used [[4]]. Infrastructure for individual-level storage and exchange of data for research
purposes is required, as well as commitment to common data models, terminologies,
and data interchange standards [[42]]. Rapid adaptation of standard clinical vocabularies such as ICD, LOINC, and SNOMED-CT
to include relevant new vocabulary is needed [[43] [44]]. Crucially, governance and ethics factors must also be addressed. The National COVID
Cohort Collaborative[5] (N3C) in the US aims to address many of these points, utilising the OMOP CDM to
bring together data from disparate sources, and aiming to facilitate record-level
analysis of COVID-19 patients (and matched controls) in a secure environment [[45]]. Tremendous progress has been made in the rush to address the pandemic, but there
is still much work to be done before truly international-scale data can be efficiently
and effectively brought together.
2.2.2 Data Analytics and Modelling
With the availability of large-scale and complex data about COVID-19 came the need
to analyse and model it. Advanced computational methods including machine learning,
natural language processing and other artificial intelligence (AI) methods can play
key roles [[46]], and indeed significantly contributed to detecting the COVID-19 outbreak, diagnosing
the disease, and predicting outcomes [[47]]. Models are critical to inform decision making [[48]], supporting prediction and simulation of outcomes under varying conditions or patient
characteristics.
Several of the EHR-based studies cited above utilise machine learning over clinical
variables [[21] [36]], while more traditional statistical or epidemiological modelling is typically employed
for observational studies. Imaging analysis models have also been adapted to COVID-19
from models for related diseases such as pneumonia [[49]], facilitated by public sharing of COVID-19 images with the AI community[6] [7] [8] [[50]].
The challenges faced in building sufficiently large data sets have meant that
models of COVID-19 frequently carry a high risk of bias and show poor external validation [[51] [52] [53]].
Additionally, the inherent nature of observational EHR-based studies, lacking
controlled cohort selection, may lead to unreliable results due to confounding [[5]] and a risk of case contamination due to ambiguous cohort definitions [[54]].
2.2.3 COVID-19 Information Retrieval and Synthesis
The amount of COVID-19 research output has been remarkable; based on the LitCovid
index of this research at the US National Library of Medicine[9] [[55]], over 75,000 COVID-19-related publications were added to the PubMed literature
repository between January and late November 2020, at a steady pace of approximately
2000 articles per week (see [Figure 3]). A review of the literature focused on clinical presentation and management of
COVID-19 as of June 15, 2020, prioritised for general medicine readers, identified
over 100 relevant articles [[56]]. Over 7000 COVID-19 clinical trials are registered in the World Health Organization's
International Clinical Trials Registry Platform[10], which plays a key role in identifying research gaps [[57]]. Nearly 3000 systematic reviews related to COVID-19 are currently catalogued in
the Living OVerview of Evidence (L·OVE) platform[11] [[58]].
Fig. 3 Weekly publications in 2020 related to COVID-19, as indexed in the LitCovid collection
of PubMed [[55]].
This scientific knowledge is also broadly accessible; over 75% of this research is
available in open access publications, an unprecedented proportion, more than double
the rate for publications generally during 2015-2019 and for other topics in 2019-2020
[[7]].
However, the accumulation of this body of evidence about one disease in such a short
time is also overwhelming. A key challenge in COVID-19 knowledge management lay in
navigating this massive quantity of research evidence, which spans diagnosis, treatment,
and public policy as well as molecular information about the virus. The sheer volume
of the research – published in natural language texts that must be read and interpreted
– requires significant effort to translate into knowledge. Studies must be synthesised
and evaluated, and broader conclusions drawn from comparing multiple studies examining
the same question.
Therefore, many systems based on information retrieval or text mining were created
in response to this challenge, including our COVID-SEE Scientific Evidence Explorer
system [[59]]; more are reviewed in [[60] [61] [62]]. An important resource in these efforts was the COVID-19 Open Research Dataset
(CORD-19) which compiled a significant collection of literature for both COVID-19
and related coronaviruses into a single, downloadable resource [[63]].
Leveraging such tools, community-based approaches to collect, curate, and model knowledge
rapidly emerged for COVID-19. Groups began working together to review the literature
and build living evidence guidelines that were updated as new information was made
available [[15]]. Utilising systematic review automation technologies, complete reviews could be
undertaken in a matter of weeks [[64]].
However, in the rapidly changing information space of COVID-19, the rush to explore
and share research outcomes also resulted in poor study designs, poor research reporting,
a lack of coordination, and redundancy in research activities [[13]]. Coupled with the data biases noted above, this creates new problems – wasted effort,
increased review and quality appraisal work, and uncertainty about key diagnostic,
prognostic, and treatment decisions. The gLHS, effectively implemented, could provide
the coordination and feedback mechanisms needed to address these problems.
2.2.4 Knowledge Dissemination
Knowledge has been recognised as strategically important for managing pandemics [[48]] and it plays a central role in our learning-based model. As knowledge is acquired
through learning, it must be shared in order to have impact. While publications serve
a key role in disseminating knowledge, alone they are insufficient and too ambiguous to
guide practice. Social media have been used effectively for knowledge dissemination
during COVID-19 [[65]], but this focuses on transferring knowledge between individuals.
Knowledge management implemented through information technology can improve information
sharing and coordination [[4]]. Several key elements for knowledge management in pandemics have been identified
[[66]]:
- Shared knowledge spaces utilising consistent vocabulary.
- Formal representations of knowledge.
- Enabling reusable knowledge.
- Empowering human collaboration through knowledge sharing.
All of these elements were adopted to one degree or another during the COVID-19 pandemic,
through the scientific globalism that emerged; in consortia like the N3C [[45]], working groups like those organised through the Research Data Alliance [[42]], and informal data and clinical networks.
Knowledge has been disseminated through numerous mechanisms, including online platforms
such as the Australian National COVID-19 Clinical Taskforce[12], registry sites [[39]], and clinical decision support tools such as the MAGIC Evidence Ecosystem
Foundation's MAGICapp [[17]].
The push for formal representation of knowledge, including computational and executable
models that can be integrated into health information systems to enable application
of knowledge to practice [[67] [68]], has gained momentum during COVID-19. The COVID-19 Disease Map project [[69] [70]]
captured and made available molecular interaction information for the SARS-CoV-2
virus, based on manual curation, supported by weekly videoconferences. Knowledge graphs
are also being used to support representation and integration of the variety of biomedical
data related to COVID-19 [[71]], including via text mining [[72]]. Through the adoption of standardised ontology identifiers, comparable data in
different resources can be linked together for analysis, and combined in different
ways for different tasks.
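The linking step can be sketched in a few lines, assuming each resource keys its records by a shared ontology identifier. The identifiers (`EX:0001` and so on), the resource names, and the function `link_by_identifier` are placeholders invented for this illustration and do not correspond to any real ontology or database.

```python
from collections import defaultdict


def link_by_identifier(*resources):
    """Merge records from several resources that share ontology identifiers.

    Each resource is a (name, {identifier: payload}) pair. The result maps
    every identifier to the payloads contributed by each resource.
    """
    linked = defaultdict(dict)
    for name, records in resources:
        for identifier, payload in records.items():
            linked[identifier][name] = payload
    return dict(linked)


# Hypothetical resources keyed by placeholder identifiers.
drug_targets = {
    "EX:0001": {"targets": ["protein A"]},
    "EX:0002": {"targets": ["protein B"]},
}
literature = {
    "EX:0001": {"mentions": 42},
    "EX:0003": {"mentions": 7},
}

linked = link_by_identifier(
    ("drug_targets", drug_targets),
    ("literature", literature),
)
# Identifiers present in more than one resource can be analysed jointly.
shared = {identifier for identifier, views in linked.items() if len(views) > 1}
print(shared)
```

Because the join depends only on the identifier, the same records can be recombined with other resources for a different task without any change to the source data.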
Much of this work has been based on automated or semi-automated analysis of literature.
Direct generation of computable evidence from structured clinical trial registries
has also been proposed [[73] [74]], which would shortcut the need for literature-based synthesis.
3 Discussion
Friedman and colleagues have stated [[75]]:
“A national-scale LHS will have to be understood and designed as such a cyber-social
ecosystem: a large-scale, decentralized, human-intensive, cyber-catalyzed and cyber-supported information processing system. The
system as a whole—not just the digital infrastructure, but also networks of people
and institutions—will have to be understood not just as users of a technological infrastructure,
but also as parts of the information system itself.”
Extending this to a global-scale LHS demands an even broader view of the relevant
ecosystem. The scale is larger, even more decentralised, and crosses a more diverse
set of legal jurisdictions. It is apparent that people and institutions are a
critical part of the information system, both by making possible the required data sharing
– including tackling legal barriers and leading ethical discussions around data sharing
– and by supporting effective communication of knowledge.
It has been observed that we entered the pandemic without a functioning LHS [[76]]. The authors ask [[76]]:
“We have the motivation. We have the vision. We have the technology. We have a roadmap.
What are the barriers?”
answering:
“The issue is culture. We need to treat medical data as a public good.”
They further point to the ethics framework of Faden et al. [[77]] that identifies the dual obligations of health professionals to learn and implement,
and patients to participate in the learning system by contributing their data.
Initiatives such as the US N3C are making important strides towards realising a gLHS.
We do appear to have the motivation, the vision, and the technology. A recent review
of the use of digital technologies during COVID-19 highlights how far we have come
in leveraging technology for the pandemic response [[78]]. What is required to achieve an ongoing gLHS is a commitment to the vision, coupled
with rigorous data governance and legal and regulatory frameworks that safeguard patient
privacy while supporting the learning knowledge ecosystem.
4 Conclusions
The model we have proposed is strongly aligned with the Agency for Healthcare Research
and Quality evidence-based Care Transformation Support (ACTS) Knowledge Ecosystem
initiative referred to as the ‘ACTS COVID-19 Evidence to Guidance to Action Collaborative’[13], which aims to continually enhance patient care throughout the pandemic, as the
evidence base evolves. This Collaborative emphasises development of digital infrastructure
to support the Knowledge Ecosystem, a cycle of Action-Data-Evidence-Guidance that
mirrors the LHS cycle. It has further been active in developing groups such as COKA,
the COVID-19 Knowledge Accelerator Initiative[14] [[74]], a response focused on COVID-19 to the call by Dunn and Bourgeois [[73]] to aim for computable knowledge synthesis and representation, through the use of
standards such as EBMonFHIR[15] [[79]] and CPGonFHIR[16], or rule formalisms for computational clinical guideline specification [[80]].
The vision pursued in these initiatives is still under active development, and has
required a vast community of clinicians, researchers, informaticians, developers,
industry and government representatives, and others to come together with the common
objective of addressing the technical, policy or legal, and cultural hurdles to enable
more effective management of the COVID-19 pandemic. It has been argued that infrastructure
is currently sorely lacking in most public health organisations to realise this vision
effectively or efficiently [[81]]. There are still many unanswered questions about how to overcome bias and determine
causality through real-world data [[24] [82]]. As we have shown, many of the learning and knowledge sharing activities in the
context of the pandemic have been limited to very human-intensive approaches.
However, the core gLHS framework is in place, technology has been harnessed in many
ways to share data and knowledge at a pace that arguably outstripped the spread of
the virus, the requirements for information technology systems to support data and
knowledge exchange are increasingly being clarified, and the initial steps toward
achieving the vision have been made. This is entirely thanks to a tremendous response
by the scientific community with a shared objective of improving outcomes for patients.
Successful examples of large-scale, truly international data sharing and research
collaborations now exist [[28]]. Both the need for and the value of continued work towards a healthcare system
enabled through data and information technologies – a system that can be achieved
through the gLHS – are now obvious.
Continued efforts towards achieving a robust gLHS are important, not only to allow
us to respond to this pandemic and to prepare us to respond to the next pandemic,
but to support continuous improvements in how we care for human health. We now know
that we can do this.