2 Methods
Our paper selection process involved the following steps. First, we searched PubMed, the Association for Computational Linguistics Anthology, the Proceedings of the Conference on Human Factors in Computing Systems (CHI), and the Proceedings of the International AAAI (Association for the Advancement of Artificial Intelligence) Conference on Web and Social Media (ICWSM) using a variety of social media and NLP-related keywords. Second, we manually inspected the tables of contents of the Journal of the American Medical Informatics Association, the Journal of Biomedical Informatics, and the Journal of Medical Internet Research. This first pass identified over 1,800 papers. After reviewing abstracts, we reduced the number of papers reviewed to 130. To increase the tractability of the reviewing task, we further winnowed the papers to 71. This winnowing process was designed to capture a large swathe of both application areas and methods, and should not be interpreted as a comment on the quality of the excluded research.
Only the papers that both demonstrated a clear public health focus and explicitly utilised NLP or text mining methods were retained. Papers that reported on the results of qualitative content analysis or professional standards for health communication using social media without reference to NLP were excluded. Papers that discussed ethical issues pertaining to the use of social media for public health applications and research were retained. References dated outside the period 2016-2018 have been included in order to provide important context. The use of these references does not imply that they form part of the document set defined by the inclusion criteria.
The papers reviewed utilise social media from several different sources, including Twitter, Reddit, Weibo, Facebook, and online discussion forums (see [Figure 1] and [Tables 1] & [2]).
Fig. 1 Social media data sources. Note that this list is not exhaustive.
Table 1
Number of papers by topic and data source. Note that papers can occur in several categories
| Data Source | Vac[a] | Comm[b] | Cancer[c] | SA[d] | Pharmaco[e] | STI[f] | MH[g] | Total |
|---|---|---|---|---|---|---|---|---|
| Reddit | - | 1 | - | 3 | - | 1 | 13 | 18 |
| Twitter | 3 | 3 | 1 | 17 | 7 | 1 | 9 | 41 |
| Instagram | - | - | - | - | - | - | 1 | 1 |
| Facebook | 1 | - | - | - | - | - | 3 | 4 |
| OHC[h] | 1 | - | 2 | 2 | 1 | - | 6 | 12 |
| Weibo | - | 1 | - | - | - | - | 1 | 2 |
| WhatsApp | - | - | - | 1 | - | - | - | 1 |
| YouTube | - | - | - | 1 | - | - | - | 1 |
| Yik Yak | - | - | - | 1 | - | - | - | 1 |
| Tumblr | - | - | - | - | - | - | 1 | 1 |

a Vaccination hesitancy and refusal;
b Communicable diseases;
c Cancer;
d Substance Abuse;
e Pharmacovigilance;
f Sexually transmitted infections;
g Mental health;
h Online Health Communities
The vast majority of the papers reviewed focussed on analysing English language text (68 papers), with two papers focussing on Chinese text [76], [77] and one paper focussing on Japanese text [31]. With respect to the geographical location of first authors, most of the articles emerged from North America (55), with Europe (7) and Asia (including Australasia and Turkey) (6) also represented.
The reviewed papers can be grouped into several health-related categories, including vaccine hesitancy and refusal, communicable diseases surveillance (including sexually transmitted infections [STIs]), cancer, substance abuse, pharmacovigilance, and mental health (see [Table 2]). A wide range of methods were used, including “classical” machine learning (e.g., Random Forests, Support Vector Machines [SVM]), “modern” machine learning (e.g., Convolutional Neural Networks [CNN], Recurrent Neural Networks [RNN][2]), and lexicon-based approaches. Among the lexicon-based approaches, the Linguistic Inquiry and Word Count (LIWC) lexicon, a dictionary of words arranged into numerous psychological dimensions, is used extensively in many of the papers reviewed, especially in the areas of mental health and substance abuse [79].
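Lexicon-based scoring of the LIWC type reduces, at its simplest, to counting how many tokens in a document fall into each category's word list. A minimal sketch with hypothetical category word lists (LIWC itself is a proprietary dictionary, so the lists below are illustrative stand-ins):

```python
import re

# Hypothetical mini-lexicon: LIWC is proprietary, so these category word
# lists are illustrative stand-ins, not the real LIWC dictionaries.
LEXICON = {
    "negemo": {"sad", "angry", "hate", "worried", "awful"},
    "posemo": {"happy", "love", "great", "relieved"},
    "health": {"vaccine", "doctor", "symptoms", "clinic"},
}

def category_scores(text: str) -> dict:
    """Fraction of tokens in `text` matching each lexicon category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {cat: 0.0 for cat in LEXICON}
    return {cat: sum(t in words for t in tokens) / len(tokens)
            for cat, words in LEXICON.items()}
```

Real LIWC-style tools add word-stem matching and dozens of categories, but the per-document output is the same shape: one proportion per psychological dimension.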
3 Results
3.1 Vaccine Hesitancy and Refusal
Vaccine hesitancy – defined by the World Health Organisation as referring to a “delay in acceptance or refusal of vaccines despite availability of vaccination services”[3] – has been a growing subject of research during the review period, with NLP methods applied to social media data in an attempt to develop insights into how best to understand and improve health communication, as well as to quantify the degree of vaccine hesitancy in a community.
Of the five papers reviewed in this section (see [Table 3]), three utilised Twitter data [28], [29], [30], one utilised Facebook data [66], and one further paper utilised data derived from an online health community, in this case mothering.com [5]. Supervised machine learning [30] and unsupervised machine learning [5], [28], [29] were both represented. Three of the papers reviewed used classical machine learning methods [5], [29], [30], and one used modern machine learning methods [30], with surveillance [28], [29], [30], health communication [5], [28], [29], [30], [66], and sentiment analysis [28], [29], [30], [66] all frequently studied topics. The LIWC lexicon has been used either to characterise public attitudes towards vaccination in general [66], or as a tool to explore the purported link between autism and the Measles, Mumps, and Rubella vaccine [28]. This last study aimed at investigating key differences between users who are longstanding vaccination advocates, longstanding anti-vaccination advocates, or users who had recently adopted an anti-vaccination orientation. Vaccination to protect against the Human Papillomavirus (HPV) – a vaccine typically administered to adolescent boys and girls to prevent future sexual transmission of the virus – was also the subject of reviewed research, with high-performance sentiment classifiers developed (AUC: 0.92) [30], and LDA (Latent Dirichlet Allocation) topic modeling used to identify a number of vaccine-hesitancy-related topics, including clinical evidence and vaccination harms [29].
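AUC figures like the one quoted for the HPV sentiment classifier can be computed directly from ranked classifier scores. A minimal sketch (not the reviewed authors' code), using the Mann-Whitney formulation: AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one.

```python
def auc(labels, scores):
    """AUC via the Mann-Whitney formulation: the probability that a
    randomly chosen positive is scored above a randomly chosen negative
    (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The quadratic pairwise loop is fine for illustration; production implementations rank the scores once instead.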
Table 3
Summary of vaccine-related papers
| Data Source | SML[a] | UML[b] | CML[c] | MML[d] | Surv[e] | HC[f] | Senti[g] | Lexicon[h] |
|---|---|---|---|---|---|---|---|---|
| Twitter | [30] | [28], [29] | [29], [30] | [30] | [28], [29], [30] | [28], [29], [30] | [28], [29], [30] | [28] |
| Facebook | - | - | - | - | - | [66] | [66] | [66] |
| OHC[i] | - | [5] | [5] | - | - | [5] | - | - |

a Supervised machine learning (e.g., Support Vector Machines, Random Forests);
b Unsupervised machine learning (e.g., Latent Dirichlet Allocation, K-means);
c Classical machine learning (e.g., Random Forests, Support Vector Machines);
d Modern machine learning (e.g., Convolutional Neural Networks);
e Surveillance;
f Health communication;
g Sentiment analysis;
h Lexicon-based methods;
i Online health communities
In a further example of novel research, Tangherlini et al., produced a statistical-mechanical network model representing relationships between “actants” (actors), which was used to automatically extract typical narratives and “story fragments” related to vaccination issues, evidencing a narrative framework rooted in a pronounced distrust of government and medical authority [5].
3.2 Communicable Diseases and Sexually Transmitted Infections
Systems designed to use social media data for pandemic public health surveillance have existed for almost 13 years [80], [81], and approaches variously referred to as infodemiology [82], digital disease detection [83], and digital epidemiology [84] are by now well established, particularly for dengue, influenza, and, more recently, ebola. In addition, significant research efforts have centred on the study of STIs, despite some methodological concerns regarding the willingness of users with STIs to disclose their status on social media.
Investigating the changing prevalence of a number of health-related topics, Park et al., [10] observed that ebola discussions were characterised by concerns about risks and symptoms, while influenza was associated with terms like “CDC” and “H1N1”. Another study focussed on influenza misdiagnoses [33], achieving an F-score of 0.76. Regarding STIs, one study demonstrated statistically significant associations between Twitter data from 2012 and official Centers for Disease Control syphilis prevalence data from 2013 [57], with a related study discovering that the most frequently discussed STIs were intermediate (non-reportable) STIs like genital herpes and HPV, with more serious (reportable) diseases like syphilis and gonorrhoea discussed less frequently [14].
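Evaluation figures such as the F-score quoted above combine precision and recall. A stdlib-only sketch of how such a figure is computed from binary predictions (illustrative, not the cited study's evaluation code):

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

F1 is the harmonic mean of precision and recall, so a classifier cannot reach a high F-score by inflating one at the expense of the other.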
Of the six papers reviewed (see [Table 4]), four used Twitter data [31]-[33], [57], and two used Reddit data [10], [14], while Al-Garadi et al., provided a review that concentrated on Twitter and Weibo, the Chinese-language microblog service [32]. Two of the papers reviewed described the use of supervised machine learning methods [31], [32], three papers used unsupervised machine learning methods [10], [14], [32], and one used a lexicon-based approach [57]. Machine learning methods were used to perform a variety of tasks, including surveillance [10], [14], [31]-[33], [57], health communication [32], and sentiment analysis [32]. Several studies concentrated on influenza surveillance using English [10], [33] and Japanese [31] Twitter data.
Table 4
Summary of communicable diseases and STI-related papers
| Data Source | SML[a] | UML[b] | CML[c] | MML[d] | Surv[e] | HC[f] | Senti[g] | Lexicon[h] |
|---|---|---|---|---|---|---|---|---|
| Reddit | - | [10], [14] | [10], [14] | - | [10], [14] | - | - | - |
| Twitter | [31], [32] | [32] | [31-33] | - | [31-33, 57] | [32] | [32] | [57] |
| Weibo | [32] | [32] | [32] | - | [32] | [32] | [32] | - |

a Supervised machine learning;
b Unsupervised machine learning;
c Classical machine learning;
d Modern machine learning;
e Surveillance;
f Health communication;
g Sentiment analysis;
h Lexicon-based methods
3.3 Cancer
Work on using NLP and text-mining methods to understand issues directly related to cancer (diagnosis, treatment, and management) is less well developed than some of the other areas considered in this review (e.g., mental health and substance abuse). Of the three cancer-related papers reviewed (see [Table 5]), one utilised Twitter data [34], and two utilised data derived from an online health community [68], [69]. All the papers discussed used both classical and modern machine learning methods, with modern machine learning methods performing better than classical machine learning methods, albeit by a narrow margin in the case of Zhang et al.’s work on identifying chemotherapy-related Twitter accounts by account type [34]. Zhang et al., observed that Twitter accounts belonging to individuals focussed on “personal chemotherapy experience and emotions”, whereas professional accounts typically provided a neutral presentation of chemotherapy side effects [34]. Two of the papers were centred on health communication, broadly conceived [68], [69], with one paper focusing on sentiment analysis [34]. Concentrating specifically on the patient experience of breast cancer, one study [68] aimed at characterising how forum topics changed over time depending on the individual’s time since diagnosis and cancer stage, finding that diagnosis was the most frequent class in the early stages of cancer treatment, with diagnosis- (and treatment-) related discussions declining over the course of a user’s cancer journey.
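The trajectory analysis described above amounts to grouping labelled posts by time since diagnosis and computing each discussion class's share per time bucket. A hedged sketch with invented labels and an invented six-month bucket size (the reviewed study's actual categories and windows may differ):

```python
from collections import Counter, defaultdict

def class_shares(posts, bucket_months=6):
    """posts: iterable of (months_since_diagnosis, class_label) pairs.
    Returns, per time bucket, each class's share of posts in that bucket."""
    buckets = defaultdict(Counter)
    for months, label in posts:
        buckets[months // bucket_months][label] += 1
    return {bucket: {label: n / sum(counts.values())
                     for label, n in counts.items()}
            for bucket, counts in sorted(buckets.items())}
```

Plotting the per-bucket shares then makes a decline in diagnosis-related discussion over the cancer journey directly visible.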
Table 5
Summary of cancer-related papers
| Data Source | SML[a] | UML[b] | CML[c] | MML[d] | Surv[e] | HC[f] | Senti[g] | Lexicon[h] |
|---|---|---|---|---|---|---|---|---|
| Twitter | [34] | [34] | [34] | [34] | - | - | [34] | - |
| OHC[i] | [68, 69] | [68] | [68, 69] | [68, 69] | - | [68, 69] | - | - |

a Supervised machine learning;
b Unsupervised machine learning;
c Classical machine learning;
d Modern machine learning;
e Surveillance;
f Health communication;
g Sentiment analysis;
h Lexicon-based methods;
i Online Health Communities
3.4 Substance Abuse
This section reviews work centred on the use of social media, in conjunction with NLP methods, to address substance abuse research questions, focussing on opioid abuse; tobacco, e-cigarette, and marijuana use; and alcohol abuse. Interesting work on drug abuse – particularly new and emerging products – is increasingly evident in the literature. NLP methods are needed to deal with the ambiguity and colloquial expressions used on social media (such as “bath salts”, “kitty cat”, or “miaow miaow” for mephedrone [44]).
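Handling street names such as those for mephedrone is often a matter of matching a curated pattern list against post text. A small illustrative sketch; the slang lexicon and spelling variants below are assumptions, not a validated resource:

```python
import re

# Hypothetical street-name lexicon: "bath salts", "kitty cat", and
# "miaow miaow" are the mephedrone terms noted in the text; the spelling
# variants are illustrative guesses, not a validated resource.
SLANG = {
    "mephedrone": [r"bath\s+salts?", r"kitty\s+cat",
                   r"m(?:ia|e)ow\s+m(?:ia|e)ow"],
}
PATTERNS = {drug: re.compile("|".join(pats), re.IGNORECASE)
            for drug, pats in SLANG.items()}

def find_drug_mentions(text):
    """Return the (sorted) drugs whose slang patterns appear in `text`."""
    return sorted(drug for drug, pat in PATTERNS.items() if pat.search(text))
```

The hard part in practice is not the matching but the lexicon itself: slang shifts quickly, and many terms ("kitty cat") are ambiguous without context, which is why supervised classifiers are usually layered on top of lookups like this.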
Of the twenty-two papers discussed in this section, three focus on opioid abuse [35, 41, 42], eight on tobacco and marijuana use [6, 12, 13, 40, 43, 45, 46, 49], one on alcohol abuse [36], and one on the street drug mephedrone [44]. Twitter is the most popular source of data (18 papers) [6, 11, 12, 35-49], with Reddit [11-13] and online health communities [12], [13] both represented. Supervised machine learning (8 papers, all utilising Twitter data) and unsupervised machine learning (11 papers) were both evident in the reviewed papers, with classical machine learning approaches more common than modern neural-network-based approaches (17 and 2 papers, respectively). Two of the papers reviewed utilized a rule-based approach. [Table 6] summarises the reviewed substance abuse-related papers.
Table 6
Summary of substance abuse-related papers
| Data Source | SML[a] | UML[b] | CML[c] | MML[d] | Surv[e] | HC[f] | Senti[g] | Lexicon[h] |
|---|---|---|---|---|---|---|---|---|
| Reddit | - | [11-13] | [11-13] | - | [12] | - | - | [13] |
| Twitter | [6, 36, 40, 45-49] | [6, 12, 35, 37, 39, 41, 42, 43, 45] | [6, 12, 35, 36, 38-43, 45-49] | [6, 37] | [11, 12, 35, 36, 38, 39, 42, 44, 47-49] | [43] | [46-48] | [44] |
| OHC[i] | - | [12, 13] | [12, 13] | - | [12] | - | - | [13] |

a Supervised machine learning;
b Unsupervised machine learning;
c Classical machine learning;
d Modern machine learning;
e Surveillance;
f Health communication;
g Sentiment analysis;
h Lexicon-based methods;
i Online Health Communities
3.4.1 Opioid Abuse
Opioid abuse is now recognised as one of the leading public health problems in the United States[4], and an important – albeit slightly less pressing – concern in many developed and developing countries. The crisis in the US is due to historical changes in drug prescription policies and practices that have encouraged both the licit and illicit use of highly addictive opioid-based painkillers[5]. Every year in the United States, over 72,000 people die as a direct consequence of using opioids[6], making the need to understand emerging opioid-related behaviours and user trajectories especially pressing. One study concentrated on identifying public reactions to the opioid epidemic by identifying the most popular opioid-related topics tweeted by users [41]. Topics identified included discussions related to the possibility of promoting marijuana as a substitute for opioids, discussions related to the growing opioid market in North America, and discussions related to news reports advocating the use of buprenorphine – a narcotic used to treat opioid addiction – for adolescents experiencing opioid use disorders. Another study [35] aimed at detecting marketing and sale of opioids by illicit online sellers. The authors observed that the frequency of tweets directly related to illegal activity was relatively low when compared with other kinds of opioid mentions. A similar observation was made for tweets promoting the illegal online sale of fentanyl [42]. In this context, unsupervised approaches are of significant value for understanding changes in a rapidly developing online environment.
3.4.2 Tobacco, E-Cigarette, and Marijuana Use and Abuse
Tobacco use is declining in popularity in much of the developed world (the proportion of smokers in the US has declined by over half since 1964 and now stands at 16.8% among adults, and approximately half that among high school students [85]). However, despite this decrease in tobacco use, there has been a dramatic increase – now plateauing – in the use of e-cigarettes since their introduction to developed world markets in around 2007 [86]. This increase has occurred in the context of a lack of consensus regarding both the safety of the product [87] and its potential efficacy as a smoking cessation device [88]. In addition to these shifts in tobacco use, there have also been substantial changes in the regulation of marijuana products, particularly in the US context, and these changes have led – it has been suggested [89] – to an increase in marijuana use [90]. Given these public health concerns, using NLP to investigate tobacco, e-cigarette, and marijuana use has become an active research area, especially to classify discussions [6, 12, 43, 45, 46] or to determine whether a particular user is above or below 21 years of age [40]. Reported findings included evidence that Twitter users frequently discussed ways in which e-cigarettes can be used in the workplace in a bid to circumvent smoking bans [43], and evidence that hookah was discussed more frequently at the weekend, indicating its use is associated with leisure activities, while reported tobacco use tends to be more consistent across the week [40]. In addition, authors observed that different social media services manifested distinctly different cultures regarding e-cigarette use, e.g., sensory experiences vs. psychological factors associated with quitting [13]. Rule-based approaches were used to identify where people reported using e-cigarettes, with 39% of posts referring to e-cigarette use in the classroom [49].
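A rule-based location-of-use system of the kind cited above can be as simple as a prioritised list of regular-expression rules. The patterns here are illustrative guesses, not the published rule set:

```python
import re

# Illustrative rule-based sketch in the spirit of the location-of-use
# work cited above; these patterns are guesses, not the published rules.
LOCATION_RULES = [
    ("classroom", re.compile(r"\b(?:in|during)\s+(?:class|school)\b", re.I)),
    ("workplace", re.compile(r"\b(?:at|in)\s+(?:work|the\s+office)\b", re.I)),
]

def vape_location(post):
    """Return the first location whose rule matches, else None."""
    for place, pattern in LOCATION_RULES:
        if pattern.search(post):
            return place
    return None
```

Rule order acts as a priority: if a post mentions both settings, the earlier rule wins, which is a deliberate (if crude) disambiguation strategy.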
Other studies aimed at describing strategies for marketing Little Cigars & Cigarillos (LCC) and observed that 83% of identified LCC tweets referred to marijuana, and 29% of LCC tweets referenced memes [45].
3.4.3 Alcohol Abuse
Alcohol abuse was the seventh leading risk factor worldwide for both death and disability in 2016. In the same year, among males aged 15-49, alcohol was a causal factor in 12% of deaths [91]. One of the reviewed studies [36] yielded the surprising result that – in the US at least – a positive correlation exists between excessive county-level alcohol consumption and higher education, suggesting that highly educated counties drink more, or at least tweet more about their drinking.
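County-level associations of this sort are typically tested with a plain Pearson correlation between, for example, per-county alcohol-tweet rates and an education measure. A stdlib-only sketch (not the reviewed study's code, and without the significance testing a real analysis would add):

```python
from math import sqrt

def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A coefficient near +1 would indicate the kind of positive county-level association reported, though correlation alone cannot distinguish drinking more from merely tweeting more about drinking.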
3.5 Pharmacovigilance
Pharmacovigilance – i.e. the post-market surveillance of drugs – was an early health-related focus for social media NLP [92], [93] and has remained an important subject of research, with applications including the identification of mentions of Adverse Drug Reactions (ADRs) [51], [55]. One recent study focussed on topics related to Thyroid Hormone Replacement Therapy (THRT), particularly on the identification of side effects [50]. It was discovered that male and female users of THRT had different experiences and concerns regarding side effects, with women primarily concerned about the effect of the drug on personal appearance and men more concerned about potential pain symptoms associated with the drug.
A recent significant development in pharmacovigilance research was the instigation of the SMM4H (Social Media Mining for Health) 2017 shared task. The shared task consisted of three subtasks: automatic identification of ADRs, automatic classification of tweets that explicitly mentioned medication consumption, and normalization of ADR mentions. Important outputs of this effort included a publicly available corpus [51] and language models [55] for future research. In addition to this work on ADR identification and normalization, the identification of semantic relationships – chiefly causal relationships – between drug and symptom mentions has been a focus of research [52], [53]. A key challenge associated with this task is the difficulty involved in distinguishing between drug use as a response to a particular symptom (“I have a horrible headache and just took some ibuprofen”) and the existence of a symptom as a side effect of a drug (“Ever since I started taking Sertraline I’ve felt like crap”). Despite the difficulty of this task, Bollegala et al., achieved a moderately high F-score (0.74) using a skip-gram based method [52].
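The drug-for-symptom versus symptom-after-drug distinction can be illustrated with a toy cue-phrase heuristic; the cited work used a skip-gram based model, and the patterns below are assumptions for illustration only:

```python
import re

# Toy cue-phrase heuristic for the direction problem described above
# (drug taken FOR a symptom vs. symptom appearing AFTER a drug); the
# cited work used a skip-gram based model, and these cues are guesses.
ADR_CUES = re.compile(
    r"\b(?:ever\s+)?since\b.*\b(?:taking|started)\b"
    r"|\bafter\b.*\b(?:taking|took)\b", re.I)
TREATMENT_CUES = re.compile(
    r"\b(?:just\s+)?took\s+(?:some\s+)?\w+"
    r"|\btaking\s+\w+\s+for\b", re.I)

def relation(text):
    """Classify the drug-symptom direction of a post (very roughly)."""
    if ADR_CUES.search(text):
        return "adverse_reaction"
    if TREATMENT_CUES.search(text):
        return "treatment"
    return "unknown"
```

The example tweets from the paragraph above show why surface cues only go so far: both mention a drug and a symptom, and only word order and temporal phrasing reveal the causal direction.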
Six of the pharmacovigilance papers reviewed used Twitter as a data source [51]–[56], while one used an online health community [50] (see [Table 7]). Four of the papers used supervised methods [51]–[54] and five used unsupervised methods [50], [53]–[56], with five using classical machine learning methods [50]–[53], [56] and three using modern machine learning methods [51], [54], [55]. Unsurprisingly, given the topic of pharmacovigilance, surveillance was the main application area.
Table 7
Summary of pharmacovigilance-related papers
| Data Source | SML[a] | UML[b] | CML[c] | MML[d] | Surv[e] | HC[f] | Senti[g] | Lexicon[h] |
|---|---|---|---|---|---|---|---|---|
| Twitter | [51-54] | [53-56] | [51-53, 56] | [51, 54, 55] | [51-54, 56] | - | - | - |
| OHC[i] | - | [50] | [50] | - | - | - | - | - |

a Supervised machine learning;
b Unsupervised machine learning;
c Classical machine learning;
d Modern machine learning;
e Surveillance;
f Health communication;
g Sentiment analysis;
h Lexicon-based methods;
i Online Health Communities
3.6 Mental Health
Mental health problems are estimated to account for 13% of the global burden of disease, as measured in Disability Adjusted Life Years [95]. Using social media as a resource to understand mental health is a research area that has experienced substantial growth in recent years [96], given the burden of disease associated with mental health problems and the fact that social media provides ready access to first person reports of behaviour, thoughts, and feelings. Reviewed studies covered a range of mental health topics, including predicting depression diagnosis [8], assessing suicide risk [16, 18, 24, 74-76, 98, 99], and developing a better understanding of users’ experiences of eating disorders [15], schizophrenia [59], [61], grief processes among gang-involved youth [58], relaxation [62], stress [63], pathological empathy [67], [72], and negative emotional effects associated with campus-based mass murders [64]. Related to this, a range of metrics have been used to characterize language use associated with specific mental health conditions, with lexical diversity, readability scores, sentence complexity, negation, uncertainty, and degree of repetition all used during the review period [23, 26, 27, 60]. In novel work focussing on the relationship between clinical guidelines and actual treatments, Zhang et al. [71] created a catalogue of real-world treatments used – as opposed to merely discussed – by parents of children with autistic spectrum disorder, and then automatically identified their frequency of mention in two online autism forums.
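Several of the language metrics listed above are simple to compute; lexical diversity, for example, is often operationalised as a type-token ratio. A minimal sketch (real studies usually length-normalise the ratio, which this does not):

```python
import re

def lexical_diversity(text):
    """Type-token ratio: distinct words / total words (0.0 if empty)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Because the raw ratio falls as texts get longer, comparisons across posts of different lengths usually rely on windowed or length-corrected variants rather than this plain form.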
With a view to improving how mental health forums are designed, one study applied textual cluster analysis to forums related to the conditions anxiety, depression, and post-traumatic stress disorder (PTSD) [19], showing that – consistent with current thinking regarding the relationship between PTSD and anxiety [97] – anxiety and PTSD forums shared more similarities with each other than with the depression forum. Related to this, another study found that different communities provided different degrees of emotional and informational support [20], with some communities (e.g., depression forums) focussed primarily on emotional support, and other communities (e.g. obsessive compulsive disorder forums) offering a greater proportion of informational support. Furthermore, the same study found that at the user level, the provision of social support was correlated with demonstrated linguistic accommodation, suggesting that those users who were able to “match” the linguistic culture of a particular community were likely to receive a greater volume of social support. Finally, a further study [100] involved the development of a classifier capable of distinguishing respectful uses of a mental health-related term (e.g. “I’m fuming. How dare a TV show portray folks suffering from mental health issues so unfairly”) from less-respectful usage.
Of the thirty-one mental health-related papers reviewed (see [Table 8]), thirteen involved the use of Reddit data [15-27], ten used Twitter data [18, 24, 58-65], one used Instagram [18], three used Facebook [8, 18, 67], six used OHC data [70-75], and one used data derived from Weibo [76], with twenty-two of the papers utilising supervised machine learning methods [8, 16, 18, 20-22, 24, 25, 58-62, 65, 67, 70-76], and twelve papers utilising unsupervised machine learning [8, 15, 18-22, 27, 59, 60, 70, 72]. The majority of the papers reported on the use of classical machine learning approaches [8, 15, 16, 18-20, 22, 24, 25, 27, 58-62, 65, 67, 71, 73-76], with a minority using modern machine learning methods [18, 21, 22, 67, 70, 72]. Four of the mental health papers reviewed utilised primarily lexicon-based methods [17, 23, 63, 64].
Table 8
Summary of mental health-related papers
| Data Source | SML[a] | UML[b] | CML[c] | MML[d] | Surv[e] | HC[f] | Senti[g] | Lexicon[h] |
|---|---|---|---|---|---|---|---|---|
| Reddit | [16, 18, 20-22, 24, 25] | [15, 18-22, 27] | [15, 16, 18-20, 22, 24, 25, 27] | [21, 22] | - | - | [26] | [17], [23] |
| Twitter | [18, 58-62, 65] | [18, 59, 60] | [58-62, 65] | [18] | - | - | [24, 63, 64] | [63, 64] |
| Instagram | [18] | [18] | - | [18] | - | - | - | - |
| Facebook | [8, 18, 67] | [8, 18] | [8, 67] | [18, 67] | - | - | - | - |
| OHC[i] | [70-75] | [70, 72] | [71, 73-75] | [70, 72] | - | - | - | - |
| Weibo | [76] | - | [76] | - | - | - | - | - |

a Supervised machine learning;
b Unsupervised machine learning;
c Classical machine learning;
d Modern machine learning;
e Surveillance;
f Health communication;
g Sentiment analysis;
h Lexicon-based methods;
i Online Health Communities
3.7 Ethical Issues
Two types of ethics-related papers are discussed in this section: those that are focussed on empirical ethics (i.e. the empirical investigation of ethical beliefs and practices) [101], [102], and those that are focussed on ethical guideline development (i.e. the generation of theoretical frameworks and practical guidelines for conducting health-related NLP research with social media) [9, 103, 104]. Reviewed studies highlighted the need for both transparency in the development of algorithms and an ethical framework to guide the appropriate use of social media for computational public health research.
Focussing specifically on research ethics from the perspective of social media users, one study [102] pointed to a generally favourable view of the use of computational methods for public health research among social media users, provided that data was highly aggregated, and the goal of the work was of significant public health value (e.g. opioid abuse surveillance was acceptable in a public health context, but not when used for employment screening). However, among some users, concerns remained regarding the robustness of both the data and the research methods, due to the fact that the data was not representative of the general population, and was subject to impression management (i.e. many users did not tweet about stigmatising health problems [105]). Related to this work, one paper – a systematic review of attitudes towards the ethics of computational social media research [106] – found a range of different views on appropriate research ethics, depending on the particular research topic discussed, suggesting that a “blanket” approach to research ethics is currently not appropriate, and instead ethical deliberations ought to take into account the particular context of the research under review [106].
As noted by Vayena et al., [104], the research regulation infrastructure in most jurisdictions was developed in the period prior to social media, and hence is not well-equipped to manage the review of computational social media research. This point is reinforced by a qualitative study conducted with Research Ethics Committee (Institutional Review Board) members in the United Kingdom. This study outlines the challenges faced by ethics committees in the application of existing research ethics regulation to computational work and emphasises the need to protect research participants (i.e. social media users), even in the context of research using publicly available data [101].
Finally, practical guidelines have recently been developed to guide NLP research using social media data [103], with eight principles outlined, including the stipulation that as most social media based NLP research can be defined as human subjects research [107], ethical approval or exemption ought to be gained from an Institutional Review Board or Research Ethics Committee; that data ought to be de-identified for use in publications and presentations; and that caution ought to be exercised in linking data.
In recent years there has been a move away from the commonly held view that in social media research “anything goes”, towards a more sophisticated perspective that acknowledges both the existence and importance of the ethical and regulatory issues involved in the application of NLP to social media for health research. Further, the provision of ethical guidelines developed specifically for NLP researchers – as described above, [103] – is a new and welcome development in the period since 2016.