[Hybrid]
*** New: proceedings are available ***

The Social Media Mining for Health Applications (#SMM4H) workshop serves as a venue for bringing together researchers interested in automatic methods for the collection, extraction, representation, analysis, and validation of social media data (e.g., Twitter, Facebook) for health informatics. The 7th #SMM4H Workshop, co-located with COLING 2022, invites the submission of papers on original, unpublished research on all aspects of the intersection of social media mining and health. Topics of interest include, but are not limited to:
- Methods for the automatic detection and extraction of health-related concept mentions in social media
- Mapping of health-related mentions in social media to standardized vocabularies
- Deriving health-related trends from social media
- Information retrieval methods for obtaining relevant social media data
- Geographic or demographic data inference from social media discourse
- Virus spread monitoring using social media
- Mining health-related discussions in social media
- Drug abuse and alcoholism incidence monitoring through social media
- Disease incidence studies using social media
- Sentinel event detection using social media
- Semantic methods in social media analysis
- Classifying health-related messages in social media
- Automatic analysis of social media messages for disease surveillance and patient education
- Methods for validation of social media-derived hypotheses and datasets
Important dates (tentative)
Keynote: Social media listening for pharmaceutical R&D
Traditionally, social media listening (SML) in the pharmaceutical setting has been limited to marketing and communication purposes and performed with manual, qualitative methods. Pharmaceutical companies, with the encouragement of regulatory agencies, have started using social media listening to integrate the patient perspective into the clinical development process and ensure relevant treatments and outcomes. Additionally, there is a growing acknowledgement that quantitative methods for SML (QSML) can provide new and more rigorous analyses that enhance the value of social media data, enabling a patient-centric approach to understanding disease burden and informing drug discovery decisions at all stages. During this talk, I will present some examples of QSML supporting pharmaceutical R&D.
Speaker: Raul Rodriguez-Esteban

Raul Rodriguez-Esteban is a Senior Principal Scientist at Roche Pharmaceuticals in Basel, Switzerland, where he works on natural language processing, machine learning, and real-world data applied to pharmaceutical R&D. Previously, he worked in pharmaceutical R&D at Boehringer Ingelheim and Pfizer. He completed his PhD in machine learning applied to text mining in the laboratory of Andrey Rzhetsky at Columbia University. He was a winner of the Bio-IT World Innovative Practices Award in 2020 and is an editorial board member of the journal BMC Digital Health.
Paper Submission and Presentation Information
Paper submissions may consist of up to 4 pages, plus unlimited references, and must describe completed, original, and unpublished work. Papers may make small, focused contributions, but the work must be completed; we will not accept papers describing work in progress. We will also not accept papers that overlap significantly with papers that have been or will be published elsewhere, or that are currently under consideration at other venues. All accepted papers must be presented orally or as a poster, as determined by the program committee, in order to be included in the workshop proceedings. At least one author of each accepted paper must register for #SMM4H 2022 to present.
All paper submissions must follow the COLING 2022 guidelines and be submitted as PDF files.
Submission link: softconf
Organizers
Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center, USA
Davy Weissenbacher, Cedars-Sinai Medical Center, USA
Arjun Magge, University of Pennsylvania, USA
Ari Z. Klein, University of Pennsylvania, USA
Ivan Flores, Cedars-Sinai Medical Center, USA
Karen O’Connor, University of Pennsylvania, USA
Raul Rodriguez-Esteban, Roche Pharmaceuticals, Switzerland
Lucia Schmidt, Roche Pharmaceuticals, Switzerland
Juan M. Banda, Georgia State University, USA
Abeed Sarker, Emory University, USA
Yuting Guo, Emory University, USA
Yao Ge, Emory University, USA
Elena Tutubalina, Insilico Medicine, Hong Kong
Luis Gasco, Barcelona Supercomputing Center, Spain
Darryl Estrada, Barcelona Supercomputing Center, Spain
Martin Krallinger, Barcelona Supercomputing Center, Spain
Program Committee
Cecilia Arighi, University of Delaware, USA
Natalia Grabar, French National Center for Scientific Research, France
Thierry Hamon, Paris-Nord University, France
Antonio Jimeno Yepes, Royal Melbourne Institute of Technology, Australia
Jin-Dong Kim, Database Center for Life Science, Japan
Corrado Lanera, University of Padova, Italy
Robert Leaman, US National Library of Medicine, USA
Kirk Roberts, University of Texas Health Science Center at Houston, USA
Yutaka Sasaki, Toyota Technological Institute, Japan
Pierre Zweigenbaum, French National Center for Scientific Research, France
Contact information
Davy Weissenbacher (davy.weissenbacher@cshs.org)
Call for Participation – Shared Task
The Social Media Mining for Health Applications (#SMM4H) Shared Task involves natural language processing (NLP) challenges in using social media data for health research, including informal, colloquial expressions and misspellings of clinical concepts, noise, data sparsity, ambiguity, and multilingual posts. For each of the ten tasks below, participating teams will be provided with a set of annotated posts for developing systems, followed by a three-day window during which they will run their systems on unlabeled test data and upload their predictions to CodaLab. Information about registration, data access, paper submissions, and presentations can be found in the individual task sections below.
Registration: here
(Note: after registration, we will communicate with participants and release the data through dedicated Google Groups; please make sure to check your spam folder.)
Submission link for the system descriptions: softconf
Timeline (tentative)
Task 1. Adverse Drug Events (ADEs) in tweets
Adverse Drug Events (ADEs), also often known as Adverse Drug Reactions (ADRs), are negative side effects associated with a drug. Mining ADEs from social media is one of the most studied topics in social media pharmacovigilance, which investigates how data from non-traditional sources can be mined for the early detection of ADEs. In this task, the submitted system must perform one or more of the following subtasks:
(1) classify tweets that report ADEs,
(2) detect ADE spans in the tweets, and
(3) map these colloquial mentions to their standard concept IDs in the MedDRA vocabulary.
We will provide participants with about 18,000 labeled tweets for training and about 10,000 tweets for testing. Participants may take part in one or more subtasks:
Task 1a – Classification: Given a tweet, participants in this subtask will be required to submit only the binary annotations ADE/noADE.
Task 1b – Extraction: Given a tweet, participants in this subtask will be required to submit both the ADE classification labels (Subtask 1a) and the spans of the expressed ADEs.
Task 1c – Normalization: Given a tweet, participants in this subtask will be required to submit the ADE classification labels (Subtask 1a), the ADE spans (Subtask 1b), and the normalization labels. This subtask involves developing multiple components and presents multiple challenges, such as class imbalance and out-of-vocabulary labels. Addressing it successfully will therefore require methods that go beyond simple applications of deep learning.
Contact: Arjun Magge, University of Pennsylvania, USA (arjun.magge@pennmedicine.upenn.edu)
Task 2. Stance and premise classification in COVID-19 tweets
Users actively share their views on various issues on social networks, and many of these issues now relate to the COVID-19 pandemic. For example, users express their attitudes towards quarantine and wearing masks in public places. Some statements are supported by arguments; others are purely emotional claims. Automated approaches that detect people’s stances towards COVID-19 health orders in Twitter posts can help estimate the level of cooperation with the mandates. In this task, we focus on argument mining (or argumentation mining) to extract arguments from COVID-related tweets. According to argumentation theory, an argument must include a claim containing a stance towards some topic or object, and at least one premise/argument (“favor” or “against”) for this stance.
Participants will be provided with a labeled training set of tweets about three health mandates related to the COVID-19 pandemic:
- Face Masks
- Stay At Home Orders
- School Closures
Participants may take part in one or both subtasks. All participants will be invited to submit papers and present their results at the SMM4H 2022 workshop (see COLING 2022 for more information on dates).
Data
We will provide participants with manually labeled tweets for training, validation, and testing. The training set for the stance detection subtask is based on a COVID-19 stance detection dataset (Glandt et al., 2021).
- Train: 3669 tweets
- Validation: 600 tweets
- Test: 2000 tweets
Contact:
Elena Tutubalina, Insilico Medicine, Hong Kong (tutubalinaev@gmail.com)
Vera Davydova (veranchos@gmail.com)
Subtask 2a. Stance Detection
A system designed for this subtask should determine the point of view (stance) of the tweet’s author towards the given claim (e.g., wearing a face mask). The tweets in the training dataset are manually annotated for stance according to three categories: in-favor, against, and neither. Given a tweet, participants in this subtask will be required to submit one of three class labels:
- FAVOR – positive stance
- AGAINST – negative stance
- NEITHER – neutral/unclear/irrelevant stance
Subtask 2b. Premise Classification
The second subtask is to predict whether at least one premise/argument is mentioned in the text. A tweet is considered to contain a premise if it contains a statement that can be used as an argument in a discussion; for instance, the annotator could use it to convince an opponent of the given claim.
Given a tweet, participants of this subtask will be required to submit only the binary annotations:
- 1 – tweet contains a premise (argument)
- 0 – tweet doesn’t contain a premise (argument)
Evaluation Metrics
The main performance metrics for the two subtasks are the F1_stance and F1_premise scores, respectively, which are calculated according to the following formula:
Examples of annotations
Useful links
The COVID-19 stance detection dataset (Glandt et al., 2021) is available here:
https://github.com/kglandt/stance-detection-in-covid-19-tweets
References
Task 3. Changes in medication treatments (tweets and WebMD reviews)
A binary classifier designed for this task should detect tweets in which Twitter users self-declare changes to their medication treatments, regardless of whether a health care professional advised them to do so. Examples of such changes include not filling a prescription, stopping a treatment, changing a dosage, and forgetting to take the drugs. This task is a first step toward detecting, on Twitter, patients who are non-adherent to their treatments and their reasons for non-adherence. The data consist of two corpora: a set of tweets and a set of drug reviews from WebMD.com. Positive and negative reviews are naturally balanced, whereas positive and negative tweets are naturally imbalanced. Each set is split into a training, a validation, and a test subset. Participants will be given the training and validation subsets for both corpora and will be evaluated on the two test sets independently. Participants are expected to submit their predictions for both test sets. This year, we will add decoy reviews and tweets to the test sets to discourage manual correction of the predicted labels. The evaluation script, annotation guidelines, and baseline code will be provided to registered participants.
- Training data: 5,898 Tweets / 10,378 Reviews
- Validation data: 1,572 Tweets / 1,297 Reviews
- Test data: 2,360 Tweets / 1,297 Reviews
- Evaluation metric: F1-score for the change class
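Several tasks in this shared task are scored by the F1-score of a single class, here the change class. As a sketch (not the official evaluation script, which is provided to registered participants), the metric is the harmonic mean of the precision and recall computed for that class only. The label value passed as `positive` is an assumption that depends on how the task data encode the change class.

```python
from typing import Hashable, Sequence


def class_f1(gold: Sequence[Hashable], pred: Sequence[Hashable], positive: Hashable = 1) -> float:
    """F1-score for one class: harmonic mean of that class's precision and recall."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# 2 true positives, 1 false positive, 1 false negative: P = R = 2/3, so F1 = 2/3
print(round(class_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]), 3))  # 0.667
```

The same computation applies to the other tasks in this section that report an F1-score over a single positive class.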
Contact: Davy Weissenbacher, Cedars-Sinai, USA (davy.weissenbacher@cshs.org)
Subtask 3a. Tweet Classification
Submission format: Submissions should contain two columns, tweet_id and label, separated by a tab. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.
Subtask 3b. WebMD Classification
Submission format: Submissions should contain two columns, SOURCE_FILE and label, separated by a tab. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.
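The packaging steps for Subtasks 3a and 3b (one two-column, tab-separated file, alone in a .zip archive) can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the file name `predictions.tsv` is hypothetical, and the function writes a header row with the column names, which the text lists but does not explicitly require.

```python
import csv
import io
import zipfile
from typing import Iterable, Tuple


def write_submission(rows: Iterable[Tuple[str, str]], zip_path: str,
                     id_column: str = "tweet_id",
                     tsv_name: str = "predictions.tsv") -> None:
    """Write (id, label) pairs as a two-column TSV and store it, alone, in a zip archive."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column, "label"])  # assumed header row
    for item_id, label in rows:
        writer.writerow([item_id, label])
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # writestr stores the TSV at the archive root, with no enclosing folder
        archive.writestr(tsv_name, buffer.getvalue())
```

For Subtask 3a, a call such as `write_submission(predictions, "submission.zip")` would suffice; for Subtask 3b, pass `id_column="SOURCE_FILE"` so that the first column carries the review identifiers.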
Task 4. Tweets self-reporting exact age
Advancing the utility of social media data for research applications requires methods for automatically detecting demographic information about social media study populations, including users’ age. Automatically identifying the exact age of social media users, rather than their age groups, would enable the large-scale use of social media data for applications that do not align with the predefined age groupings of extant models, including health applications such as identifying specific age-related risk factors for observational studies or selecting age-based study populations. As a first step, this binary classification task involves automatically distinguishing tweets that self-report the user’s exact age from those that do not. A benchmark classifier based on a pretrained RoBERTa-Large model achieved an F1-score of 0.914 for the “positive” class (i.e., tweets that self-report the user’s exact age) in the validation data.
- Training data: 8,800 tweets
- Validation data: 2,200 tweets
- Test data: 10,000 tweets
- Evaluation metric: F1-score for the “positive” class (i.e., tweets that self-report the user’s exact age)
Table 1 provides sample training data, which include the Tweet ID, the text of the Tweet Object, and the annotated binary class. Tweets were annotated as “1” if the user’s exact age at the time the tweet was posted could be determined from the tweet. In the first tweet, the user’s exact age is explicitly stated. Although the second tweet does not explicitly state the user’s age, it can be inferred from the fact that the user reports turning 20 tomorrow. The third tweet does not specify when the user will be 21, but it was annotated as “1” under the assumption that the tweet refers to the user’s next birthday. The fourth tweet, however, was annotated as “0” because it is ambiguous whether the user was 21 when the tweet was posted or is referring to a future age. The fifth tweet was also annotated as “0” because it is ambiguous whether the user was 18 when the tweet was posted or is referring to an age further in the past. The sixth tweet was annotated as “0” because it refers not to the age of the user but to that of the user’s brother.
Contact: Ari Klein, University of Pennsylvania, USA (ariklein@pennmedicine.upenn.edu)
Codalab: https://codalab.lisn.upsaclay.fr/competitions/3566
Submission format: System predictions should be submitted through CodaLab. Submissions should be formatted as a ZIP file containing a TSV file with only two columns: the tweet_id column first and the label column second. The TSV file should not be in a folder in the ZIP file, and the ZIP file should not contain any files or folders other than the TSV file.
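The structural rules above can be checked locally before uploading. The following is a hypothetical pre-submission check, not the official CodaLab scorer. It assumes a UTF-8 TSV whose first row is a `tweet_id`/`label` header; the section specifies the column order but not whether a header row is expected, so `require_header` can be switched off.

```python
import csv
import io
import zipfile
from typing import List


def check_submission_zip(zip_path: str, require_header: bool = True) -> List[str]:
    """Return a list of problems found in a submission ZIP (an empty list means none found)."""
    problems: List[str] = []
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        if len(names) != 1:
            # Extra files, or a folder entry plus a file, both show up here
            return [f"expected exactly one file in the ZIP, found {len(names)}"]
        name = names[0]
        if "/" in name:
            problems.append(f"{name!r} is inside a folder; place the TSV at the ZIP root")
        if not name.lower().endswith(".tsv"):
            problems.append(f"{name!r} does not have a .tsv extension")
        text = archive.read(name).decode("utf-8")
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return problems + ["the TSV file is empty"]
    if require_header and rows[0] != ["tweet_id", "label"]:
        problems.append(f"header is {rows[0]!r}; expected ['tweet_id', 'label']")
    for line_no, row in enumerate(rows, start=1):
        if len(row) != 2:
            problems.append(f"line {line_no} has {len(row)} columns; expected 2")
    return problems
```

Returning a list of problems rather than raising on the first one lets a single run report every issue in the archive.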
Table 1.
Task 5. Self-reported COVID-19 symptoms in Spanish tweets
The purpose of this task is to bridge the gap in NLP and social media research on COVID-19 in languages other than English. Although the number of non-English social media datasets and tasks has increased in the last couple of years, there is still a need for different applications on pressing topics. This shared task, which involves identifying personal mentions of COVID-19 symptoms in Spanish-language tweets, is similar to SMM4H 2021 shared task #6. Note that the annotated tweets for this task are a brand-new curated set of tweets written natively in Spanish, not a translation of the previous English-only data. The task is a three-way classification problem, requiring participants to distinguish personal symptom mentions from other mentions, such as symptoms reported by others and references to news articles or other sources. The target classes are:
- self-reports,
- non-personal reports,
- literature/news mentions
The proposed training dataset consists of 1,654 tweets labeled as self-reports, 2,413 tweets labeled as non-personal reports, and 5,985 labeled as literature/news mentions. The systems submitted for this task will be evaluated on precision, recall and F1-score.
- Training data: 10,052 tweets
- Validation data: 3,578 tweets
- Test data: 6,851 tweets
- Evaluation Metric: precision, recall and F1-score
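The section lists precision, recall, and F1-score for this three-way task without stating how they are aggregated across classes. A minimal sketch that computes them per class and as an unweighted (macro) average; the macro average is an assumption for illustration, not necessarily the official aggregation. The label strings in the example are the ones required in the submission format for this task.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def _prf(tp: int, fp: int, fn: int) -> Scores:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_class_prf(gold: Sequence[str], pred: Sequence[str]) -> Tuple[Dict[str, Scores], Scores]:
    """Per-class precision/recall/F1 and their unweighted (macro) average."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted label p was wrong
            fn[g] += 1  # the gold label g was missed
    per_class = {label: _prf(tp[label], fp[label], fn[label])
                 for label in sorted(set(gold) | set(pred))}
    if not per_class:
        return per_class, (0.0, 0.0, 0.0)
    n = len(per_class)
    macro = tuple(sum(s[i] for s in per_class.values()) / n for i in range(3))
    return per_class, macro


gold = ["self-report", "self-report", "non-personal report", "literature-news mentions"]
pred = ["self-report", "non-personal report", "non-personal report", "literature-news mentions"]
per_class, macro = per_class_prf(gold, pred)
```

A macro average weights the small self-report class as much as the large literature/news class; a micro average would instead be dominated by the majority class.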
Contact: Juan Banda, Georgia State University, USA (juan@jmbanda.com)
Codalab: https://codalab.lisn.upsaclay.fr/competitions/3535
Submission format
Submissions should contain two columns, tweet_id and label, separated by a tab. The required labels are: self-report, non-personal report, and literature-news mentions. Any other label will be ignored. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.
Task 6. Self-reported COVID-19 vaccination status in tweets
With the widespread rollout of COVID-19 vaccines, vaccine surveillance has become a pressing research issue. While some vaccinated people report adverse events via their healthcare providers to systems such as the Vaccine Adverse Event Reporting System (VAERS), or have them documented in their electronic health records (EHRs), a more robust and convenient method could be devised using self-reports from social media. In this task, we provide an annotated dataset of tweets from Twitter users personally reporting their vaccination status and from users discussing vaccination status without revealing their own. The task is challenging because users discuss the vaccination status of others, or from news reports, in ways similar to how they discuss their own, and they do so more often (on average, about 8 such tweets for every self-report). In this dataset, the positive class consists of unambiguous tweets in which users clearly state that they have been vaccinated; all other tweets are from users discussing vaccination status. This task involves the identification of self-reported COVID-19 vaccination status in English tweets. It is a binary classification task, and the two classes in the provided training dataset are:
- vaccination confirmation,
- vaccine related chatter
The class imbalance in this dataset is roughly 1 to 8: we will provide 1,496 vaccination confirmation tweets and 12,197 vaccine chatter tweets. The systems submitted for this task will be evaluated on precision, recall, and F1-score.
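The “roughly 1 to 8” imbalance follows directly from the class counts given above. A quick check in Python (the counts are copied from this section; the variable names are illustrative):

```python
confirmation = 1_496   # vaccination confirmation tweets (positive class)
chatter = 12_197       # vaccine related chatter tweets

total = confirmation + chatter
ratio = chatter / confirmation       # chatter tweets per confirmation tweet
prevalence = confirmation / total    # share of the positive class

print(total)                 # 13693, the stated training set size
print(round(ratio, 2))       # 8.15
print(round(prevalence, 3))  # 0.109
```

With only about 11% positive examples, a classifier that always predicts “chatter” would be right almost 90% of the time yet score zero recall on vaccination confirmations, which is why precision, recall, and F1 are reported rather than accuracy.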
- Training data: 13,693 tweets
- Validation data: 2,784 tweets
- Test data: 5,923 tweets
- Evaluation Metric: precision, recall and F1-score
Contact: Juan Banda, Georgia State University, USA (juan@jmbanda.com)
Codalab: https://codalab.lisn.upsaclay.fr/competitions/3536
Submission format
Submissions should contain two columns, tweet_id and label, separated by a tab. The required labels are 0 for vaccination confirmation and 1 for vaccine related chatter. Any other label will be ignored. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.
Task 7. Self-reported intimate partner violence (IPV) in tweets
Intimate partner violence (IPV), which refers to abuse or aggression that occurs in a romantic relationship, is a serious health problem that can have a lifelong impact on health and well-being. Social media platforms are increasingly used by IPV victims to share their experiences and seek help. To provide early intervention and timely support, an effective automatic classifier of self-reported IPV is needed to detect potential IPV victims on social media platforms. This task presents two challenges. First, the annotated data are highly imbalanced: only around 11% of the tweets are labeled as self-reported IPV. Second, the negative tweets include non-IPV domestic violence and non-self-reported IPV, which are hard for an automatic system to distinguish from self-reported IPV. The data consist of annotated collections of Twitter posts and will be shared as .csv files. The training data are already prepared and will be available to the teams that register to participate. The test data will be released when the evaluation phase starts.
- Training data: 4,523 posts
- Validation data: 534 posts
- Testing data: 1,291 posts
- Evaluation metric: F1-score for the self-reported IPV class
Contact: Yuting Guo, Emory University, USA (yuting.guo@emory.edu)
Codalab: https://codalab.lisn.upsaclay.fr/competitions/1535
Data Examples:
Submission format
Submissions should contain two columns, tweet_id and label, in that order, separated by a tab.
Task 8. Self-disclosure of chronic stress in tweets
Chronic stress is defined as the physiological or psychological response to a prolonged internal or external stressful event (i.e., a stressor). It can lead to poor mental health, including depression and anxiety, and can also take a toll on the body, resulting in dysfunction of the cardiovascular, metabolic, endocrine, and immuno-inflammatory systems. Traditional methods of assessing stress, such as interviews and questionnaires/surveys, are limited in their ability to measure population-level stress accurately. Thus, there is a critical need to develop innovative methods for assessing chronic stress. Social media are potentially valuable resources for studying chronic stress and its characteristics, and the first step is to accurately detect tweets that are self-disclosures of chronic stress. In this task, about 37% of the tweets are positive (self-disclosure of chronic stress, P) and 63% are negative (non-self-disclosure of chronic stress, N). Systems designed for this task need to automatically identify tweets in the self-disclosure category. Classifier evaluation will be based on the F1-score over the positive class.
- Training data: 2,936 tweets
- Validation data: 420 tweets
- Testing data: 839 tweets
- Evaluation metric: F1-score over the positive class
Contact: Yao Ge, Emory University, USA (yao.ge@emory.edu)
Codalab: https://codalab.lisn.upsaclay.fr/competitions/1542
Data examples
Submission Format
Submissions should contain two columns, tweet_id and label, in that order, separated by a tab.
Task 9. Reddit posts self-reporting exact age
Because patients use social media in every aspect of their daily lives, its analysis offers a promising way to understand the patient’s perspective on their disease journey, their unmet medical needs, and their disease burden. Social media listening (SML) can therefore potentially advance our understanding of a disease and influence the development of new therapies.
Detecting demographic information on social media is essential to address the differences in demographic characteristics (age, gender, ethnicity, medical history) between patients on social media and patients in target clinical populations. In this task, we focus on automatically classifying social media forum (Reddit) posts, distinguishing posts that self-report the exact age of the user at the time of posting (annotated as “1”) from those that do not (annotated as “0”). The dataset is disease-specific and consists of posts collected via a series of keywords associated with dry eye disease.
- Training data: 9,000 posts
- Validation data: 1,000 posts
- Test data: 2,000 posts
- Evaluation metric: F1-score for the positive class (i.e., posts annotated as “1”)
Contact: Ana Lucia Schmidt, Roche Pharmaceuticals, Switzerland (lucia.schmidt@roche.com)
Codalab: https://codalab.lisn.upsaclay.fr/competitions/3646
The table below provides sample training data, which include the Post ID, Post Text, and annotated binary label (Class). Posts were annotated as “1” if the user’s exact age at the time of writing could be determined from the post. Posts 1 to 5, shown below, are annotated as “1” because the age is either explicitly stated in the text (1-3) or can be inferred from the information presented (4-5). Posts 2 and 3 are noteworthy examples of the forum’s convention for denoting the age and gender of users. Posts 4 and 5 exemplify two different cases in which the age can be inferred from an explicitly stated past age (40 years since the user was 7) or future age (one week until the user turns 40). Posts 6 to 13 are annotated as “0” for a variety of reasons:
- The posts lack enough information to infer the age of the users (6-7)
- The age explicitly stated in the text refers to a third party and not the user writing the post (8-9)
- The age given in the post is not exact (10-11)
- The age reported refers to the past, so there is no self-reported age at the time the post was written (12)
- There is no age self-reported in the post (13)
Task 10. SocialDisNER: disease mentions in Spanish tweets
This task focuses on the recognition of disease mentions in tweets written in Spanish, selected primarily for first-hand experiences of diseases and other health-relevant content (from patient associations and professional healthcare institutions).
The aim is to use social media as a proxy to better understand the societal perception of diseases, ranging from rare immunological and genetic diseases such as cystic fibrosis, through highly prevalent conditions such as cancer and diabetes, to often controversial diagnoses such as fibromyalgia and even mental health disorders.
Automatic data selection actively retrieved personal messages and posts from patient associations. The SocialDisNER shared task will thus enable the training of deep learning named entity recognition (NER) approaches to detect all kinds of disease mentions in social media, in both lay and professional language.
There is a single sub-track: NER offset detection and classification, in which participants must find the beginning and end offsets of disease mentions.
- Training data: ~6000 tweets
- Validation data: ~2000 tweets
- Test data: ~2000 tweets
- Evaluation metric: Strict Precision, Recall and F1-score
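Strict evaluation means that a predicted mention counts only if its boundaries exactly match a gold mention. A minimal sketch of strict micro-averaged precision, recall, and F1, assuming each mention is a hypothetical (tweet_id, start, end, label) tuple; it is not the official SocialDisNER evaluation script, and the label value `DISEASE` in the example is illustrative.

```python
from typing import Iterable, Tuple

# Hypothetical mention layout: (tweet_id, start_offset, end_offset, label)
Mention = Tuple[str, int, int, str]


def strict_prf(gold: Iterable[Mention], pred: Iterable[Mention]) -> Tuple[float, float, float]:
    """Strict micro precision, recall and F1: only exact (id, start, end, label) matches count."""
    gold_set, pred_set = set(gold), set(pred)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [("t1", 10, 24, "DISEASE"), ("t1", 30, 38, "DISEASE"), ("t2", 0, 6, "DISEASE")]
pred = [("t1", 10, 24, "DISEASE"), ("t1", 30, 37, "DISEASE")]  # second span ends one char early
print(strict_prf(gold, pred))
```

Because mentions are compared as sets, a duplicated prediction counts only once, and a span whose end offset is off by a single character counts as both a false positive and a false negative. In the example, precision is 1/2, recall is 1/3, and F1 is 0.4.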
Contact:
Luis Gasco, Barcelona Supercomputing Center, Spain, luis.gasco@bsc.es
Darryl Estrada, Barcelona Supercomputing Center, Spain, darryl.estrada@bsc.es
Codalab: https://codalab.lisn.upsaclay.fr/competitions/3531
Task Website: https://temu.bsc.es/socialdisner/
Submission format: a tab-separated file with headers, in the same format as the validation set.