Social Media Mining for Health 2022 (#SMM4H)


The Social Media Mining for Health Applications (#SMM4H) workshop serves as a venue for bringing together researchers interested in automatic methods for the collection, extraction, representation, analysis, and validation of social media data (e.g., Twitter, Facebook) for health informatics. The 7th #SMM4H Workshop, co-located with COLING 2022, invites the submission of papers on original, unpublished research in all aspects at the intersection of social media mining and health. Topics of interest include, but are not limited to:

  • Methods for the automatic detection and extraction of health-related concept mentions in social media
  • Mapping of health-related mentions in social media to standardized vocabularies
  • Deriving health-related trends from social media
  • Information retrieval methods for obtaining relevant social media data
  • Geographic or demographic data inference from social media discourse
  • Virus spread monitoring using social media
  • Mining health-related discussions in social media
  • Drug abuse and alcoholism incidence monitoring through social media
  • Disease incidence studies using social media
  • Sentinel event detection using social media
  • Semantic methods in social media analysis
  • Classifying health-related messages in social media
  • Automatic analysis of social media messages for disease surveillance and patient education
  • Methods for validation of social media-derived hypotheses and datasets

Important dates (tentative)

Workshop papers due Aug. 15
Acceptance notification Sept. 1
Camera ready paper Sept. 12
Workshop date October 2022

Paper Submission and Presentation Information

Paper submissions may consist of up to 4 pages, plus unlimited references, and must describe completed, original, and unpublished work. Papers may make small, focused contributions, but the work must be completed; we will not accept papers describing work in progress. We will also not accept papers that overlap significantly with papers that have been or will be published elsewhere, or that are currently under consideration at other venues. To be included in the workshop proceedings, all accepted papers must be presented orally or as a poster, as determined by the program committee. At least one author of each accepted paper must register for #SMM4H 2022 to present.

All paper submissions must follow the COLING 2022 guidelines and be submitted as a PDF.

Submission link: TBA


Organizers

Graciela Gonzalez-Hernandez, University of Pennsylvania, USA

Davy Weissenbacher, University of Pennsylvania, USA

Arjun Magge, University of Pennsylvania, USA

Ari Z. Klein, University of Pennsylvania, USA

Ivan Flores, University of Pennsylvania, USA

Karen O’Connor, University of Pennsylvania, USA

Raul Rodriguez-Esteban, Roche Pharmaceuticals, Switzerland

Lucia Schmidt, Roche Pharmaceuticals, Switzerland

Juan M. Banda, Georgia State University, USA

Abeed Sarker, Emory University, USA

Yuting Guo, Emory University, USA

Elena Tutubalina, Kazan Federal University, Russia

Vera Davydova, Kazan Federal University, Russia

Program Committee


Contact information

Davy Weissenbacher (

Shared Task 

Call for Participation – Shared Task

The Social Media Mining for Health Applications (#SMM4H) Shared Task involves natural language processing (NLP) challenges of using social media data for health research, including informal, colloquial expressions and misspellings of clinical concepts, noise, data sparsity, ambiguity, and multilingual posts. For each of the eight tasks below, participating teams will be provided with a set of annotated posts for developing systems, followed by a three-day window during which they will run their systems on unlabeled test data and upload the predictions of their systems to CodaLab. Information about registration, data access, paper submissions, and presentations can be found in the individual competition sections below.

Registration: here

Timeline (tentative)

Sample data set release Feb. 15
Training and validation set release Feb. 15 (latest March 31)
Validation set submission due Jul. 4
Test set release ~Jul. 11 (See each task for details)
Test set predictions due ~Jul. 15 (See each task for details)*
Test set evaluation scores release Jul. 25
System descriptions due Aug. 1
Acceptance notification Aug. 15
Camera ready system descriptions Sep. 1
* All deadlines are 11:59 PM UTC (3:59 PM PST), NO extension will be provided
Task 1 – Classification, detection and normalization of Adverse Events (AE) mentions in English tweets

In this task, modified from previous years, systems must (1) classify tweets reporting AEs, (2) detect their spans in the tweets, and (3) map these colloquial mentions to their standard concept IDs in the MedDRA vocabulary. We will provide participants with 18,000 labeled tweets for training and 10,800 tweets for testing. Participants may take part in one or more subtasks: participants in the ADR classification subtask will be required to submit only the binary ADR/noADR labels; participants in the ADR span detection subtask will be required to submit both the ADR classification labels and the spans of expressed ADRs; and participants in the ADR resolution subtask will be required to submit ADR classification labels, spans, and normalization labels. This task presents multiple challenges and is likely to require more than straightforward applications of deep learning. The classification subtask must account for class imbalance, as only around 7% of the tweets mention an ADR. The span detection subtask requires advanced named entity recognition approaches. The resolution subtask requires choosing a normalized concept from more than 23,000 MedDRA preferred terms.

Contact: Arjun Magge (

Task 2 – Classification and detection of Adverse Events (AE) mentions in Russian tweets

In this task, modified from the last two years, systems must classify tweets reporting AEs and detect the text spans of the reported AEs. This task contains two subtasks in increasing order of complexity: (1) ADR classification and (2) ADR span detection. Participants will be provided with a labeled training set containing tweet texts and multi-level ADR annotations, and may participate in one or more subtasks. During the evaluation period, an unlabeled test set containing only the tweet texts will be provided, for which participants will be required to provide automated annotations. This task follows the guidelines and evaluation strategies of SMM4H 2022 Task 1. The task allows testing both monolingual and multilingual models (for example, multilingual BERT, XLM-R). We encourage participants to train models not only on tweets in Russian but also on the annotated English tweets from SMM4H 2022 Task 1.

Contact: Elena Tutubalina (

Task 3 – Classification of changes in medication treatments in tweets and WebMD reviews

The binary classifier to be designed should detect tweets and reviews in which users self-declare changing their medication treatments, regardless of whether a health care professional advised them to do so. Such changes include, for example, not filling a prescription, stopping a treatment, changing a dosage, or forgetting to take the drugs. This task is a first step toward detecting, on Twitter, patients who are non-adherent to their treatments, along with their reasons for non-adherence. The data consist of two corpora: a set of tweets and a set of drug reviews from WebMD. Positive and negative reviews are naturally balanced, whereas positive and negative tweets are naturally imbalanced. Each corpus is split into training, validation, and test subsets. Participants will be given the training and validation subsets for both corpora and will be evaluated on both test sets independently; they are expected to submit predictions for both test sets. This year, the test sets will include additional reviews and tweets as decoys, to discourage manual correction of the predicted labels. An evaluation script, annotation guidelines, and baseline code will be provided to registered participants.

  • Training data: 5,898 Tweets / 10,378 Reviews
  • Validation data: 1,572 Tweets / 1,297 Reviews
  • Test data: 2,360 Tweets / 1,297 Reviews
  • Evaluation metric: F1-score for the change class
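The official evaluation script will be distributed to registered participants. In the meantime, the metric itself (F1-score computed on the change class only) can be sketched in plain Python. `f1_for_class` is a hypothetical helper, and encoding the change class as `1` is an assumption, not something the task description specifies.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[int], pred: Sequence[int], positive: int = 1) -> float:
    """F1-score computed only on the `positive` class (assumed here to be the change class)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    # Count true positives, false positives, and false negatives for the positive class.
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [1, 0, 0, 1, 0, 1]
    pred = [1, 0, 1, 0, 0, 1]
    print(round(f1_for_class(gold, pred), 3))
```

Because true negatives never enter the formula, a system cannot raise this score simply by labeling the many non-change posts correctly.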

Contact: Davy Weissenbacher (

Subtask 3a. Tweet Classification

Submission format: Please use the format below. Submissions should contain two columns, tweet_id and label, separated by a tab. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.

tweet_id label
123 0
435 1
276 0
167 0
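As an illustration of the packaging steps above, the following sketch writes tweet_id/label rows to a tab-separated file and zips only that file. The file names `predictions.tsv` and `submission.zip` and the helper `write_submission` are hypothetical; the task description does not prescribe them.

```python
import csv
import tempfile
import zipfile
from pathlib import Path


def write_submission(predictions: dict[str, int], out_dir: Path,
                     tsv_name: str = "predictions.tsv") -> Path:
    """Write tweet_id/label rows to one .tsv file and zip only that file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    tsv_path = out_dir / tsv_name
    with tsv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["tweet_id", "label"])  # header row, as in the example
        for tweet_id, label in predictions.items():
            writer.writerow([tweet_id, label])
    zip_path = out_dir / "submission.zip"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname places the .tsv at the archive root instead of nesting folders
        zf.write(tsv_path, arcname=tsv_name)
    return zip_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        archive = write_submission({"123": 0, "435": 1, "276": 0}, Path(tmp))
        with zipfile.ZipFile(archive) as zf:
            print(zf.namelist())
```

Passing `arcname` matters because, without it, the archive would store the full temporary path rather than a single top-level .tsv file.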

Subtask 3b. WebMD Classification

Submission format: Please use the format below. Submissions should contain two columns, SOURCE_FILE and label, separated by a tab. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.

reviews_parsed/119_049.txt 1
reviews_parsed/219_879.txt 0
reviews_parsed/123_839.txt 0
reviews_parsed/179_022.txt 1
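Before uploading, it may help to re-read the archive and confirm that it holds exactly one .tsv file with valid labels. The following is a minimal sketch, assuming binary labels written as `0`/`1` and an optional header row whose first column is `SOURCE_FILE` (the example above shows no header, so the check tolerates either). `check_submission` is a hypothetical helper, not part of the official evaluation tooling.

```python
import csv
import io
import zipfile
from typing import BinaryIO, Union


def check_submission(source: Union[str, BinaryIO],
                     allowed_labels: frozenset = frozenset({"0", "1"}),
                     id_column: str = "SOURCE_FILE") -> list[tuple[str, str]]:
    """Return (id, label) pairs from a submission .zip; raise ValueError on format problems."""
    with zipfile.ZipFile(source) as zf:
        members = [name for name in zf.namelist() if not name.endswith("/")]
        if len(members) != 1 or not members[0].endswith(".tsv"):
            raise ValueError(f"expected exactly one .tsv file in the archive, found {members}")
        text = zf.read(members[0]).decode("utf-8")

    rows = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # tolerate blank lines
        if line_no == 1 and fields[0] == id_column:
            continue  # optional header row
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected at least two tab-separated columns")
        identifier, label = fields[0], fields[1].strip()
        if label not in allowed_labels:
            raise ValueError(f"line {line_no}: unexpected label {label!r}")
        rows.append((identifier, label))  # any further columns are ignored
    return rows


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("predictions.tsv",
                    "reviews_parsed/119_049.txt\t1\nreviews_parsed/219_879.txt\t0\n")
    print(check_submission(buffer))
```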
Task 4 – Classification of tweets self-reporting exact age

Advancing the utility of social media data for research applications requires methods for automatically detecting demographic information about social media study populations, including users’ age. Automatically identifying the exact age of social media users, rather than their age groups, would enable the large-scale use of social media data for applications that do not align with the predefined age groupings of extant models, including health applications such as identifying specific age-related risk factors for observational studies [1], or selecting age-based study populations [2]. As a first step, this new binary classification task involves automatically distinguishing tweets that self-report the user’s exact age (annotated as “1”) from those that do not (annotated as “0”). Recent work [3] presents a benchmark classifier, based on a RoBERTa-Large pretrained model, that achieves an F1-score of 0.914 for the “positive” class (i.e., tweets annotated as “1”) in the validation data.

  • Training data: 8,800 tweets
  • Validation data: 2,200 tweets
  • Test data: 10,000 tweets
  • Evaluation metric: F1-score for the “positive” class (i.e., tweets annotated as “1”)

Table 1 provides sample training data, which includes the Tweet ID, Tweet Object, and annotated binary label. Tweets were annotated as “1” if the user’s exact age at the time the tweet was posted could be determined from the tweet. In the first tweet, the user’s exact age is explicitly stated. Although the second tweet does not explicitly state the user’s age, it can be inferred from the fact that the user reports turning 20 tomorrow. The third tweet does not specify when the user will be 21, but it was annotated as “1” under the assumption that the tweet refers to the user’s next birthday. The fourth tweet, however, was annotated as “0” because it is ambiguous whether the user was 21 when the tweet was posted or is referring to a future age. The fifth tweet was also annotated as “0” because it is ambiguous whether the user was 18 when the tweet was posted or is referring to an age further in the past. The sixth tweet was annotated as “0” because it refers not to the age of the user but to the age of the user’s brother.

Contact: Ari Klein (

Table 1.

Tweet ID Tweet text Class
759873324511924224 It’s my 21st birthday today. But who cares….. ITS FINALLY AUGUST!!!!!!!! That’s what really matters 😭😭😍😍😍💖💖💖💖💖 1
861628001485815809 It’s crazy, tomorrow I’ll be 20. I’m getting so OLD. 🤦🏽‍♀️ 1
802422959818145793 can’t believe im going to be 21 …. i actually want to be a teenager again with no responsibilities 😒 1
836614466846535680 I graduate in May only focusing on me and my child.. watch me at 21 😘 0
850592132356296705 Had just turned 18 then found out I was pregnant 2 weeks later 0
693155152279109632 Yesterday was my little bros 14th bday & I also found this sweet pic of us at my baby shower! Hes growing up on me 😢 0
Task 5 – Classification of tweets containing self-reported COVID-19 symptoms in Spanish

The purpose of this task is to bridge the gap in NLP and social media research on COVID-19 in languages other than English. Although the number of non-English social media datasets and tasks has grown over the last couple of years, applications on pressing topics are still needed. This shared task is similar to SMM4H 2021 shared task #6; here, it involves identifying personal mentions of COVID-19 symptoms in Spanish-language tweets. Note that the annotated set of tweets for this task is a brand-new set of curated tweets written natively in Spanish, not a translation of the previous English-only data. The task is a three-way classification problem requiring participants to distinguish personal symptom mentions from other mentions, such as symptoms reported by others and references to news articles or other sources. The target classes are:

  1. self-reports,
  2. non-personal reports,
  3. literature/news mentions

The training dataset consists of 1,654 tweets labeled as self-reports, 2,413 tweets labeled as non-personal reports, and 5,985 tweets labeled as literature/news mentions. Systems submitted for this task will be evaluated on precision, recall, and F1-score.

  • Training data: 10,052 tweets
  • Validation data: 3,578 tweets
  • Test data: 6,851 tweets
  • Evaluation metric: precision, recall, and F1-score
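The description lists precision, recall, and F1-score without stating how they are aggregated across the three classes. The sketch below computes them per class and, as one possible aggregate, their unweighted (macro) average. The macro average, the exact label strings, and the helper names `per_class_scores` and `macro_f1` are assumptions; the official scorer may aggregate differently.

```python
from collections import Counter

# Label strings as listed in the Task 5 submission format (assumed to be exact).
LABELS = ("self-report", "non-personal report", "literature-news mentions")


def per_class_scores(gold, pred, labels=LABELS):
    """Return {label: (precision, recall, f1)} for each class."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # the predicted class gains a false positive
            fn[g] += 1  # the gold class gains a false negative
    scores = {}
    for label in labels:
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1 (one possible way to aggregate)."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    gold = ["self-report", "self-report", "non-personal report", "literature-news mentions"]
    pred = ["self-report", "non-personal report", "non-personal report", "literature-news mentions"]
    scores = per_class_scores(gold, pred)
    for label, (p, r, f) in scores.items():
        print(f"{label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
    print(f"macro F1: {macro_f1(scores):.3f}")
```

A macro average weights the small self-report class (1,654 training tweets) as heavily as the much larger literature/news class, whereas a micro average would be dominated by the latter.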

Contact: Juan Banda (

Submission format

Submissions should contain two columns, tweet_id and label, separated by a tab. The required labels are: self-report, non-personal report, and literature-news mentions. Any other label will be ignored, as will all other columns. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.
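Because rows with any other label will be ignored, a quick pre-upload check for mislabeled rows may save a submission. This sketch assumes the three label strings above must match exactly, including case and hyphenation; `split_by_label_validity` is a hypothetical helper.

```python
import csv
import io

# Accepted Task 5 labels, copied from the submission format description.
ALLOWED_LABELS = {"self-report", "non-personal report", "literature-news mentions"}


def split_by_label_validity(tsv_text: str):
    """Split tweet_id/label rows into (valid, invalid) lists by the accepted label set."""
    valid, invalid = [], []
    for fields in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not fields or fields[0] == "tweet_id":
            continue  # skip blank lines and a header row
        if len(fields) >= 2 and fields[1] in ALLOWED_LABELS:
            valid.append(fields[:2])  # extra columns are ignored anyway
        else:
            invalid.append(fields)  # per the task description, these labels are ignored
    return valid, invalid


if __name__ == "__main__":
    sample = "tweet_id\tlabel\n1\tself-report\n2\tnews\n3\tnon-personal report\n"
    ok, bad = split_by_label_validity(sample)
    print(f"{len(ok)} valid rows, {len(bad)} invalid: {bad}")
```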

Task 6 – Identification of tweets which indicate self-reported COVID-19 vaccination status

With the widespread rollout of COVID-19 vaccines, vaccine surveillance became a pressing research issue. While some vaccinated people report adverse events through their healthcare providers to systems like the Vaccine Adverse Event Reporting System (VAERS), or have them documented in their electronic health record (EHR), a more robust and convenient method could be devised using self-reports from social media. In this task, we provide an annotated dataset of Twitter users personally reporting their vaccination status and of users discussing vaccination status without revealing their own. The task is tricky because users discuss the vaccination status of others, or from news reports, in much the same way as they discuss their own, and they do so far more often (roughly 1 self-report for every 8 other tweets on average). The positive class consists of unambiguous tweets in which users clearly state that they have been vaccinated; all other tweets are of users discussing vaccination status. This task involves the identification of self-reported COVID-19 vaccination status in English tweets. As a two-way classification task, the two classes in the provided training dataset are:

  1. vaccination confirmation,
  2. vaccine related chatter

The class imbalance in this dataset is roughly 1 to 8: we will provide 1,496 vaccination confirmation tweets and 12,197 vaccine chatter tweets. Systems submitted for this task will be evaluated on precision, recall, and F1-score.

  • Training data: 13,693 tweets
  • Validation data: 2,784 tweets
  • Test data: 5,923 tweets
  • Evaluation metric: precision, recall, and F1-score

Contact: Juan Banda (

Submission format

Submissions should contain two columns, tweet_id and label, separated by a tab. The required labels are 0 for vaccination confirmation and 1 for vaccine related chatter. Any other label will be ignored, as will all other columns. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.
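In this task the minority class, vaccination confirmation, is coded as `0`, unlike Task 4, where the positive class is annotated as “1”. An explicit mapping from class names to codes is one way to avoid accidentally flipping the labels. The sketch below assumes a header row like the one in the Task 3a example; `to_submission_text` is a hypothetical helper.

```python
# Codes required by the Task 6 submission format: the minority
# "vaccination confirmation" class is 0, not 1.
LABEL_CODES = {"vaccination confirmation": 0, "vaccine related chatter": 1}


def to_submission_text(predictions):
    """Render (tweet_id, class_name) pairs as tab-separated tweet_id/label lines."""
    lines = ["tweet_id\tlabel"]  # header row (assumed, mirroring the Task 3a example)
    for tweet_id, class_name in predictions:
        if class_name not in LABEL_CODES:
            raise ValueError(f"unknown class name: {class_name!r}")
        lines.append(f"{tweet_id}\t{LABEL_CODES[class_name]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_submission_text([("11", "vaccination confirmation"),
                              ("12", "vaccine related chatter")]), end="")
```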

Task 7 – Classification of self-reported intimate partner violence on Twitter

Intimate partner violence (IPV), which refers to abuse or aggression that occurs in a romantic relationship, is a serious health problem that can have a lifelong impact on health and well-being. Recently, IPV victims have increasingly used social media platforms to share their experiences and seek help. To provide early intervention and timely support, an effective automatic classifier of self-reported IPV is needed to detect potential IPV victims on social media platforms. This task presents two challenges. First, the annotated data are highly imbalanced: only around 11% of the tweets are identified as self-reported IPV. Second, the negative tweets include non-IPV domestic violence and non-self-reported IPV, which are difficult for an automatic system to distinguish from self-reported IPV. The data include annotated collections of posts on Twitter and will be shared as .csv files. The training data are already prepared and will be available to teams that register to participate. The test data will be released when the evaluation phase starts.

  • Training data: 4,523 posts
  • Validation data: 534 posts
  • Test data: 1,291 posts
  • Evaluation metric: F1-score for the self-reported IPV class
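Since the training data are shared as .csv files and only around 11% of tweets are self-reported IPV, a quick look at the label distribution may be useful before training. The following is a sketch using Python's standard `csv` module. The column names `tweet_id`, `text`, and `label` are assumptions based on the data examples in this section, not a documented file layout, and `label_distribution` is a hypothetical helper.

```python
import csv
import io
from collections import Counter


def label_distribution(csv_text: str, label_column: str = "label") -> dict:
    """Return the share of each label value in a CSV that has a header row."""
    # The csv module handles quoted tweet text that spans lines or contains commas.
    counts = Counter(row[label_column] for row in csv.DictReader(io.StringIO(csv_text)))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


if __name__ == "__main__":
    sample = (
        "tweet_id,text,label\n"
        '1,"first line\nsecond line",1\n'
        "2,no abuse mentioned here,0\n"
        "3,another tweet,0\n"
        "4,yet another tweet,0\n"
    )
    print(label_distribution(sample))
```

With a real file, open it with `newline=""` (as the `csv` module documentation recommends for quoted multi-line fields) and pass its contents to the function.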

Contact: Yuting Guo (

Link to Codalab:

Data Examples:

tweet_id text label
12453 I didn’t get married because I wanted too.
I didn’t have much choice I  an abusive relationship. I was so scared to say no for what the repercussions would be. I didn’t want to be homeless. I loved him at the time but marriage is not something I dreamed about it aspired too.
13349 RT @Richkid_life: Domestic violence is a sensitive topic so I just stay away from it all together 0

Submission format

Submissions should contain tweet_id and label, separated by a tab, in the order shown below.

tweet_id label
234 1
414 0
611 0
876 0
Task 8 – Classification of self-reported chronic stress on Twitter

Chronic stress is defined as the physiological or psychological response to a prolonged internal or external stressful event (i.e., a stressor). It can lead to poor mental health, including depression and anxiety, and can also take a toll on the body, resulting in dysfunction of the cardiovascular, metabolic, endocrine, and immuno-inflammatory systems. Traditional methods of assessing stress, including interviews and questionnaires/surveys, have limitations in accurately measuring population-level stress. Thus, there is a critical need to develop innovative chronic stress assessment methods. Social media are potentially valuable resources for studying chronic stress and its characteristics, and the first step is to accurately detect tweets that are self-disclosures of chronic stress. In this task, about 37% of the tweets are positive (self-disclosure of chronic stress, P) and 63% are negative (non-self-disclosure of chronic stress, N). Systems designed for this task need to automatically identify tweets in the self-disclosure category. Classifier evaluation will be based on the F1-score over the positive class.

  • Training data: 3,356 tweets
  • Evaluation data: 839 tweets
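One way to see why F1 on the positive class is more informative here than accuracy: with about 37% positive tweets, two label-agnostic baselines score as follows. This is illustrative arithmetic based only on the class proportions stated above, not an official baseline; `trivial_baselines` is a hypothetical helper.

```python
def trivial_baselines(positive_rate: float) -> dict[str, float]:
    """Scores of two label-agnostic baselines, given only the share of positive tweets."""
    if not 0.0 < positive_rate < 1.0:
        raise ValueError("positive_rate must be strictly between 0 and 1")
    # Predict every tweet as negative: correct on all negatives, but with no
    # true positives the F1-score on the positive class is 0.
    all_negative_accuracy = 1.0 - positive_rate
    all_negative_f1 = 0.0
    # Predict every tweet as positive: precision equals the positive rate and
    # recall is 1, so F1 = 2 * p / (p + 1).
    precision, recall = positive_rate, 1.0
    all_positive_f1 = 2 * precision * recall / (precision + recall)
    return {
        "all_negative_accuracy": all_negative_accuracy,
        "all_negative_f1": all_negative_f1,
        "all_positive_f1": all_positive_f1,
    }


if __name__ == "__main__":
    for name, value in trivial_baselines(0.37).items():
        print(f"{name}: {value:.3f}")
```

Predicting every tweet as negative reaches 63% accuracy but an F1 of 0 on the positive class, while predicting every tweet as positive yields an F1 of about 0.54, so a useful classifier should clear that bar.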

Contact: Abeed Sarker (