The Social Media Mining for Health Applications (#SMM4H) Workshop serves as a venue for bringing together researchers interested in automatic methods for the collection, extraction, representation, analysis, and validation of social media data (e.g., Twitter, Reddit) for health informatics. The 8th #SMM4H Workshop invites the submission of papers on original, completed, and unpublished research in all aspects at the intersection of social media mining and health. Paper submissions may consist of up to 4 pages (including references) and must follow the AMIA formatting requirements. In order for accepted papers to be included in the workshop proceedings, at least one author must register for and present at the #SMM4H 2023 Workshop.
Submit workshop papers here: TBA
Important Dates (tentative)
Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center, USA
Ari Z. Klein, University of Pennsylvania, USA
Ivan Flores, Cedars-Sinai Medical Center, USA
Abeed Sarker, Emory University, USA
Yuting Guo, Emory University, USA
Juan M. Banda, Georgia State University, USA
Raul Rodriguez-Esteban, Roche Pharmaceuticals, Switzerland
Lucia Schmidt, Roche Pharmaceuticals, Switzerland
Pierre Zweigenbaum, LISN, CNRS, Université Paris-Saclay, France
The Social Media Mining for Health Applications (#SMM4H) Shared Tasks address natural language processing (NLP) challenges of using social media data for health informatics, including informal, colloquial expressions, misspellings, noise, data sparsity, ambiguity, and multilingual posts. For each of the 5 tasks below, teams will be provided with annotated training and validation data to develop their systems, followed by 5 days during which they will run their systems on unlabeled test data and upload their predictions to CodaLab. The individual CodaLab site for each task can be found below. Teams may upload up to 2 sets of predictions per task.
Please use this form to register. When your registration is approved, you will be invited to a Google group, where the data sets will be made available. Registered teams are required to submit a paper describing their systems. System descriptions may consist of up to 2 pages (including references) and must follow the AMIA formatting requirements. Teams participating in multiple tasks are permitted an additional page. Sample system descriptions can be found in past proceedings. In order for accepted system descriptions to be included in the proceedings, at least one author must register for and present at the #SMM4H 2023 Workshop.
Submit system description papers here: TBA
To facilitate the use of Twitter data for monitoring personal experiences of COVID-19 in real time and on a large scale, this binary classification task involves automatically distinguishing tweets that self-report a COVID-19 diagnosis (annotated as “1”), for example, a positive test, clinical diagnosis, or hospitalization, from those that do not (annotated as “0”). By this definition, a tweet that merely states that the user has experienced COVID-19 would not be considered a diagnosis. The training data include the Tweet ID, the text of the Tweet Object, and the annotated binary label. System predictions for the validation and test data should be submitted through CodaLab. Submissions should be formatted as a ZIP file containing a TSV file with only two columns: the tweet_id column first and the label column second, separated by a tab. The TSV file should not be in a folder in the ZIP file, and the ZIP file should not contain any files or folders other than the TSV file. The TSV file should be named prediction_task1.tsv.
- Training data: 7,600 tweets
- Validation data: 400 tweets
- Test data: 10,000 tweets
- Evaluation metric: F1-score for the “positive” class (i.e., tweets that self-report a COVID-19 diagnosis)
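The packaging rules for Task 1 can be sketched in a few lines of Python. This is a minimal illustration, not an official tool: the function name `build_task1_submission` is hypothetical, and writing the TSV without a header row is an assumption, since the task description does not say whether a header is expected.

```python
import csv
import io
import zipfile


def build_task1_submission(predictions, target):
    """Write (tweet_id, label) pairs to prediction_task1.tsv at the root of a ZIP.

    `predictions` is an iterable of (tweet_id, label) pairs with label 0 or 1.
    `target` is a file path or a binary file-like object for the ZIP archive.
    Assumption: no header row is written (the task text does not specify one).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for tweet_id, label in predictions:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        # tweet_id first, label second, separated by a tab.
        writer.writerow([tweet_id, label])

    # The archive member name has no directory component, so the TSV is
    # the only entry and sits directly at the top level of the ZIP.
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("prediction_task1.tsv", buffer.getvalue())
    return target
```

Calling `build_task1_submission(pairs, "submission.zip")` (the archive name is arbitrary here) produces a ZIP whose only member is `prediction_task1.tsv`.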
Contact: Ari Klein, University of Pennsylvania, USA (email@example.com)
There is an abundance of health-related data on social networks, including chatter about therapies for health conditions. These therapies include, but are not limited to, medication, behavioral, and physical therapies. Social media users who discuss such therapies often express sentiments about them. In this task, the focus will be to build a system that can automatically classify the sentiment associated with a therapy into one of three classes: positive, negative, or neutral. The annotated dataset for this task has been drawn from multiple preidentified Twitter cohorts (chronic pain, substance use disorder, migraine, chronic stress, long-COVID, and intimate partner violence). Thus, there is a high possibility that the therapies are being mentioned by people who are actually receiving/consuming them. The dataset consists of 5000 English tweets containing mentions of a variety of therapies, manually labeled as positive, negative, or neutral with the following approximate distribution: 20%, 14%, and 66%, respectively. The evaluation metric for this task is the micro-averaged F1-score over all 3 classes. The data will be shared as CSV files with 4 fields: tweet_id, therapy, text, and label. The training data are already prepared and will be available to teams that register to participate. The testing data will be released when the evaluation phase starts.
- Training data: 3009 tweets
- Validation data: 753 tweets
- Testing data: TBA
- Evaluation metric: micro-averaged F1-score
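As a rough illustration of the metric, micro-averaging pools true positives, false positives, and false negatives across the three classes before computing F1. The sketch below is not the official scorer, and the label strings in `LABELS` are assumptions about how the classes are encoded in the CSV files.

```python
# Assumed label encoding; the actual strings in the CSV files may differ.
LABELS = ("positive", "negative", "neutral")


def micro_f1(gold, predicted, labels=LABELS):
    """Micro-averaged F1: pool TP, FP, and FN over all classes, then compute F1."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    tp = fp = fn = 0
    for label in labels:
        for g, p in zip(gold, predicted):
            if p == label and g == label:
                tp += 1  # predicted this class and it was correct
            elif p == label:
                fp += 1  # predicted this class, gold was another class
            elif g == label:
                fn += 1  # gold was this class, predicted another class
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

When every tweet has exactly one gold label and one predicted label from the same three classes, each error counts once as a false positive and once as a false negative, so this value coincides with plain accuracy.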
Each line of a submission should contain the tweet_id followed by the label, in that order, separated by a tab.
The submission file must be named “answer.txt” and uploaded as a ZIP archive.
For more information, please refer to https://github.com/codalab/codalab-competitions/wiki/User_Building-a-Scoring-Program-for-a-Competition#directory-structure-for-submissions
Contact: Yuting Guo, Emory University, USA (firstname.lastname@example.org)
Google Group: email@example.com
Expanding on an #SMM4H 2022 task involving the classification of Spanish tweets that self-report COVID-19 symptoms, this task focuses on the detection and extraction of COVID-19 symptoms in tweets written specifically in Latin American Spanish. The task includes both personal self-reports and third-party mentions of symptoms, in an effort to generalize the identification of various disease symptoms in Latin American Spanish to both colloquial and formal language domains. The dataset consists of tweets annotated by medical doctors who are native Latin American Spanish speakers, including labels for whether or not the tweet mentions a symptom and the character offsets of symptoms. In addition, participants will be provided with the dataset for the aforementioned #SMM4H 2022 task, along with BERT-like language models pretrained on Latin American Spanish tweets. The evaluation metric for this task is the strict F1-score for identifying the character offsets of COVID-19 symptoms. The task involves NER offset detection and classification: participants must find the beginning and end of each symptom mention. Dataset annotation guidelines: adapted from the guidelines of the 2022 SocialDisNER #SMM4H shared task (available at https://zenodo.org/record/6983041).
- Training data: 6,021 tweets
- Validation data: 1,979 tweets
- Test data: 2,150 tweets
- Evaluation metric: Strict F1-score
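One way to read "strict" is that a predicted symptom span counts as correct only if both its start and end offsets match a gold span exactly. The sketch below works under that assumption; the `(tweet_id, start, end)` tuple layout and the function name are hypothetical, and the official scorer may differ in details such as whether end offsets are inclusive.

```python
def strict_span_f1(gold_spans, predicted_spans):
    """Strict F1 over (tweet_id, start, end) spans; only exact matches count."""
    gold = set(gold_spans)
    predicted = set(predicted_spans)
    # A true positive requires the same tweet and identical start and end offsets.
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Under this reading, a prediction that is off by a single character at either boundary is both a false positive and a missed gold span, so boundary handling (for example, trailing punctuation) matters as much as detection.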
Submissions should be a tab-separated file with headers, in the same format as the validation set.
Contact: Juan Banda, Georgia State University, USA (firstname.lastname@example.org)
Because social media is used by patients in every aspect of their daily lives, its analysis presents a promising way to understand the patient’s perspective on their disease journey, their unmet medical needs, and their disease burden. Social media listening (SML) can, therefore, potentially support progress in our understanding of a disease and influence the development of new therapies. SML, however, still has many challenges to overcome before it can support rigorous quantitative studies, one of them being the lack of a confirmed diagnosis. The lack of diagnosis information, a difficulty in general, becomes even more relevant in the case of mental disorders, where access to a diagnosis faces many barriers and patients tend to self-diagnose. Social Anxiety Disorder is a good example of this problem: the main barrier that prevents patients from getting a diagnosis is the disease itself.
For this task, we used a dataset extracted from the subreddit r/socialanxiety. The challenge is to build a classifier that correctly distinguishes patients who report having a positive or probable diagnosis of social anxiety disorder (positive cases labeled as ‘1’) from patients who report not having a diagnosis, or for whom a diagnosis is unlikely or unclear (negative cases labeled as ‘0’). For more details on the class annotation, please refer to the annotation guidelines shared in the Google Group.
The dataset consists of 8117 posts written by users who range from 12 to 25 years old. Each row corresponds to a post and contains a unique identifier, the text and the diagnosis label.
- Training data (75%): 6090 posts
- Validation data (~8.4%): 680 posts
- Test data (~16.6%): 1347 posts
- Evaluation metric: F1-score for the positive class (i.e. posts annotated as “1”)
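For reference, F1 for the positive class depends only on true positives, false positives, and false negatives for label “1”; correctly predicted “0” posts do not contribute. The following is a minimal sketch of that computation, not the official evaluation script, and the function name is hypothetical.

```python
def positive_class_f1(gold, predicted, positive=1):
    """F1-score for the positive class only; true negatives are ignored."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there is nothing to count.
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

A consequence of this definition is that a system predicting “0” for every post scores 0.0 regardless of how many negative posts it gets right.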
Table 1 provides sample training data, which includes the Post ID, Post Text, and annotated binary label (Class). Posts were annotated as “1” if the user reports having a positive or probable diagnosis of social anxiety disorder. Posts were annotated as “0” if the user reports not having a diagnosis, or if the presence of a diagnosis is unlikely or unclear.
An adverse drug event (ADE), or adverse drug reaction (ADR), is harm resulting from the use of a medication. In recent years, many studies have begun mining social media for the potential of early detection and novel discovery of ADEs. This task focuses on normalizing ADEs in tweets to their standard concept IDs in the MedDRA vocabulary.
To enable novel approaches, in contrast to previous iterations of this task, the evaluation metric will no longer require extracting the text span of the ADE. In addition to evaluating systems’ performance for all MedDRA IDs in the test set, this year, a second evaluation metric will be based on a zero-shot learning setup, evaluating systems’ performance specifically for MedDRA IDs in the test set that were not seen during training.
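The zero-shot setup amounts to restricting evaluation to test examples whose MedDRA ID never appears in the training data. A minimal sketch of that partition follows; the field name `meddra_id`, the function name, and the dictionary layout are all hypothetical, and the scoring applied to each subset is not specified here.

```python
def split_seen_unseen(train_ids, test_examples):
    """Partition test examples by whether their MedDRA ID occurred in training.

    `train_ids` is an iterable of MedDRA IDs from the training data.
    `test_examples` is a list of dicts with a (hypothetical) "meddra_id" key.
    Returns (seen, unseen); `unseen` is the zero-shot subset.
    """
    seen_ids = set(train_ids)
    seen, unseen = [], []
    for example in test_examples:
        if example["meddra_id"] in seen_ids:
            seen.append(example)
        else:
            unseen.append(example)
    return seen, unseen


# Placeholder IDs for illustration only; these are not real MedDRA codes.
train_ids = ["ID_A", "ID_B"]
test_examples = [
    {"tweet_id": "1", "meddra_id": "ID_A"},
    {"tweet_id": "2", "meddra_id": "ID_C"},
]
seen, unseen = split_seen_unseen(train_ids, test_examples)
```

Teams could apply the same partition to their validation data to estimate how a system behaves on concepts it has never been trained on.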
Participants will be provided with about 18,000 labeled tweets for training and about 10,000 tweets for testing. The training data include the Tweet ID, the text of the Tweet Object, the annotated binary label for whether or not the tweet contains an ADE, the character offsets of the ADE, the text span of the ADE, and the MedDRA ID, for example:
Contact: Dongfang Xu, Cedars-Sinai Medical Center, USA (Dongfang.Xu@cshs.org)