
The Social Media Mining for Health (#SMM4H) Workshop provides an interdisciplinary forum to present and discuss natural language processing, machine learning, and artificial intelligence at the convergence of social media and health. For the 10th #SMM4H Workshop, co-located with ICWSM 2025, we are broadening the scope to include additional web-based sources of "health real-world data" (HeaRD), inviting the submission of papers on original, completed, and unpublished research on all aspects of the intersection of web-based text data and health. Paper submissions may consist of up to 4 pages (plus unlimited references) and must follow the AAAI formatting guidelines. In order for accepted papers to be included in the workshop proceedings, at least one author must register for and present at the #SMM4H-HeaRD 2025 Workshop.
Submit workshop papers here: https://openreview.net/group?id=ICWSM.org/2025/Workshop/SMM4H-HeaRD
Important Dates
Submission deadline: April 21, 2025
Notification of acceptance: May 5, 2025
Camera-ready papers due: May 12, 2025
Workshop: June 23, 2025
Workshop Chair
Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center, USA
Organizers
Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center, USA
Dongfang Xu, Cedars-Sinai Medical Center, USA
Takeshi Onishi, Cedars-Sinai Medical Center, USA
Guillermo Lopez-Garcia, Cedars-Sinai Medical Center, USA
Ivan Flores, Cedars-Sinai Medical Center, USA
Ari Z. Klein, University of Pennsylvania, USA
Abeed Sarker, Emory University, USA
Jeanne Powell, Emory University, USA
Swati Rajwal, Emory University, USA
Pierre Zweigenbaum, LISN, CNRS, Université Paris-Saclay, France
Lisa Raithel, Technische Universität Berlin, Germany
Roland Roller, DFKI, Germany
Philippe Thomas, DFKI, Germany
Elena Tutubalina, AIRI, Kazan Federal University, Russia
Tirthankar Dasgupta, TCS, India
Manjira Sinha, TCS, India
Sudeshna Jana, TCS, India
Sedigh Khademi, MCRI, Australia
The Social Media Mining for Health (#SMM4H) shared tasks address natural language processing, machine learning, and artificial intelligence challenges inherent to utilizing social media data for health-related research. For the 10th #SMM4H Workshop, co-located with ICWSM 2025, we are broadening the scope of the shared tasks to include additional web-based sources of "health real-world data" (HeaRD). For each of the 6 tasks below, teams will be provided with annotated training and validation data to develop their systems, followed by 5 days during which they will run their systems on unlabeled test data and upload their predictions to CodaLab.
Please use this form to register. When your registration is approved, you will be invited to a Google group, where the datasets will be made available. Registered teams are required to submit a paper describing their systems. System descriptions may consist of up to 2 pages (plus unlimited references) and must follow the AAAI formatting guidelines. Teams participating in multiple tasks are permitted an additional page. Sample system descriptions can be found in past #SMM4H Workshop proceedings (e.g., #SMM4H 2024). In order for accepted system descriptions to be included in the proceedings, at least one author must register for and present at the #SMM4H-HeaRD 2025 Workshop.
Submit system description papers here: https://openreview.net/group?id=ICWSM.org/2025/Workshop/SMM4H-HeaRD
Important Dates
Training and validation data available: February 14, 2025
System predictions for validation data due: March 31, 2025
Test data available: April 7, 2025
System predictions for test data due: April 11, 2025
Submission deadline for system description papers: April 21, 2025
Notification of acceptance: May 5, 2025
Camera-ready papers due: May 12, 2025
Workshop: June 23, 2025
Adverse Drug Events (ADEs) are negative medical side effects associated with a drug. Extracting ADE mentions from user-generated text has gained significant attention in research, as it can help detect crowd signals from online discussions. Leveraging multilingual methods to analyze ADE reports across languages and borders further enhances this effort.
For this shared task, we provide messages from patient forums, each labeled according to the presence of an ADE. A message with a positive label (1) contains at least one mention of an ADE, while a message with a negative label (0) does not.
Task
This is a binary classification task. Given a social media post, participants will develop a system that predicts whether the post contains a mention of an Adverse Drug Event (ADE). The system should output either 1 (positive, ADE mentioned) or 0 (negative, no ADE mentioned).
Data
The dataset consists of user-generated social media messages, where mentions of medications and medical symptoms can be highly variable and sometimes ambiguous. Additionally, the dataset is relatively small, with fewer than 2,000 documents per language. The labels are also highly imbalanced, with the positive class (posts mentioning an ADE) making up only about 1% of the data.
Participants will receive the following data:
- Training / Validation:
- German: TRAIN + DEV documents collected from a German patient forum ("lifeline.de")
- French: TRAIN + DEV documents collected from a German patient forum ("lifeline.de") and translated to French. The French training data is distinct from the German training data.
- Russian: TRAIN + DEV documents collected from a site with user reviews about drugs in Russian. These sets are based on the RuDReC dataset.
- English: TRAIN + DEV documents collected from Twitter.
- Test:
- German: TEST documents collected from a German patient forum ("lifeline.de")
- French: TEST documents collected from a German patient forum ("lifeline.de") and translated to French. The French test data is the translated version of the German test data.
- Russian: TEST documents collected from a site with user reviews about drugs in Russian.
- English: TEST documents collected from Twitter.
Data Format
The data are provided in CSV format. Each document includes a language identifier, a unique ID, and a label, as shown below:
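Purely as an illustration of this layout, a hypothetical CSV sample might look like the following (the column names and order here are assumptions, not the official schema):

```python
import csv
import io

# Hypothetical illustration of the described layout: a language identifier,
# a unique document ID, the post text, and a binary label. Field names and
# the example rows are invented for this sketch.
sample = """language,doc_id,text,label
de,de_0001,"Seit ich das Medikament nehme, habe ich starke Kopfschmerzen.",1
en,en_0001,"Started the new meds today, feeling fine so far.",0
"""

rows = list(csv.DictReader(io.StringIO(sample)))
positives = [r["doc_id"] for r in rows if r["label"] == "1"]
print(positives)  # → ['de_0001']
```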

Evaluation
Submissions will be ranked based on the unweighted macro F1-score, Precision, and Recall across all languages. Our evaluation script and a baseline model will be published online.
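As a sketch of how unweighted macro-averaging works (the official evaluation script is the one published by the organizers; the toy labels below are invented), each class contributes equally to the average, which matters here because the positive ADE class is only about 1% of the data:

```python
def macro_prf(y_true, y_pred, labels=(0, 1)):
    """Unweighted macro-averaged precision, recall, and F1 over the given labels."""
    per_label = []
    for lab in labels:
        tp = sum(t == lab and p == lab for t, p in zip(y_true, y_pred))
        pred = sum(p == lab for p in y_pred)
        gold = sum(t == lab for t in y_true)
        prec = tp / pred if pred else 0.0
        rec = tp / gold if gold else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_label.append((prec, rec, f1))
    # Average each metric over the labels with equal weight per class
    n = len(labels)
    return tuple(sum(vals) / n for vals in zip(*per_label))

# Toy gold labels and predictions pooled across languages (illustrative only)
p, r, f = macro_prf([0, 0, 1, 0, 1, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0, 0])
```

Because the rare positive class is weighted equally with the majority class, a system that always predicts 0 scores poorly under this metric despite high accuracy.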
Task Organizers
- Pierre Zweigenbaum, Université Paris-Saclay, LISN, CNRS, France
- Lisa Raithel, BIFOLD, TU Berlin (XplaiNLP group), Germany
- Roland Roller, DFKI GmbH, Germany
- Philippe Thomas, DFKI GmbH, Germany
- Elena Tutubalina, AIRI, Russia
- Dongfang Xu, Cedars-Sinai Medical Center, USA
- Takeshi Onishi, Cedars-Sinai Medical Center, USA
Contact: Lisa Raithel (raithel@tu-berlin.de)
Google group: https://groups.google.com/g/smm4h-2025-task-1
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/21886
Substance use, both prescription and illicit, has become a significant public health concern, leading to addiction, overdose, and associated health issues. Understanding the clinical and social impacts of nonmedical substance use is essential for improving the treatment of substance use disorder: it helps healthcare professionals develop more effective interventions and medications to address addiction, and it enables researchers to design more effective prevention and education programs to reduce the occurrence of nonmedical substance use and its associated clinical and social consequences.
In this named entity recognition task, we focus on two entity types: clinical impacts and social impacts. Instances in the clinical impacts category describe the clinical effects, consequences, or impacts of substance use on individuals' health, physical condition, or mental well-being. Instances in the social impacts category describe the societal, interpersonal, or community-level effects, consequences, or impacts of nonmedical substance use, which may include social relationships, community dynamics, or broader social issues. In this task, 27.8% of posts contain words or phrases marked as clinical or social impacts. Systems designed for this task need to detect these impacts in text data derived from Reddit, identifying their specific spans and automatically distinguishing between clinical impacts and social impacts. We anticipate that strategies will involve leveraging Large Language Models (LLMs).
- Training data: 843 posts
- Validation data: 259 posts
- Test data: 278 posts
- Evaluation metric: F1-score
Participants of this task must sign a data use agreement (DUA) confirming that the data will not be redistributed.
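Span-based NER is commonly scored with strict span-level matching, where a predicted span counts only if both its boundaries and its label match a gold annotation. A minimal sketch of that scoring scheme (the official evaluation script may differ; the offsets below are invented):

```python
def span_f1(gold, pred):
    """Strict span-level precision/recall/F1: a predicted (start, end, type)
    triple counts as correct only if it exactly matches a gold annotation."""
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    prec = tp / len(pred) if pred else 0.0
    rec = tp / len(gold) if gold else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Hypothetical annotations as (start_char, end_char, label) offsets into a post
gold = [(10, 18, "CLINICAL"), (40, 55, "SOCIAL")]
pred = [(10, 18, "CLINICAL"), (41, 55, "SOCIAL")]  # second span is off by one
p, r, f = span_f1(gold, pred)
```

Under strict matching the off-by-one span counts as a full error, which is why careful span boundary detection matters in this task.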
Data Examples
Text= “In PA at a 28 day detox/rehab they used methadone to get me off of bupe.”

Submission Format
Please submit predictions using the same format as the data examples.
Task Organizer: Swati Rajwal, Emory University, USA (swati.rajwal@emory.edu)
Google group: https://groups.google.com/g/smm4h-2025-task-2
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/22203
A range of traditional interventions has been developed to support family caregivers of people with dementia; however, most of them have not been implemented in practice and remain largely inaccessible. Recent systematic reviews have concluded that internet-based interventions are valued by family caregivers of people with dementia for their easy access and can have beneficial effects on caregivers' health. While recent studies have shown that family caregivers of people with dementia use Twitter to discuss their experiences, methods have not been developed to enable the use of Twitter as a platform for internet-based interventions.
This binary classification task involves automatically distinguishing English-language tweets that report having a family member with dementia (annotated as "1") from tweets that merely mention dementia (annotated as "0"). Sample tweets are shown in the table below. The training, development, and test sets contain 6,724 tweets, 353 tweets, and 1,769 tweets, respectively. The evaluation metric is the F1-score for the class of tweets that report having a family member with dementia. A benchmark classifier, based on fine-tuning a BERTweet pretrained model, achieved an F1-score of 0.96. Participants are encouraged to use large language model (LLM) prompting to compare with the high performance of this fine-tuned model.
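For participants exploring the suggested LLM-prompting route, a minimal zero-shot prompt might be constructed as follows (the wording, label scheme, and whichever model or API is used are entirely up to the participant; nothing here is an official baseline):

```python
def build_prompt(tweet: str) -> str:
    """Hypothetical zero-shot prompt for the binary labeling scheme described
    above; the phrasing is an illustrative assumption, not the benchmark setup."""
    return (
        "You will read a tweet that mentions dementia.\n"
        "Answer 1 if the author reports having a family member with dementia, "
        "and 0 if the tweet merely mentions dementia.\n"
        f"Tweet: {tweet}\n"
        "Answer (1 or 0):"
    )

# Invented example tweet for illustration
prompt = build_prompt("My mom was just diagnosed with dementia and I don't know what to do.")
```

The model's single-token answer can then be parsed back into the "1"/"0" annotation scheme and scored with the positive-class F1 described above.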

Task Organizer: Ari Klein, University of Pennsylvania, USA (ariklein@pennmedicine.upenn.edu)
Google group: https://groups.google.com/g/smm4h-2025-task-3
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/22022
Insomnia is a prevalent sleep disorder that severely impacts sleep quality and overall health. It is associated with a range of serious outcomes including psychiatric issues, increased work absenteeism, and a heightened risk of accidents. Despite its widespread occurrence and significant health implications, insomnia remains largely underdiagnosed. Therefore, there is a critical need to develop effective methods for detecting insomnia, which would improve our understanding of its prevalence, associated risk factors, its progression over time, and the efficacy of treatments.
This new shared task aims to develop automatic systems for identifying patients potentially suffering from insomnia using electronic health records (EHRs). It is structured as a text classification challenge that requires participants to analyze a clinical note and determine whether the patient is likely to have insomnia.
We have developed a comprehensive set of rules (Insomnia rules) to facilitate the identification of patients potentially suffering from insomnia. These rules incorporate both direct and indirect symptoms of insomnia and include information about commonly prescribed hypnotic medications. For this task, we have curated an annotated corpus of 210 clinical notes from the MIMIC-III database, adhering to the Insomnia rules during the annotation process. Each note is annotated with a binary label indicating the patient's overall insomnia status ("yes" or "no"), and at the rule level, indicating whether each rule is satisfied based on the note's content. Additionally, to enhance the explainability of participating NLP systems, we provide textual evidence from the clinical notes that supports each annotation. This ensures that the outputs of the systems can be effectively justified.
Participants are encouraged to use large language models (LLMs) to tackle the Insomnia detection task. This shared task serves as an exceptional benchmark to assess the reasoning capabilities of LLMs in medicine, applying a realistic set of diagnostic guidelines to real-world clinical data.
Task
This text classification shared task is divided into three distinct subtasks:
- Subtask 1: Binary text classification. Assess whether the patient described in a clinical note is likely to have insomnia ("yes" or "no"). Evaluation is based on the F1 score, treating "yes" as the positive class.
- Subtask 2A: Multi-label text classification. Evaluate each clinical note against the defined Insomnia rules: Definition 1, Definition 2, Rule A, Rule B, and Rule C, predicting "yes" or "no" for each item. The micro-average F1 score is the primary metric, with "yes" treated as the positive class.
- Subtask 2B: Evidence-Based Classification. This task extends Subtask 2A by requiring not only classification of each item but also the identification and extraction of text evidence from the clinical note that supports each classification. For items Definition 1, Definition 2, Rule B, and Rule C, participants must provide a label ("yes" or "no") and include specific text spans from the note that justify the classification. The alignment of text spans with the reference spans from the clinical notes will be assessed using BLEU and ROUGE metrics. This subtask focuses on promoting transparency and explainability in NLP models by requiring justification for each decision made.
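To make the Subtask 2A metric concrete, micro-averaging pools the (note, rule) decisions before computing precision and recall, with "yes" as the positive class. A sketch under assumed dictionary-shaped predictions (the official evaluation script and submission format are the organizers'):

```python
RULES = ["Definition 1", "Definition 2", "Rule A", "Rule B", "Rule C"]

def micro_f1(gold_notes, pred_notes):
    """Micro-averaged F1 over all (note, rule) decisions, treating "yes" as
    the positive class: counts are pooled across rules before computing P/R/F1."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_notes, pred_notes):
        for rule in RULES:
            g, p = gold[rule] == "yes", pred[rule] == "yes"
            tp += g and p
            fp += (not g) and p
            fn += g and (not p)
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# One toy note: the system misses "Rule C"; all other decisions match
gold = [{"Definition 1": "yes", "Definition 2": "no", "Rule A": "yes",
         "Rule B": "no", "Rule C": "yes"}]
pred = [{"Definition 1": "yes", "Definition 2": "no", "Rule A": "yes",
         "Rule B": "no", "Rule C": "no"}]
score = micro_f1(gold, pred)
```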
Corpus
This shared task utilizes a corpus of clinical notes derived from the MIMIC-III Database. Participants must complete the necessary training and sign a data use agreement to access the MIMIC-III Clinical Database (v1.4). Once access is granted, they must run the text_mimic_notes.py script to retrieve clinical notes and associated patient information using the provided note IDs, as detailed in the instructions in the README file.
Annotations and Submission Format
For each subtask, ground truth annotations are provided in JSON format. Participants are required to submit their system outputs following the same format as the ground truth annotations provided by the organizers. To illustrate the submission format, a sample of the training set annotations is available in data/training. Additionally, the complete set of Insomnia rules utilized for annotating the corpus can be found in the resources/Insomnia_Rules.md file.
Below, we provide two samples of notes annotated with Subtask 1 information:

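The authoritative samples and schema live in the repository's data/training directory; purely as an illustration of what a Subtask 1 annotation could look like, a minimal JSON record might be shaped as follows (the field names here are assumptions, not the official schema):

```python
import json

# Hypothetical shape of a Subtask 1 annotation; field names and the note ID
# are invented for this sketch — follow the samples in data/training instead.
annotation = {
    "note_id": "12345",   # identifier of the MIMIC-III clinical note
    "Insomnia": "yes",    # overall binary insomnia label for Subtask 1
}
print(json.dumps(annotation, indent=2))
```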
Task Organizer: Guillermo Lopez-Garcia, Cedars-Sinai Medical Center, USA (Guillermo.LopezGarcia@cshs.org)
Google group: https://groups.google.com/g/smm4h-2025-task-4
GitHub repository: https://github.com/guilopgar/SMM4H-HeaRD-2025-Task-4-Insomnia
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/22509
Foodborne diseases and food recalls are major public health concerns, with outbreaks causing significant illness, and recalls aiming to prevent further harm to consumers. This shared task focuses on the development of automated systems for detecting foodborne disease outbreaks and recall events from news articles, particularly those in the form of FDA press releases. The task will allow researchers and practitioners to explore the capabilities of NLP in real-world, unstructured data extraction, specifically aimed at improving public health responses and consumer safety. Participants will be tasked with addressing two subtasks:
Subtask 1 (multi-class classification): Participants will build a model that classifies sentences from news articles into one of three categories: (1) Food Recall, (2) Foodborne Disease Outbreak, or (3) Neither.
Subtask 2 (entity and event extraction): Participants will develop a system that extracts key entities and events from text, including:
- Target Organization (e.g., company or regulatory body)
- Product Name (e.g., food item being recalled or related to an outbreak)
- Infection Name (e.g., the disease responsible for the outbreak)
- Safety Incident (e.g., contamination or allergic reaction)
- Number of People Affected (e.g., how many individuals have been reported affected by the recall or outbreak)
Data:
The dataset will consist of English-language news articles, press releases, and other publicly available documents related to foodborne diseases and recalls. The articles will be sourced from a combination of FDA press releases, news websites, and scientific publications. These documents have been annotated to highlight relevant events, entities, and categories.
Evaluation Metrics:
Subtask 1 (classification):
- Accuracy: the proportion of correctly classified sentences out of the total number of sentences
- Precision, Recall, and F1-score: calculated for each of the three categories (Recall, Outbreak, Neither)
Subtask 2 (entity and event extraction):
- Precision, Recall, and F1-score: for each type of entity (e.g., Product Name, Target Organization, Infection Name), as well as for the overall event extraction (including the relations between entities)
- Exact Match: the percentage of correctly extracted entities and events, counting only exact matches between the predicted and true entities
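One plausible reading of the Exact Match criterion for Subtask 2 can be sketched as a slot-filling comparison (the field names come from the subtask description; the event values and the scoring function itself are illustrative assumptions, not the official script):

```python
FIELDS = ["Target Organization", "Product Name", "Infection Name",
          "Safety Incident", "Number of People Affected"]

def exact_match_rate(gold_events, pred_events, fields):
    """Fraction of (document, field) slots where the predicted value exactly
    matches the gold value — an illustrative reading of the Exact Match metric."""
    matches = total = 0
    for gold, pred in zip(gold_events, pred_events):
        for field in fields:
            total += 1
            matches += gold.get(field) == pred.get(field)
    return matches / total if total else 0.0

# Toy gold/predicted event records for one press release (values are made up)
gold = [{"Target Organization": "FDA", "Product Name": "frozen spinach",
         "Infection Name": "Listeria", "Safety Incident": "contamination",
         "Number of People Affected": "12"}]
pred = [{"Target Organization": "FDA", "Product Name": "spinach",
         "Infection Name": "Listeria", "Safety Incident": "contamination",
         "Number of People Affected": "12"}]
rate = exact_match_rate(gold, pred, FIELDS)
```

Note that the partially correct "spinach" extraction scores zero under exact matching, while the per-entity precision/recall/F1 metrics above give systems separate credit per entity type.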
Example:

Submission Details:
Participants should submit their results in the required format, which will include predictions for both classification (Subtask 1) and entity/event extraction (Subtask 2). All models will be tested on a held-out set of news articles that were not included in the training data.
Task Organizer: Tirthankar Dasgupta, TCS Research, India (dasgupta.tirthankar@tcs.com)
Google group: https://groups.google.com/g/smm4h-2025-task-5
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/22154
The objective of this binary classification task is to distinguish Reddit posts that contain personal mentions of adverse reactions to herpes zoster (shingles) vaccines from other vaccine-related discussions. Sample posts are shown in the table below. The training, validation, and test sets contain 2,521 posts, 786 posts, and 629 posts, respectively. The evaluation metric is the F1-score for the "positive" class. Benchmark classifiers are presented in this article.

Task Organizer: Sedigh Khademi, Murdoch Children's Research Institute, Australia (sedigh.khademi@mcri.edu.au)
Google group: https://groups.google.com/g/smm4h-2025-task-6
CodaLab: https://codalab.lisn.upsaclay.fr/competitions/22159

SMM4H 2024 (proceedings)
SMM4H 2023 (overview paper)
SMM4H 2022 (proceedings)
SMM4H 2021 (proceedings)
SMM4H 2020 (proceedings)
SMM4H 2019 (proceedings)
SMM4H 2018 (proceedings)
SMM4H 2017 (proceedings)
SMM4H 2016 (proceedings)
