Social Media Mining for Health Applications (#SMM4H) Shared Task 2023

The Social Media Mining for Health Applications (#SMM4H) Shared Tasks address natural language processing (NLP) challenges of using social media data for health informatics, including informal, colloquial expressions, misspellings, noise, data sparsity, ambiguity, and multilingual posts. For each of the 5 tasks below, teams will be provided with annotated training and validation data to develop their systems, followed by 5 days during which they will run their systems on unlabeled test data and upload their predictions to CodaLab. The individual CodaLab site for each task can be found below. Teams may upload up to 2 sets of predictions per task.

Please use this form to register. When your registration is approved, you will be invited to a Google group, where the data sets will be made available.

Registered teams are required to submit a paper describing their systems. System descriptions may consist of up to 2 pages (including references) and must follow the AMIA formatting requirements. Teams participating in multiple tasks are permitted an additional page. Sample system descriptions can be found in past proceedings. In order for accepted system descriptions to be included in the proceedings, at least one author must register for and present at the #SMM4H 2023 Workshop.

Submit system description papers here: TBA

Important Dates

Training and validation data available: April 24, 2023
System predictions for validation data due: June 30, 2023 (23:59 CodaLab server time)
Test data available: July 10, 2023
System predictions for test data due: July 14, 2023 (23:59 CodaLab server time)
Submission deadline for system description papers: August 11, 2023
Notification of acceptance: September 15, 2023
Camera-ready papers due: September 29, 2023
Workshop: November 11 or 12, 2023 (TBA)


Task 1 – Binary classification of English tweets self-reporting a COVID-19 diagnosis
Task 2 – Multi-class classification of sentiment associated with therapies in English tweets
Task 3 – Extraction of COVID-19 symptoms in Latin American Spanish tweets
Task 4 – Normalization of symptoms in English Reddit posts