As a follow-up to #SMM4H 2020 Task 5, which focused on birth defect outcomes, this new binary classification task involves automatically distinguishing tweets that report a personal experience of an adverse pregnancy outcome, such as miscarriage, stillbirth, preterm birth, low birthweight, or neonatal intensive care (annotated as “1”), from tweets that do not (annotated as “0”).
- Training data: 6,487 tweets
- Test data: 10,000 tweets
Register your team here: https://forms.gle/1qs3rdNLDxAph88n6
After your registration is approved, you will be invited to join the Google group for the task. A link to the dataset is available in the Google group's banner. If you do not receive the invitation, please request to join the Google group, giving your team name, using the link below.
Google group: https://groups.google.com/g/smm4h21-task-4
Link to Codalab: https://competitions.codalab.org/competitions/28766
Evaluation period for Task 4:
Test Dataset Release | 27th Feb 2021 12:00am UTC |
Predictions Due | 1st Mar 2021 11:59pm UTC (3:59pm PST) |

Submission format: Please use the format below for your submission. Submissions should contain two columns, tweet_id and label, separated by tabs. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.
tweet_id | label |
35656 | 0 |
34637 | 1 |
56844 | 0 |
12735 | 1 |
05745 | 0 |
24677 | 0 |
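The packaging steps above can be sketched in Python as follows. This is a minimal sketch, not an official submission script: the function names, the file names predictions.tsv and submission.zip, and the inclusion of a header row are assumptions made for illustration. Tweet IDs are handled as strings so that an ID such as 05745 keeps its leading zero.

```python
import csv
import io
import zipfile


def render_tsv(predictions):
    """Render (tweet_id, label) pairs as tab-separated text.

    Assumption: a header row "tweet_id<TAB>label" is included, mirroring
    the example table on this page.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["tweet_id", "label"])
    for tweet_id, label in predictions:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        # Keep the ID as text so leading zeros survive.
        writer.writerow([str(tweet_id), int(label)])
    return buffer.getvalue()


def write_submission_zip(predictions, zip_path="submission.zip",
                         tsv_name="predictions.tsv"):
    """Write a .zip archive containing exactly one .tsv file.

    Both file names are hypothetical; the task page does not prescribe them.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(tsv_name, render_tsv(predictions))
    return zip_path


if __name__ == "__main__":
    # Illustrative predictions only, reusing IDs from the example table.
    example = [("35656", 0), ("34637", 1), ("05745", 0)]
    print(write_submission_zip(example))
```

Writing the .tsv content directly into the archive avoids leaving a stray uncompressed file next to the .zip and guarantees the archive holds only the one file.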
Evaluation metric: F1-score for the “positive” class (i.e., tweets annotated as “1”)
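For local validation, the positive-class F1-score can be computed from true positives, false positives, and false negatives counted for label 1 only. The sketch below is an assumption about how to reproduce the metric on a held-out split; official scoring takes place on Codalab, and the function name and the convention of returning 0.0 when there are no true positives are illustrative choices.

```python
def positive_class_f1(gold, predicted):
    """F1-score for label 1, given equal-length lists of 0/1 labels."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    true_pos = sum(1 for g, p in pairs if g == 1 and p == 1)
    false_pos = sum(1 for g, p in pairs if g == 0 and p == 1)
    false_neg = sum(1 for g, p in pairs if g == 1 and p == 0)
    if true_pos == 0:
        # Assumed convention: no true positives gives an F1 of 0.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical labels for illustration, not task data.
    gold = [0, 1, 0, 1, 0, 0]
    predicted = [0, 1, 1, 0, 0, 0]
    print(positive_class_f1(gold, predicted))  # 0.5
```

Because only the positive class counts, correctly predicted "0" tweets do not raise the score; a system that labels every tweet "0" scores 0.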
Contact information: Ari Klein (ariklein@pennmedicine.upenn.edu)
References:
- Klein AZ, Cai H, Weissenbacher D, Levine LD, Gonzalez-Hernandez G. A Natural Language Processing Pipeline to Advance the Use of Twitter Data for Digital Epidemiology of Adverse Pregnancy Outcomes. Journal of Biomedical Informatics: X. 2020; 100076.
- Klein AZ, Gonzalez-Hernandez G. An Annotated Data Set for Identifying Women Reporting Adverse Pregnancy Outcomes on Twitter. Data Brief. 2020;32:106249.