Task 4: Classification of tweets self-reporting adverse pregnancy outcomes

As a follow-up to #SMM4H 2020 Task 5, which focused on birth defect outcomes, this new binary classification task involves automatically distinguishing tweets that report a personal experience of an adverse pregnancy outcome, such as miscarriage, stillbirth, preterm birth, low birthweight, or neonatal intensive care (annotated as “1”), from those that do not (annotated as “0”).

  • Training data: 6,487 tweets
  • Test data: 10,000 tweets

Register your team here: https://forms.gle/1qs3rdNLDxAph88n6
Link to CodaLab: Available Feb 1, 2021

Examples of annotations

Evaluation Metric: F1-score for the “positive” class (i.e., tweets annotated as “1”)
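
As an illustration of the metric, the following minimal Python sketch computes the F1-score for the positive class from gold and predicted labels. It is not the official scoring script: the function name `positive_class_f1` and the toy label lists are assumptions made for this example, and the sketch returns 0.0 when there are no true positives.

```python
def positive_class_f1(gold, predicted, positive_label=1):
    """F1-score for the positive class (hypothetical helper, not the official scorer)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: annotated "1" and predicted "1".
    tp = sum(1 for g, p in pairs if g == positive_label and p == positive_label)
    # False positives: annotated "0" but predicted "1".
    fp = sum(1 for g, p in pairs if g != positive_label and p == positive_label)
    # False negatives: annotated "1" but predicted "0".
    fn = sum(1 for g, p in pairs if g == positive_label and p != positive_label)

    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0.
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
gold = [1, 0, 1, 1, 0, 0]
predicted = [1, 0, 0, 1, 1, 0]
score = positive_class_f1(gold, predicted)
print(f"Positive-class F1: {score:.3f}")
```

Because only the positive class is scored, a system that labels every tweet “0” gets an F1-score of 0 under this sketch, regardless of how many negative tweets it classifies correctly.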

Contact information: Ari Klein (ariklein@pennmedicine.upenn.edu)


  1. Klein AZ, Cai H, Weissenbacher D, Levine LD, Gonzalez-Hernandez G. A Natural Language Processing Pipeline to Advance the Use of Twitter Data for Digital Epidemiology of Adverse Pregnancy Outcomes. Journal of Biomedical Informatics: X. 2020; 100076.
  2. Klein AZ, Gonzalez-Hernandez G. An Annotated Data Set for Identifying Women Reporting Adverse Pregnancy Outcomes on Twitter. Data Brief. 2020;32:106249.