SMM4H 2023 – Task 4 – Normalization of symptoms in English Reddit posts

Because patients use social media throughout their daily lives, analyzing it offers a promising way to understand the patient’s perspective on their disease journey, their unmet medical needs, and their disease burden. Social media listening (SML) can therefore potentially support progress in our understanding of a disease and influence the development of new therapies. This pipeline task involves automatically detecting Reddit posts that report symptoms, extracting the spans of the symptoms in those posts, and mapping each extracted symptom to its standard concept ID in the MedDRA vocabulary. The annotated dataset consists of posts sampled from the r/socialanxiety subreddit, with labels indicating whether or not a post reports a symptom, the character offsets of the symptoms, and their MedDRA IDs.

  • Training data: TBA
  • Validation data: TBA
  • Test data: TBA
  • Evaluation metric: micro-averaged F1-score for MedDRA ID
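
As a rough illustration of the evaluation metric, the sketch below computes a micro-averaged F1-score over predicted MedDRA IDs. It is not the official scorer: the (post ID, MedDRA ID) pair format, the function name micro_f1, and the placeholder IDs are all assumptions made for this example, and the organizers' script may match predictions differently (for instance, by also requiring matching character offsets).

```python
from typing import Iterable, Set, Tuple

# Hypothetical annotation format: (post_id, meddra_id).
# The official data format has not been announced.
Annotation = Tuple[str, str]


def micro_f1(gold: Iterable[Annotation], predicted: Iterable[Annotation]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all posts and all MedDRA IDs, then compute a single F1."""
    gold_set: Set[Annotation] = set(gold)
    pred_set: Set[Annotation] = set(predicted)

    tp = len(gold_set & pred_set)   # predicted and correct
    fp = len(pred_set - gold_set)   # predicted but not in gold
    fn = len(gold_set - pred_set)   # in gold but missed

    if tp == 0:
        return 0.0                  # avoids division by zero

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Placeholder IDs for illustration only; these are not real MedDRA codes.
gold = [("post1", "ID_A"), ("post1", "ID_B"), ("post2", "ID_A")]
pred = [("post1", "ID_A"), ("post2", "ID_A"), ("post2", "ID_C")]

score = micro_f1(gold, pred)
print(f"micro-F1 = {score:.3f}")
```

In this toy example, two of the three predictions are correct and one gold annotation is missed, so precision and recall are both 2/3. Micro-averaging pools counts across all concepts, so frequent symptoms weigh more heavily than rare ones.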

Contact: Lucia Schmidt, Roche Pharmaceuticals, Switzerland

CodaLab: TBA