Social Media Mining for Health Applications (SMM4H) Workshop & Shared Task

(For the Workshop, please follow this link…)

Shared Task

The proposed SMM4H shared tasks involve NLP challenges on social media mining for health monitoring and surveillance and in particular pharmacovigilance. This requires processing noisy, real-world, and substantially creative language expressions from social media. The proposed systems should be able to deal with many linguistic variations and semantic complexities in various ways people express medication-related concepts and outcomes.

Similar to the first and second runs of the shared tasks, the data will consist of medication-related posts on Twitter. The training data (which includes the train and test sets from previous runs) is already prepared and will be available to the teams registering to participate. We will prepare the evaluation data in the following months. There will be a one-week window during which teams will be able to run their systems on blind evaluation data. The shared task will include four subtasks:

  1. Automatic classification of tweets describing medication intake. Participants are expected to build a three-way classification system that distinguishes among tweets expressing definite medication intake, possible medication intake, and non-intake (tweets that mention a medication name without indicating intake). This is the second run of this task, and approximately 18,000 annotated tweets will be made available for training, with around 5,000 tweets for evaluation.
  2. Automatic classification of adverse effect mentions in tweets. The system designed for this sub-task should be able to distinguish tweets reporting an adverse effect (AE) from those that do not, taking into account subtle linguistic variations between adverse effects and indications (the reason to use the medication). This is a rerun of the popular classification task organized in 2016 and 2017. The data set will consist of approximately 25,000 tweets for training and 5,000 for evaluation.
  3. Extraction of AE mentions. As a follow-up step to subtask ii, this task involves identifying the text span of the reported AEs and distinguishing AEs from similar non-AE expressions. AEs are often multi-token, descriptive expressions, so this subtask requires advanced named entity recognition approaches. The data for this sub-task includes 2,000+ tweets that are fully annotated for mentions of AEs and indications. This set contains a subset of the tweets from sub-task ii tagged as hasADR plus a random set of 800 nonADR tweets. The nonADR subset was annotated for mentions of indications to allow participants to develop techniques for dealing with this confusion class.
  4. Normalization of adverse drug reaction mentions. This is a mapping task in which systems must map colloquial mentions of adverse reactions to standard concept IDs in the MedDRA vocabulary (preferred terms). It requires a concept normalization system that receives ADR mentions, interprets their meaning, and maps them to standard concept IDs. As we have seen in the first run, this task is more challenging and requires a semi-supervised approach. About 9,000 annotated mappings will be made available for training and 5,000 will be made available for evaluation.

The task presents several interesting challenges including the noisy nature of the data, the informal language of the user posts, misspellings, and data imbalance. It has been shown in past research that automated systems frequently underperform when exposed to social media text because of the presence of novel/creative phrases and misspellings, and frequent use of idiomatic, ambiguous and sarcastic expressions. The tasks will thus act as a discovery and verification process of what approaches work best for social media data.

TASK 2: Automatic classification of posts describing medication intake (three-class classification)

Systems are required to distinguish tweets that present personal medication intake, possible medication intake, and non-intake. This is the second execution of this subtask. The class descriptions are as follows:

  • personal medication intake – tweets in which the user clearly expresses a personal medication intake/consumption. (1)
  • possible medication intake – tweets that are ambiguous but suggest that the user may have taken the medication. (2)
  • non-intake – tweets that mention medication names but do not indicate personal intake. (3)


  • Training data: Over 17,000 tweets manually categorized into the three classes will be provided (which includes past year’s training and test data).
  • Evaluation data: approximately 8000 annotated tweets
  • Evaluation metric: micro-averaged F-score for the intake and possible intake classes
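
The evaluation metric above can be sketched as follows. This is a minimal illustration, not the official scoring script: it assumes the class codes 1 (intake), 2 (possible intake), and 3 (non-intake) listed above, and the function name `micro_f1` is hypothetical. True positives, false positives, and false negatives are pooled over classes 1 and 2 only, so class 3 affects the score only when it is confused with one of the other two.

```python
from typing import Sequence


def micro_f1(gold: Sequence[int], pred: Sequence[int],
             target_classes: Sequence[int] = (1, 2)) -> float:
    """Micro-averaged F-score restricted to target_classes.

    Assumption: labels are the integer class codes 1, 2, 3 from the task
    description; the official scorer may differ in details.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for c in target_classes:
            if p == c and g == c:
                tp += 1          # correctly predicted class c
            elif p == c and g != c:
                fp += 1          # predicted c, but gold is something else
            elif g == c and p != c:
                fn += 1          # gold is c, but prediction missed it
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example with made-up labels (not real task data).
gold = [1, 1, 2, 3, 3, 2]
pred = [1, 2, 2, 3, 1, 3]
print(micro_f1(gold, pred))
```

In the toy example, two predictions for classes 1 and 2 are correct, two are spurious, and two are missed, so precision, recall, and the F-score are all 0.5.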

For each tweet, the publicly available data set contains: (i) the Tweet ID, (ii) the User ID, (iii) our database ID, and (iv) the annotation indicating the class (1, 2, or 3), as shown below. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).

Tweet ID            User ID          Database ID    Class

707959308504305664  S_Cavallii       med-int-17996  1
788971260239876096  britt20_         med-int-17997  1
529684196479205376  Keezy_TaughtYou  med-int-17998  2
676320685526933504  HotCheeteauxs    med-int-17999  3
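
As a rough illustration of working with this format, the sketch below parses one record and writes it back out. It assumes the four fields are separated by whitespace (the actual delimiter in the distributed files is not stated here and may differ), and the names `IntakeRecord`, `parse_line`, and `format_line` are hypothetical. Tweet IDs are kept as strings so that no digits are lost or reformatted.

```python
from typing import NamedTuple


class IntakeRecord(NamedTuple):
    tweet_id: str      # kept as a string to preserve the ID exactly
    user_id: str
    database_id: str
    label: int         # 1 = intake, 2 = possible intake, 3 = non-intake


def parse_line(line: str) -> IntakeRecord:
    """Parse one record, assuming whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    tweet_id, user_id, database_id, label = fields
    label_int = int(label)
    if label_int not in (1, 2, 3):
        raise ValueError(f"unexpected class label: {label!r}")
    return IntakeRecord(tweet_id, user_id, database_id, label_int)


def format_line(record: IntakeRecord) -> str:
    """Write a record back out; the tab delimiter is an assumption."""
    return "\t".join([record.tweet_id, record.user_id,
                      record.database_id, str(record.label)])


sample = "529684196479205376  Keezy_TaughtYou   med-int-17998        2"
record = parse_line(sample)
print(record.label)
print(format_line(record))
```

For a submission, a system would replace `label` with its predicted class before calling `format_line`.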

Training data can be downloaded from here: smm4h-EMNLP-task2-trainingsets. Please download the full data set. There are 3 files, representing last year’s training sets and test set. Use the download script provided within the folder to download the contents of the available tweets.

TASK 3: Automatic classification of adverse drug reaction mentioning posts (binary classification)

Systems have to distinguish between Twitter posts that contain an adverse drug reaction (ADR) mention and those that do not. This is a rerun of the popular task from the first two shared tasks, organized in 2016 and 2017. A new, blind data set will be used for evaluation and an extended training set will be provided to the participants.


  • Training data: Approximately 25,000 annotated tweets will be provided for training.
  • Evaluation data: approximately 8,000 tweets.
  • Evaluation metric: F-score for the ADR/positive class.
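
For reference, the F-score for the ADR/positive class can be computed along these lines. This is a sketch under the assumption that class 1 marks an ADR mention and class 0 its absence; the function name `positive_class_scores` is hypothetical, and the official scorer may differ. Because the data are imbalanced, precision and recall on the positive class are returned alongside the F-score.

```python
from typing import Sequence, Tuple


def positive_class_scores(gold: Sequence[int], pred: Sequence[int],
                          positive: int = 1) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for the positive class only."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No correct positive predictions: all three scores are zero.
        return 0.0, 0.0, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example with made-up labels (not real task data).
gold = [0, 1, 0, 1]
pred = [0, 1, 1, 0]
print(positive_class_scores(gold, pred))
```

Note that a system predicting class 0 for every tweet would score 0 here even though its accuracy could be high, which is why the positive-class F-score is the more informative measure for this task.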

For each tweet, the publicly available data set contains: (i) the tweet ID, (ii) the user ID, and (iii) the binary annotation indicating the presence (1) or absence (0) of ADRs, as shown below. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).

Tweet ID            User ID     Class

354256195432882177  54516759    0
352456944537178112  1267743056  1
332479707004170241  273421529   0
340660708364677120  135964180   1

Training data and download scripts are available at the following link: smm4h-EMNLP-trainingset3.