Social Media Mining for Health Applications (SMM4H) Workshop & Shared Task

(For the Workshop, please follow this link…)

Shared Task

The proposed SMM4H shared tasks pose NLP challenges in social media mining for health monitoring and surveillance, in particular pharmacovigilance. They require processing noisy, real-world, and often highly creative language from social media. Submitted systems should be able to handle the many linguistic variations and semantic complexities in the various ways people express medication-related concepts and outcomes.

The data consist of medication-related posts on Twitter. The training data (which include the training and test sets from previous runs) are already prepared and will be made available to teams that register to participate. We will prepare the evaluation data in the following months. There will be a one-week window during which teams can run their systems on the blind evaluation data. The shared task includes four subtasks, described below.

The task presents several interesting challenges, including the noisy nature of the data, the informal language of user posts, misspellings, and class imbalance. Past research has shown that automated systems frequently underperform on social media text because of novel or creative phrases, misspellings, and frequent idiomatic, ambiguous, and sarcastic expressions. The tasks will thus act as a discovery and verification process for which approaches work best on social media data.

TASK 1: Automatic detection of posts mentioning a drug name—binary classification

Systems are required to detect tweets that mention any drug name or dietary supplement. For this task we follow the FDA definitions of a drug product and of a dietary supplement. These definitions, along with concrete examples, can be found in the guidelines we followed when annotating the data for Task 1.

Data

  • Training data: 10,000 annotated tweets are provided for training.
  • Evaluation data: approximately 5,000 tweets.
  • Evaluation metric: Recall, Precision and F-score.
  • Test data: will be released July 20, 2018 (Free registration is required, please contact Davy Weissenbacher)
  • Final run: July 27, 2018

For each tweet, the publicly available data set contains: (i) the user ID, (ii) the tweet ID, and (iii) the binary annotation indicating the presence or absence in the tweet of one or more drug names/dietary supplements, as shown below. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).

Tweet ID                            User ID             Class

354256195432882177  54516759     0
352456944537178112  1267743056   1
332479707004170241  273421529    0
340660708364677120  135964180    1

Data can be downloaded from here: smm4h-EMNLP-task1-trainingset. Use the download script provided within the folder to download the contents of the available tweets.
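Since the training and submission files are plain tab-separated text, they can be handled with Python's standard csv module. A minimal sketch of reading the annotations and writing predictions back out in the same three-column format (the function names are illustrative, not part of the provided scripts):

```python
import csv
import io

def read_annotations(f):
    """Parse tab-separated rows of (tweet ID, user ID, class)."""
    reader = csv.reader(f, delimiter="\t")
    return [(tweet_id, user_id, int(label)) for tweet_id, user_id, label in reader]

def write_predictions(f, rows):
    """Write predictions in the same three-column, tab-separated format."""
    writer = csv.writer(f, delimiter="\t")
    for tweet_id, user_id, label in rows:
        writer.writerow([tweet_id, user_id, label])

# Example with two rows from the table above:
sample = "354256195432882177\t54516759\t0\n352456944537178112\t1267743056\t1\n"
rows = read_annotations(io.StringIO(sample))
```

Keeping the tweet and user IDs as strings avoids any risk of precision loss, since 64-bit tweet IDs can be mangled by tools that parse them as floating-point numbers.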

TASK 2: Automatic classification of posts describing medication intake—three-class classification

Systems are required to distinguish tweets that present personal medication intake, possible medication intake, and non-intake. This is the second run of this subtask. The class descriptions are as follows:

  • personal medication intake – tweets in which the user clearly expresses a personal medication intake/consumption. (1)
  • possible medication intake – tweets that are ambiguous but suggest that the user may have taken the medication. (2)
  • non-intake – tweets that mention medication names but do not indicate personal intake. (3)

Data

  • Training data: Over 17,000 tweets manually categorized into the three classes will be provided (including the previous year’s training and test data).
  • Evaluation data: approximately 8,000 annotated tweets.
  • Evaluation metric: micro-averaged F-score for the intake and possible intake classes.
  • Test data: will be released July 20, 2018 (Free registration is required, please contact Davy Weissenbacher)
  • Final run: July 27, 2018

For each tweet, the publicly available data set contains: (i) the Tweet ID, (ii) the User ID, (iii) our database ID, and (iv) the annotation indicating one of the three classes, as shown below. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).

Tweet ID                           User ID                       Database ID                      Class

707959308504305664  S_Cavallii        med-int-17996        1
788971260239876096  britt20_          med-int-17997        1
529684196479205376  Keezy_TaughtYou   med-int-17998        2
676320685526933504  HotCheeteauxs     med-int-17999        3

Training data can be downloaded from here: smm4h-EMNLP-task2-trainingsets. Please download the full data set. There are 3 files, representing last year’s training sets and test set. Use the download script provided within the folder to download the contents of the available tweets.
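The Task 2 metric pools true positives, false positives, and false negatives across the intake (1) and possible intake (2) classes before computing precision and recall, rather than averaging per-class scores. A minimal sketch of that computation (an illustration of micro-averaging over a class subset, not the official evaluation script):

```python
def micro_f1(gold, pred, classes=(1, 2)):
    """Micro-averaged F1 over a subset of classes: pool TP/FP/FN, then compute P, R, F."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for c in classes:
            if p == c and g == c:
                tp += 1          # correctly predicted class c
            elif p == c and g != c:
                fp += 1          # predicted c, but gold is another class
            elif g == c and p != c:
                fn += 1          # gold is c, but predicted another class
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Note that the non-intake class (3) contributes only indirectly: predicting 3 for a gold 1 or 2 counts as a false negative, and predicting 1 or 2 for a gold 3 counts as a false positive.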

TASK 3: Automatic classification of adverse drug reaction mentioning posts—binary classification

Systems have to distinguish Twitter posts that contain an adverse drug reaction (ADR) mention from those that do not. This is a popular task, rerun from the first two shared tasks organized in 2016. A new, blind data set will be used for evaluation, and an extended training set will be provided to the participants.

Data

  • Training data: Approximately 25,000 annotated tweets will be provided for training.
  • Evaluation data: approximately 8,000 tweets.
  • Evaluation metric: F-score for the ADR/positive class.
  • Test data: will be released July 20, 2018 (Free registration is required, please contact Davy Weissenbacher)
  • Final run: July 27, 2018

For each tweet, the publicly available data set contains: (i) the user ID, (ii) the tweet ID, and (iii) the binary annotation indicating the presence or absence of ADRs, as shown below. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).

Tweet ID                            User ID             Class

354256195432882177  54516759     0
352456944537178112  1267743056   1
332479707004170241  273421529    0
340660708364677120  135964180    1

Training data and download scripts are available at the following link: smm4h-EMNLP-trainingset3.
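The Task 3 metric is computed on the ADR (positive) class only: precision and recall over class 1, combined into their harmonic mean. A short sketch (again an illustration, not the official scorer):

```python
def positive_f1(gold, pred):
    """Precision, recall, and F1 for the positive (1) class of a binary task."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because the positive class is rare in ADR data, this metric is far harsher than accuracy: a classifier that always predicts 0 scores an F1 of 0.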

TASK 4: Automatic detection of posts mentioning vaccination behavior—binary classification

Systems are required to distinguish tweets that mention behavior related to influenza vaccination. Specifically, annotators labeled tweets to answer the binary question, “Does this message indicate that someone received, or intended to receive, a flu vaccine?”

Data

  • Training data: 8,180 annotated tweets.
  • Evaluation data: 1,664 tweets.
  • Evaluation metric: F-score.
  • Test data: will be released July 20, 2018 (Free registration is required, please contact Davy Weissenbacher)
  • Final run: July 27, 2018

For each tweet, the publicly available tab-separated data set contains: (i) the Tweet ID, and (ii) the binary annotation indicating the class, where 1 indicates “yes” and 0 indicates “no”. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).

Tweet ID                                       Class

522600972476903424      0
522645182953836544      0
522662196984033281      1
522673895745536000      0

Data can be downloaded from here: smm4th-EMNLP-task4-trainingset. Use the download script provided within the folder to download the contents of the available tweets.

FAQ

Q: How can I download the data for tasks 1, 2, 3 and 4?
A: The data must be downloaded from Twitter. We provide the tweet IDs and user IDs for each instance, along with the annotations, and a simple Python script that downloads the tweets associated with these IDs. The script is compressed together with the training data in the zip file for each task.
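Conceptually, the download step "hydrates" each distributed tweet ID into its text via the Twitter API and skips tweets that are no longer available. The sketch below captures that loop with an injected fetch function, so the actual API client and its rate limiting stay out of the example; `fetch_one` and `hydrate` are hypothetical names, not part of the provided script:

```python
def hydrate(tweet_ids, fetch_one):
    """Resolve tweet IDs to texts; skip tweets that were deleted or made private."""
    available = {}
    for tid in tweet_ids:
        try:
            # In the real script this would be a Twitter API call (assumption).
            available[tid] = fetch_one(tid)
        except LookupError:
            # Stand-in for the API's "tweet not found / not authorized" errors.
            continue
    return available
```

Injecting the fetcher also makes the loop easy to test with a stub before pointing it at the live API.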

Q: I have downloaded the data, but the number of tweets is lower than that mentioned in the shared task page. Why is that?
A: Because of Twitter’s privacy policies, we cannot share the texts of the tweets directly; we can only share the user and tweet IDs. When a user deletes a tweet or closes their account, that tweet is no longer accessible. Unfortunately, there is nothing we can do about this. Participants will have to train on whatever data is available. For testing, we may only use tweets that are available at the time the test set is released. We will make some additional data available prior to the release of the official test data.

Q: How will I submit my results?
A: The submission format for each task is described along with the task description. A submission link will be made available closer to the date.

Q: What are the state-of-the art systems for these tasks?
A: Because social media mining for restricted domains, such as the medical domain, is a relatively new research area, a variety of approaches are currently being explored. State-of-the-art systems for the tasks in this competition are listed below.
Tasks 1 and 3: Binary classification of social media text is a well-explored area. Our past work on adverse drug reactions in tweets can be found here: http://www.sciencedirect.com/science/article/pii/S1532046414002317
Task 2: See the previous performances of the AMIA challenge.

Q: How many submissions can I make?
A: For each task, two submissions from each team will be accepted. You can submit as many times as you want, but only the last two submissions will be accepted. You can participate in one or multiple tasks.

Q: Can I participate in Task 2 only?
A: Yes. You can participate in any number of tasks.

Q: Are there any restrictions on data and resources that can be used for training the classification system? For example, can we use manually or automatically constructed lexicons? Can we use other data (e.g., tweets, blog posts, medical records) annotated or unlabeled?
A: There are currently no restrictions on data and resources. External resources and data can be used. All external resources need to be explained in the system description paper.

Q: Is there any information on the test data? Will the test data be collected in the same way as the training data? For example, will the same drug names be used to collect tweets?
A: The test data has been collected in the same way as the training data.