Social Media Mining for Health Applications (SMM4H) Workshop & Shared Task 2019


Shared Task

The SMM4H shared tasks pose NLP challenges in social media mining for health monitoring and surveillance. This requires processing imbalanced, noisy, real-world, and often highly creative language from social media. The proposed systems should be able to deal with the many linguistic variations and semantic complexities in the ways people express medication-related concepts and outcomes. Past research has shown that automated systems frequently underperform on social media text because of novel or creative phrases, misspellings, and frequent use of idiomatic, ambiguous, and sarcastic expressions. The tasks will thus act as a discovery and verification process for which approaches work best on social media data.

Similar to the first three runs of the shared tasks, the data will include annotated collections of Twitter posts. The training data is already prepared and will be available to teams registering to participate. This year, we will standardize the competition platform, using CodaLab competitions.

Task 1: Automatic classification of adverse effect mentions in tweets

The designed system for this sub-task should be able to distinguish tweets reporting an adverse effect (AE) from those that do not, taking into account subtle linguistic variations between adverse effects and indications (the reason to use the medication). This is a rerun of the popular classification task organized in 2016, 2017, and 2018.

Data

  • Training data: ~25,000 annotated tweets are provided for training.
  • Evaluation data: approximately 5,000 tweets.
  • Evaluation metric: F-score for the ADR/positive class.
  • Test data: April 15 (Free registration is required, please contact Davy Weissenbacher)
  • Final run: April 19
  • Codalab link: TBA

For each tweet, the publicly available data set contains: (i) the user ID, (ii) the tweet ID, and (iii) the binary annotation indicating the presence or absence of ADRs, as shown below. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).

Tweet ID            User ID      Class

354256195432882177  54516759     0
352456944537178112  1267743056   1
332479707004170241  273421529    0
340660708364677120  135964180    1
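The evaluation metric for this task, F-score for the ADR (positive) class, can be sketched as follows. The three-column file format follows the sample above; the parsing helper and sample predictions are illustrative, not part of the official scorer.

```python
# Minimal sketch of Task 1 scoring: F-score for the ADR (positive) class.
# Column order (tweet ID, user ID, class) follows the sample in the task page.

def parse_labels(lines):
    """Parse 'tweet_id  user_id  class' rows into {tweet_id: class}."""
    labels = {}
    for line in lines:
        tweet_id, _user_id, cls = line.split()
        labels[tweet_id] = int(cls)
    return labels

def adr_f1(gold, pred):
    """Precision, recall, and F1 for the positive (ADR = 1) class."""
    tp = sum(1 for t, c in gold.items() if c == 1 and pred.get(t) == 1)
    fp = sum(1 for t, c in pred.items() if c == 1 and gold.get(t) == 0)
    fn = sum(1 for t, c in gold.items() if c == 1 and pred.get(t) == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: one correct positive, one false positive.
gold = parse_labels([
    "354256195432882177  54516759    0",
    "352456944537178112  1267743056  1",
])
pred = {"354256195432882177": 1, "352456944537178112": 1}
print(adr_f1(gold, pred))  # precision 0.5, recall 1.0, F1 ~ 0.667
```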

Task 2: Extraction of Adverse Effect mentions

As a follow-up step to Task 1, this task includes identifying the text span of the reported AEs and distinguishing AEs from similar non-AE expressions. AEs are multi-token, descriptive expressions, so this subtask requires advanced named entity recognition approaches. The data for this sub-task includes 2000+ tweets which are fully annotated for mentions of AEs and indications. This set contains a subset of the tweets from Task 1 tagged as hasADR, plus a random set of 800 nonADR tweets. The nonADR subset was annotated for mentions of indications, in order to allow participants to develop techniques to deal with this confusion class.

Data

  • Training data: TBA
  • Evaluation data: TBA
  • Evaluation metric: F1-score
  • Test data: April 15 (Free registration is required, please contact Davy Weissenbacher)
  • Final run: April 19
  • Codalab link: TBA

For each tweet, the publicly available data set contains: (i) the tweet ID, (ii) the user ID, (iii) the start and end offsets of the span, (iv) the annotation type indicating an ADR, an Indication, or a Drug, and (v) the annotated text and the drug mentioned in the tweet, as shown below. The evaluation data will contain the same information, but without the classes.

Tweet ID            User ID     Begin  End  Class       Annotated text   Drug

332574327444746240  210777087     60    66  ADR         tired            baclofen
332574327444746240  210777087     67    73  ADR         sleepy           baclofen
333278830980648960  323112996     59    66  ADR         disable          cipro
333278830980648960  323112996    127   135  ADR         crippled         cipro
349211366210539520  323112996    112   115  Indication  UTI              cipro
332480409654943744  42299706      26    35  ADR         addictive        cymbalta
33319916351484313   39844213      79    94  Indication  anti-depressant  lamotrigine
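One way to score extraction systems like this is strict span matching: a predicted span counts as correct only if its tweet ID, offsets, and class all match a gold annotation exactly. The sketch below assumes that strict criterion (the official scorer may also support relaxed, overlap-based matching).

```python
# Hedged sketch of strict-match F1 for Task 2 span extraction.
# Gold and predicted annotations are sets of (tweet_id, begin, end, class).

def span_f1(gold, pred):
    """Strict-match F1: a prediction is correct only on an exact tuple match."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example using offsets from the sample table above.
gold = {("332574327444746240", 60, 66, "ADR"),
        ("332574327444746240", 67, 73, "ADR"),
        ("349211366210539520", 112, 115, "Indication")}
pred = {("332574327444746240", 60, 66, "ADR"),
        ("349211366210539520", 112, 115, "ADR")}  # wrong class -> no match
print(round(span_f1(gold, pred), 3))  # 0.4
```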

Task 3: Normalization of adverse drug reaction (ADR) mentions

This is a mapping task in which systems must map colloquial mentions of adverse reactions to standard concept IDs (preferred terms) in the MedDRA vocabulary. It requires a concept normalization system that takes ADR mentions, interprets their meaning, and maps them to standard concept IDs. As seen in the first and second SMM4H shared tasks, this is more challenging and is likely to require a semi-supervised approach to address successfully. About 9,000 annotated mappings will be made available for training and 5,000 for evaluation.

Data

  • Training data: TBA
  • Evaluation data: TBA
  • Evaluation metric: Accuracy
  • Test data: April 15 (Free registration is required, please contact Davy Weissenbacher)
  • Final run: April 19
  • Codalab link: TBA

For each ADR mention, the publicly available data set contains: (i) an internal ID, (ii) the mention of the ADR, and (iii) the concept ID in the MedDRA vocabulary. The evaluation data will contain the same information, but without the concept IDs.

Internal ID    ADR mention            MedDRA ID

10415	  withdrawal	        10048010
10768	  fall asleep	        10020765
11075	  fucks up my sleep	10040984
10546	  feel so drink	        10016330
10302	  depressed thoughts	10012378
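A simple baseline for this task, offered only as an illustration of the input/output contract, is to memorize training mentions and back off to the closest seen mention by string similarity for unseen phrases. The training lexicon below reuses mappings from the sample; real systems would need to handle far noisier variation.

```python
# Illustrative normalization baseline: exact lookup on seen mentions,
# with a fuzzy-string fallback for unseen ones. Not the expected approach,
# just a sketch of mapping colloquial ADR mentions to MedDRA concept IDs.
import difflib

train = {
    "withdrawal": "10048010",
    "fall asleep": "10020765",
    "depressed thoughts": "10012378",
}

def normalize(mention, lexicon):
    mention = mention.lower().strip()
    if mention in lexicon:                       # exact match on a seen mention
        return lexicon[mention]
    close = difflib.get_close_matches(mention, lexicon, n=1, cutoff=0.5)
    return lexicon[close[0]] if close else None  # nearest seen mention, if any

print(normalize("withdrawal", train))      # '10048010' (exact)
print(normalize("falling asleep", train))  # '10020765' (fuzzy match)
```

Accuracy, the task's evaluation metric, is then the fraction of evaluation mentions whose predicted concept ID equals the gold ID.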

Task 4: Generalizable identification of personal health experience mentions

In this binary classification task, systems identify whether a tweet contains a first-person mention of a health concern or condition [1], for example distinguishing whether someone personally has an illness or is merely discussing or sharing information about it. The goal is to build classification models that generalize across different health issues, which would reduce the burden of creating illness-specific models and datasets. Toward this end, this task will provide at least three Twitter datasets in different health domains: detecting if someone has the flu [2], detecting if someone was vaccinated [3], and detecting if someone is changing their travel plans to avoid disease [4]. Each dataset will have approximately 1,000 labeled tweets. Two datasets will be given as training data, while the third will be held out for testing, in order to evaluate whether the trained models can generalize to a completely different health application.
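The leave-one-domain-out protocol described above can be sketched as follows. The tiny datasets and the rule-based first-person classifier are hypothetical stand-ins for the real data and for whatever learned model a team would train on the two in-domain datasets.

```python
# Sketch of leave-one-domain-out evaluation: hold out one health domain,
# evaluate a classifier built without seeing it. Data and the toy
# first-person-pronoun baseline are hypothetical illustrations.

def first_person_baseline(texts):
    """Toy model: flag tweets containing first-person pronouns as personal mentions."""
    pronouns = {"i", "i'm", "my", "me"}
    return [int(any(w in pronouns for w in t.lower().split())) for t in texts]

domains = {
    "flu":     [("I think I have the flu", 1), ("Flu season has started", 0)],
    "vaccine": [("Just got my flu shot", 1), ("CDC recommends vaccination", 0)],
    "travel":  [("I'm cancelling my trip over Zika", 1), ("Zika reported in Brazil", 0)],
}

for held_out in domains:
    # A learned model would be trained here on the two remaining domains;
    # the rule-based baseline needs no training, so we only evaluate.
    texts, gold = zip(*domains[held_out])
    pred = first_person_baseline(texts)
    acc = sum(p == g for p, g in zip(pred, gold)) / len(gold)
    print(f"held-out {held_out}: accuracy {acc:.2f}")
```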

Data

  • Training data: TBA
  • Evaluation data: TBA
  • Evaluation metric: F1-score
  • Test data: April 15 (Free registration is required, please contact Davy Weissenbacher)
  • Final run: April 19
  • Codalab link: TBA

References

  1. Payam Karisani, Eugene Agichtein. Did You Really Just Have a Heart Attack? Towards Robust Detection of Personal Health Mentions in Social Media. 2018. https://arxiv.org/abs/1802.09130
  2. Alex Lamb, Michael J. Paul, Mark Dredze. Separating fact from fear: Tracking flu infections on Twitter. 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), Atlanta. June 2013.
  3. Xiaolei Huang, Michael C. Smith, Michael J. Paul, Dmytro Ryzhkov, Sandra C. Quinn, David A. Broniatowski, Mark Dredze. Examining patterns of influenza vaccination in social media. AAAI Joint Workshop on Health Intelligence (W3PHIAI), San Francisco. February 2017.
  4. Ashlynn R. Daughton, Dasha Pruss, Brad Arnot, Danielle Albers Szafir, Michael J. Paul. Characteristics of Zika behavior discourse on Twitter. AMIA Workshop on Social Media Mining for Health Applications, Washington, DC. November 2017.