Task 1 : Classification, extraction and normalization of adverse effect (AE) mentions in English tweets

In this task, modified from previous years, participants must develop one or more components to (1) classify tweets that contain an adverse effect (AE), also known as an adverse drug effect (ADE), (2) detect the text spans of reported ADEs in tweets, and (3) map these colloquial mentions to their standard concept IDs (preferred terms) in the MedDRA vocabulary. This task presents multiple challenges. Firstly, the classification task needs to take into account class imbalance, as only around 7% of the tweets contain ADEs. Secondly, span detection will require advanced named entity recognition approaches. Finally, the resolution task will additionally require choosing a normalized concept from more than 23,000 MedDRA preferred terms.

Participants will be provided with a labeled training set containing tweet texts and ADE annotations, with the option of participating in one or more subtasks. The task comprises three subtasks, described below in increasing order of complexity.

Dataset sizes:

  • Training data: 18,000 tweets
  • Test data: 10,000 tweets

Register your team here : https://forms.gle/1qs3rdNLDxAph88n6
After registration approval, you will be invited to join the Google group for the task. A link to the dataset is available in the Google group's banner. If you do not receive the invite, please request to join the Google group with your team name using the link below.
Google groups : https://groups.google.com/g/smm4h21-task-1
Link to Codalab : https://competitions.codalab.org/competitions/28766

Evaluation Period for Task 1 :

Test Dataset Release : 26th Feb 2021 12:00am UTC
Predictions Due : 28th Feb 2021 11:59pm UTC (3:59pm PST)
All submissions are automated and time limits are enforced by Codalab. No extensions will be provided.

Subtask 1a : ADE tweet classification

Given a tweet, participants of this subtask will be required to submit only the binary annotations ADE/noADE. A tweet should be assigned the label ADE if and only if it has one or more mentions of an ADE.

Submission format: Please use the format below for submission. Each line should contain tweet_id and label, separated by a tab character, in the order shown below. Predictions for each subtask should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this zip file as your submission.

tweet_id  label
699       ADE
310       NoADE
653       NoADE
872       NoADE
751       ADE
469       NoADE
Submission format for Subtask 1a

You may also omit tweets predicted as NoADE, as shown below.

tweet_id  label
699       ADE
751       ADE
Alternative submission format for Subtask 1a
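As a practical note, a small script can produce the required .tsv file and the .zip wrapper in one step. The sketch below uses only the Python standard library; the filenames `answer.tsv` and `submission.zip` are illustrative assumptions, not names mandated by the task.

```python
import zipfile

def write_submission(predictions, tsv_path="answer.tsv", zip_path="submission.zip"):
    """Write (tweet_id, label) predictions to a tab-separated file and
    compress that single file into a .zip ready for upload.
    File names here are placeholders; check Codalab for any requirements."""
    with open(tsv_path, "w") as f:
        for tweet_id, label in predictions:
            f.write(f"{tweet_id}\t{label}\n")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(tsv_path)

# Using the alternative (ADE-only) format from the example above.
write_submission([("699", "ADE"), ("751", "ADE")])
```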

Evaluation Metric : Submissions will be ranked by Precision, Recall and F1-score for the ADE class.
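For local validation before submitting, the class-specific metrics can be computed as below. This is a sketch of standard precision/recall/F1 for the positive (ADE) class, not the official scorer; it treats tweets missing from the predictions as NoADE, consistent with the alternative format above.

```python
def ade_prf(gold, pred):
    """Precision, recall and F1 for the ADE class, given dicts mapping
    tweet_id -> label ('ADE' or 'NoADE'). Tweets absent from pred are
    treated as NoADE predictions."""
    tp = sum(1 for t, g in gold.items() if g == "ADE" and pred.get(t) == "ADE")
    fp = sum(1 for t, p in pred.items() if p == "ADE" and gold.get(t) != "ADE")
    fn = sum(1 for t, g in gold.items() if g == "ADE" and pred.get(t) != "ADE")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example with the tweet IDs from the tables above.
gold = {"699": "ADE", "310": "NoADE", "751": "ADE", "469": "NoADE"}
pred = {"699": "ADE", "469": "ADE"}
print(ade_prf(gold, pred))  # one true positive, one false positive, one false negative
```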

Subtask 1b : ADE span detection (includes annotations for Subtask 1a)

Participants of the ADE span detection subtask will be required to submit both the ADE classification labels and the spans of the expressed ADEs.

Submission format: Please use the format below for submission. Each line should contain tweet_id, label, start, end and span, separated by tab characters, in the order shown below. You may also submit this file for Subtask 1a, where the start, end and span columns will be ignored. Predictions for each subtask should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this zip file as your submission.

tweet_id  label  start  end  span
699       ADE    111    118  fatigue
699       ADE    126    130  achy
751       ADE    26     41   lower back ache
751       ADE    43     52   neck ache
751       ADE    54     58   hips
751       ADE    63     73   knees ache
751       ADE    75     90   short of breath
Submission format for Subtask 1b

Evaluation Metric : Submissions will be ranked by Precision, Recall and F1-score over extracted ADEs, where a predicted span counts as a match if it overlaps a gold span either entirely or partially.
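The relaxed-overlap matching can be sketched as follows. This is an illustrative implementation assuming end offsets are exclusive and greedy first-match pairing; the official Codalab scorer may differ in such details.

```python
def spans_overlap(a, b):
    """True if character spans a=(start, end) and b=(start, end)
    overlap entirely or partially (end treated as exclusive)."""
    return max(a[0], b[0]) < min(a[1], b[1])

def span_prf(gold, pred):
    """Relaxed span-level P/R/F1: a predicted span is a true positive
    if it overlaps any unmatched gold span in the same tweet.
    Entries are (tweet_id, start, end) tuples."""
    matched_gold = set()
    tp = 0
    for tid, s, e in pred:
        hit = next((g for g in gold
                    if g[0] == tid and g not in matched_gold
                    and spans_overlap((s, e), (g[1], g[2]))), None)
        if hit is not None:
            matched_gold.add(hit)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For example, a prediction of (26, 35) partially overlaps the gold span (26, 41) for "lower back ache" and would count as a true positive.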

Subtask 1c : ADE resolution (includes annotations for Subtask 1a and 1b)

Participants of the ADE resolution subtask will be required to submit ADE classification labels, spans and normalization (MedDRA preferred term) labels.

Submission format: Each line should contain tweet_id, label, start, end, span and ptid, separated by tab characters, in the order shown below. You may also submit this file for Subtask 1a, where the start, end, span and ptid columns will be ignored. Similarly, you may submit it for Subtask 1b, where the ptid column will be ignored. Predictions for each subtask should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this zip file as your submission.

tweet_id  label  start  end  span             ptid
699       ADE    111    118  fatigue          10016256
699       ADE    126    130  achy             10033371
751       ADE    26     41   lower back ache  10003988
751       ADE    43     52   neck ache        10028836
751       ADE    54     58   hips             10003239
751       ADE    63     73   knees ache       10003239
751       ADE    75     90   short of breath  10040604
Submission format for Subtask 1c

Evaluation Metric : Submissions will be ranked by Precision, Recall and F1-score for each ADE extracted where the spans overlap either entirely or partially AND each span is normalized to the correct MedDRA preferred term ID.
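The Subtask 1c true-positive condition (span overlap AND matching preferred term) can be sketched as below. This is an assumption-laden illustration, not the official scorer; end offsets are treated as exclusive.

```python
def resolution_match(gold, pred):
    """True positive test for Subtask 1c: the two spans must overlap
    entirely or partially AND the predicted MedDRA PT ID must equal
    the gold one. Each entry is a (tweet_id, start, end, ptid) tuple."""
    g_tid, g_s, g_e, g_pt = gold
    p_tid, p_s, p_e, p_pt = pred
    return (g_tid == p_tid
            and max(g_s, p_s) < min(g_e, p_e)  # spans overlap (end exclusive)
            and g_pt == p_pt)
```

Under this condition, a span that overlaps "fatigue" but is normalized to the wrong PT ID is not a true positive for 1c, even though it would be for 1b.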

Contact information: Arjun Magge (Arjun.Magge@pennmedicine.upenn.edu)

Frequently asked questions (FAQs):

Do I have to participate in all the subtasks?
No. You may choose to participate in one, two or all three subtasks. During evaluation, you will be allowed to make two submissions for each subtask.

How do I access MedDRA preferred terms (PT) and lower level terms (LLT)?
You can download MedDRA from https://www.meddra.org/. It is free for academic institutions and you may subscribe online at https://www.meddra.org/subscription/subscription-form. Once you download MedDRA, you will find PTs, LLTs and their mapping in the llt.asc file.
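A small parser for the LLT-to-PT mapping can be sketched as below. It assumes the standard dollar-delimited MedDRA ASCII layout in which the first three fields of each llt.asc row are llt_code, llt_name and pt_code; verify the field order against your MedDRA release's documentation, since layouts can vary between versions.

```python
def load_llt_to_pt(path):
    """Parse a dollar-delimited llt.asc-style file into a dict mapping
    LLT code -> PT code. Assumes fields 0, 1, 2 of each row are
    llt_code, llt_name, pt_code (check your MedDRA release docs)."""
    mapping = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            fields = line.rstrip("\n").split("$")
            if len(fields) >= 3 and fields[0]:
                llt_code, _llt_name, pt_code = fields[0], fields[1], fields[2]
                mapping[llt_code] = pt_code
    return mapping

# Tiny illustrative sample; real llt.asc rows carry more trailing fields,
# and these two example codes map to themselves (the LLT is also a PT).
with open("llt_sample.asc", "w") as f:
    f.write("10016256$Fatigue$10016256$$$$$$$\n")
    f.write("10028836$Neck ache$10028836$$$$$$$\n")

llt_to_pt = load_llt_to_pt("llt_sample.asc")
```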