***Thanks for the submissions. Results will be emailed out on Monday 18th September***
Like the first run of this shared task, the data will consist of medication/drug-related chatter from Twitter. Teams registering to participate will be provided with training data.
- To register, please email: abeed AT upenn DOT edu
- Include: a team name, task number, the number of members of the team, the location of the team and the leader of the research lab/group (if applicable).
- Training data will be provided to participants following registration.
There will be a two-week window during which teams will be able to run their systems on blind evaluation data. The shared task will include three subtasks:
TASK 1: Automatic classification of posts mentioning adverse drug reactions—binary classification
Systems have to distinguish between Twitter posts that contain a mention of an adverse drug reaction (ADR) and those that do not. This is a rerun of the first shared task, organized in 2016. A new, blind data set will be used for evaluation, and an extended training set will be provided to the participants.
- Training data: 10,822 annotated tweets will be provided for training initially. A smaller additional data set will be made available to participants prior to the final evaluation.
- Evaluation data: approximately 10,000 tweets.
- Evaluation metric: F-score for the ADR/positive class.
For each tweet, the publicly available data set contains: (i) the user ID, (ii) the tweet ID, and (iii) the binary annotation indicating the presence or absence of ADRs, as shown below. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).
Tweet ID            User ID     Class
354256195432882177  54516759    0
352456944537178112  1267743056  1
332479707004170241  273421529   0
340660708364677120  135964180   1
Training data is available from: http://diego.asu.edu/Publications/ADRClassify.html
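For reference, the Task 1 metric (F-score for the ADR/positive class) can be sketched in a few lines of Python. This is an illustrative sketch only, not the official evaluation script; the function name and the assumption that label "1" marks the ADR class follow the sample above.

```python
# Sketch of the Task 1 metric: F-score for the ADR (positive) class only.
# Labels follow the sample above: "1" = contains an ADR mention, "0" = does not.

def adr_f_score(gold, predicted, positive_label="1"):
    """Precision/recall/F1 computed for the positive class only."""
    tp = sum(1 for g, p in zip(gold, predicted) if g == p == positive_label)
    fp = sum(1 for g, p in zip(gold, predicted) if g != positive_label and p == positive_label)
    fn = sum(1 for g, p in zip(gold, predicted) if g == positive_label and p != positive_label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Note that tweets correctly labeled "0" do not contribute to this score; only performance on the ADR class is measured.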
**ADDITIONAL TRAINING DATA FOR TASK 1**: task_1_dev_download_form2
TASK 2: Automatic classification of posts describing medication intake—three-class classification
Systems are required to distinguish tweets that report personal medication intake, possible medication intake, and no intake. This is a new task, similar in format to TASK 1. The class descriptions are as follows:
- personal medication intake (class 1) – tweets in which the user clearly expresses a personal medication intake/consumption.
- possible medication intake (class 2) – tweets that are ambiguous but suggest that the user may have taken the medication.
- non-intake (class 3) – tweets that mention medication names but do not indicate personal intake.
- Training data: 8,000 tweets manually categorized into the three classes will be provided initially. A smaller additional set will be made available to participants prior to the final evaluation.
- Evaluation data: approximately 5,000 annotated tweets.
- Evaluation metric: micro-averaged F-score for the intake and possible intake classes.
For each tweet, the publicly available data set contains: (i) the Tweet ID, (ii) the User ID, (iii) our database ID, and (iv) the annotation indicating the class, as shown below. The evaluation data will contain the same information, but without the classes. Participating teams should submit their results in the same format as the training set (shown below).
Tweet ID            User ID          Database ID    Class
707959308504305664  S_Cavallii       med-int-17996  1
788971260239876096  britt20_         med-int-17997  1
529684196479205376  Keezy_TaughtYou  med-int-17998  2
676320685526933504  HotCheeteauxs    med-int-17999  3
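The Task 2 metric pools true positives, false positives, and false negatives over the two classes of interest (intake and possible intake) before computing F-score. A minimal sketch, with illustrative function names and the class labels "1" and "2" taken from the sample above:

```python
# Sketch of the Task 2 metric: micro-averaged F-score over the intake ("1")
# and possible-intake ("2") classes. Class "3" affects the score only when
# it is confused with one of the two measured classes.

def micro_f_score(gold, predicted, classes=("1", "2")):
    tp = fp = fn = 0
    for c in classes:
        tp += sum(1 for g, p in zip(gold, predicted) if g == c and p == c)
        fp += sum(1 for g, p in zip(gold, predicted) if g != c and p == c)
        fn += sum(1 for g, p in zip(gold, predicted) if g == c and p != c)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Micro-averaging weights every pooled decision equally, so the more frequent of the two measured classes influences the score more than it would under macro-averaging.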
Training data can be downloaded from here: download_binary_twitter_data. The link contains the training data and the download script.
**ADDITIONAL TRAINING DATA FOR TASK 2**: task_2_dev_download_form
TASK 3: Normalization of adverse drug reaction mentions
This is a concept normalization task. Given a concept mention in natural language (colloquial or otherwise), participating systems are required to identify the MedDRA Preferred Term (PT) code for the mention.
Training data will consist of a set of concept mentions and their corresponding, human-assigned MedDRA PTs, as shown below. Submissions should follow an identical format.
Database ID  Text        Class
1003         gorked      10015535
1005         dry nose    10028740
1021         allergic    10041349
1025         withdrawal  10013754
Systems will be ranked based on the accuracy of the mapping.
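A natural starting point for Task 3 is an exact-match dictionary baseline built from the training pairs. The sketch below is a hypothetical baseline, not an official reference system; the function names and the `fallback` behavior for unseen mentions are assumptions.

```python
# Sketch of a trivial Task 3 baseline: exact-match lookup of the mention
# text (case-insensitive) against the training pairs. Mentions never seen
# in training fall back to a caller-supplied default.

def build_lookup(training_pairs):
    """training_pairs: iterable of (mention_text, meddra_pt_code) tuples."""
    return {mention.lower(): code for mention, code in training_pairs}

def normalize(mention, lookup, fallback=None):
    """Return the MedDRA PT code for a mention, or `fallback` if unseen."""
    return lookup.get(mention.lower(), fallback)
```

Since systems are ranked by accuracy (the proportion of mentions assigned the correct PT code), this baseline's score is bounded by how many evaluation mentions appear verbatim in the training data; real systems will need to generalize beyond exact matches.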
[Date: June 15, 2017]: Download training data release 1 from this link: amia_task3_release1
[June 26, 2017]: **Download the full training set from this link: amia_task3_full_training**
- Team registration begins: April 15, 2017
- Training data release: May 05, 2017
- Evaluation period:
August 25 - September 1, 2017; extended to September 5 - September 12, 2017
- Results and ranking release: September 14, 2017
- System descriptions due: October 1, 2017
- Each team can participate in any number of the tasks; participation in all tasks is not required.
- Registered teams must submit at least one system output for the registered task.
- Data for each task will be given to teams that have registered to participate in that specific task.
- Attending the conference is not mandatory (although it is encouraged). However, registration requires a team to submit at least one set of results for the registered task and a system description paper.
- System description papers can also be submitted as papers to the workshop; we will host the proceedings at our lab at UPenn. Teams are free to submit their work elsewhere as well.
Q: How can I download the data for tasks 1 and 2?
A: The data can be downloaded from Twitter. We provide the tweet IDs and the user IDs for each instance, along with the annotations, together with a simple Python script that downloads the tweets associated with these IDs. The script can be downloaded from: http://diego.asu.edu/downloads/download_binary_twitter_data.py
Q: I have downloaded the data, but the number of tweets is lower than that mentioned in the shared task page. Why is that?
A: Because of Twitter’s privacy policies, we cannot share the texts of the tweets directly; we can only share the user and tweet IDs. When a user deletes a tweet or closes their account, that tweet is no longer accessible. Unfortunately, there is nothing we can do about this, and participants will have to train on whatever data is available. For testing, we may only use tweets that are available at the time of the release of the data set. We will make an additional data set available prior to the release of the official test data.
Q: Is registration to AMIA necessary to participate in this competition?
A: No. However, if you submit a system, you will have to submit a short system description (~4 pages). We are looking at the possibility of having a combined publication for the workshop. Also, you are welcome to submit your system description paper to the workshop as well (it will undergo usual peer review).
Q: How will I submit my results?
A: The submission format for each task is described along with the task description above. A submission link will be made available closer to the evaluation period.
Q: What are the state-of-the-art systems for these tasks?
A: Because social media mining for restricted domains, such as the medical domain, is a relatively new research area, a variety of approaches are still being explored. The known state of the art for each task is summarized below.
Task 1. Binary classification of social media text is a well-explored area. Our past work on this task can be found here: http://www.sciencedirect.com/science/article/pii/S1532046414002317
Task 2. This is a new data set and currently the state-of-the-art is unknown.
Q: How many submissions can I make?
A: For each task, two submissions from each team will be accepted. You can submit as many times as you want, but only the last two submissions will be accepted. You can participate in one or multiple tasks.
Q: Can I participate in Task 2 only?
A: Yes. You can participate in any number of tasks.
Q: Are there any restrictions on data and resources that can be used for training the classification system? For example, can we use manually or automatically constructed lexicons? Can we use other data (e.g., tweets, blog posts, medical records) annotated or unlabeled?
A: There are currently no restrictions on data and resources. External resources and data can be used. All external resources need to be explained in the system description paper.
Q: Is there any information on the test data? Will the test data be collected in the same way as the training data? For example, will the same drug names be used to collect tweets?
A: The test data has been collected in the same way, but not using the same drug names. While there may be some drug names common to both sets, it is unlikely that the same drug will be heavily represented in both. However, the same classes of drugs may be used. We will provide an additional training set later on, which will give participants some idea of the differences between the training and test sets.
Q: For task 3, is the task open or closed world? That is, are external resources allowed?
A: Open. All resources allowed.
Q: For task 3, what will be the metric of evaluation?
A: Systems will be ranked based on the accuracy of the mapping, as noted in the task description above.