Task 3: Classification of changes in medication treatment in tweets

The binary classifier should detect tweets in which Twitter users self-declare a change to their medication treatment, whether or not a health care professional advised the change. Examples of such changes include not filling a prescription, stopping a treatment, changing a dosage, or forgetting to take the drugs. This task is a first step toward using Twitter to detect patients who are non-adherent to their treatments, along with their reasons.

The data consists of two corpora: a set of tweets and a set of drug reviews from WebMD.com. Negative and positive reviews are naturally balanced, whereas positive and negative tweets are naturally imbalanced. Each corpus is split into a training, a validation, and a test subset. Participants will be given the training and validation subsets for both corpora and will be evaluated on the two test sets independently. Participants are expected to submit predictions for both test sets.

This year, the test sets will include additional reviews and tweets as decoys to prevent manual correction of the predicted labels. The evaluation script, annotation guidelines, and baseline code will be provided to registered participants.

Training data: 5,898 Tweets / 10,378 Reviews
Validation data: 1,572 Tweets / 1,297 Reviews
Test data: 2,360 Tweets / 1,297 Reviews
Evaluation metric: F1-score for the change class
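
For local checks on the validation subsets, the metric above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation script (which is provided to registered participants and may differ in detail). It assumes that label 1 marks the change class; the function name f1_for_class is hypothetical.

```python
def f1_for_class(gold, pred, positive=1):
    """F1-score for one class, computed from two parallel lists of labels.

    Assumption: `positive` (1 by default) is the label of the change class.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # With no true positives, precision and recall are zero or undefined,
    # so the score is reported as 0.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [1, 0, 1, 1, 0, 0]
    pred = [1, 1, 0, 1, 0, 0]
    print(round(f1_for_class(gold, pred), 3))
```

Because only the change class is scored, a classifier that always predicts the majority class on the imbalanced tweet corpus would receive an F1-score of 0 under this sketch.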

Register your team here: https://forms.gle/1qs3rdNLDxAph88n6
After your registration is approved, you will be invited to join the Google group for the task. The link to the dataset is available in the Google group's banner. If you do not receive the invitation, please request to join the Google group, giving your team name, using the link below.
Google group: https://groups.google.com/g/smm4h21-task-3
Link to Codalab: https://competitions.codalab.org/competitions/28766
Annotation Guidelines: https://upenn.box.com/s/9aqtkfa0zy57wusj3ai99ldjcz5a7idg
Baseline Classifier: https://upenn.box.com/s/ktcl7urxvz11ngw4lpvsf94ifhtgpz1r

Evaluation Period for Task 3:

Test Dataset Release	27th Feb 2021 12:00am UTC
Predictions Due	1st Mar 2021 11:59pm UTC (3:59pm PST)

All submissions are automated and time limits are enforced by Codalab. No extensions will be provided.

Subtask 3a: Tweet classification

Submission format: Please use the format below for your submission. Submissions should contain two columns, tweet_id and label, separated by tabs. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.

tweet_id	label
123	0
543	1
231	0
135	1
486	0
247	0

Submission format for Task 3a
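
The packaging step described above (one tab-separated file, alone inside a .zip archive) could be scripted roughly as follows. This is a minimal sketch using only the Python standard library. The function name write_submission and the file names predictions.tsv and submission.zip are hypothetical; the task description does not prescribe a name for the inner file.

```python
import csv
import io
import zipfile


def write_submission(predictions, zip_path, id_column="tweet_id",
                     tsv_name="predictions.tsv"):
    """Write (id, label) pairs to a single TSV file inside a new .zip archive.

    `tsv_name` is a placeholder; no particular file name is assumed to be
    required by the submission system.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([id_column, "label"])  # header row, as in the example
    for item_id, label in predictions:
        writer.writerow([item_id, label])

    # The archive contains the TSV file and nothing else.
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(tsv_name, buffer.getvalue())


if __name__ == "__main__":
    example = [(123, 0), (543, 1), (231, 0)]
    write_submission(example, "submission.zip")
```

Passing id_column="SOURCE_FILE" would produce the same layout with the identifier column used for the WebMD reviews.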

Subtask 3b: WebMD classification

Submission format: Please use the format below for your submission. Submissions should contain two columns, SOURCE_FILE and label, separated by tabs. All other columns will be ignored. Predictions for each task should be contained in a single .tsv (tab-separated values) file. This file (and only this file) should be compressed into a .zip file. Please upload this .zip file as your submission.

SOURCE_FILE	label
reviews_parsed/119_049.txt	1
reviews_parsed/219_879.txt	0
reviews_parsed/123_839.txt	0
reviews_parsed/179_022.txt	0
reviews_parsed/154_346.txt	1
reviews_parsed/329_055.txt	0

Submission format for Task 3b
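
Since submissions are automated and no extensions are given, it may be worth checking a file locally before uploading it. The sketch below is one possible sanity check, not the check Codalab performs. It assumes that labels are the strings 0 and 1, as in the example above, and that each identifier should appear only once; the function name check_submission is hypothetical.

```python
import csv
import io


def check_submission(tsv_text, id_column="SOURCE_FILE"):
    """Return a list of human-readable problems found in a TSV submission.

    Assumptions (not stated by the organizers): labels are "0" or "1",
    and identifiers must be unique.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    header = reader.fieldnames or []

    for column in (id_column, "label"):
        if column not in header:
            problems.append(f"missing column: {column}")
    if problems:
        return problems  # rows cannot be checked without both columns

    seen = set()
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(reader, start=2):
        item_id = row[id_column]
        if item_id in seen:
            problems.append(f"line {line_number}: duplicate id {item_id}")
        seen.add(item_id)
        if row["label"] not in ("0", "1"):
            problems.append(
                f"line {line_number}: label must be 0 or 1, got {row['label']!r}"
            )
    return problems


if __name__ == "__main__":
    text = "SOURCE_FILE\tlabel\nreviews_parsed/119_049.txt\t1\n"
    print(check_submission(text))
```

An empty list means no problem was found under these assumptions. For tweet submissions the same check can be run with id_column="tweet_id".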

Contact information: Davy Weissenbacher (dweissen@pennmedicine.upenn.edu)

HLP @ Cedars-Sinai Computational Biomedicine

Progressing healthcare through automated natural language processing research
