Breast cancer patients often discontinue long-term treatments such as hormone therapy, which increases the risk of cancer recurrence. These discontinuations may be caused by adverse patient-centered outcomes (PCOs) due to hormonal drug side effects or other factors. PCOs are not detectable through laboratory tests and are sparsely documented in electronic health records. Thus, there is a need to explore complementary sources of information for PCOs associated with breast cancer treatments. Social media is a promising resource, but extracting true PCOs from it first requires accurately detecting tweets in which users self-report being breast cancer patients. In this task, only about 26% of the tweets contain such self-reports (S), and the remaining 74% are non-relevant (NR). Systems designed for this task need to automatically identify the tweets in the self-reports category.
- Training data: 3815 tweets
- Test data: 1204 tweets
Register your team here : https://forms.gle/1qs3rdNLDxAph88n6
After your registration is approved, you will be invited to join the Google group for the task. A link to the dataset is available in the Google group banner. If you do not receive the invite, please request to join the Google group, stating your team name, using the link below.
Google groups : https://groups.google.com/g/smm4h21-task-8
Link to Codalab : https://competitions.codalab.org/competitions/28766
Evaluation Period for Task 8 :
Test Dataset Release | 1st Mar 2021 12:00am UTC |
Predictions Due | 3rd Mar 2021 11:59pm UTC (3:59pm PST) |
TweetID | Tweet | Label |
4191 | A hereditary genetic mutation gave me breast cancer at 44. My mom & grandma also have this mutation. Both had early onset cancer at the same age as me. I have to tk Tamoxifen for 10 yrs. Has created a whole host of new health problems like fatty liver & debilitating joint pain. | 1 (S) |
3614 | “My breast cancer awareness bracelet just broke and I honestly want to cry. It’s one of the originals we bought, too.” | 0 (NR) |
Submission format: Please use the format below. Each line of a submission should contain the tweet_id and the label, separated by a tab character, in the column order shown.
tweet_id | label |
234 | 1 |
414 | 0 |
611 | 0 |
876 | 0 |
986 | 1 |
543 | 0 |
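As an illustration of the format above, the following is a minimal Python sketch that renders predictions as tab-separated text. The function name `format_submission`, the `include_header` flag, and the choice to emit a header row are assumptions made for this example; the task description does not state whether a header line is expected, so check the Codalab page before submitting.

```python
import csv
import io


def format_submission(rows, include_header=True):
    """Render (tweet_id, label) pairs as tab-separated text, keeping their order.

    Hypothetical helper: whether the header row is required is an assumption.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    if include_header:
        writer.writerow(["tweet_id", "label"])
    for tweet_id, label in rows:
        # Labels are binary: 1 = self-report (S), 0 = non-relevant (NR).
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([tweet_id, label])
    return buf.getvalue()


# The six example rows from the table above.
example_rows = [(234, 1), (414, 0), (611, 0), (876, 0), (986, 1), (543, 0)]
submission_text = format_submission(example_rows)
```

The resulting string can be written to a file with `open(path, "w", encoding="utf-8")`; the file name and any archive packaging required by the submission site are not specified here.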
Evaluation Metric : F1 score for the Self-reports (1) class
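For local validation, the metric can be approximated with the standard definition of F1 (the harmonic mean of precision and recall) restricted to the positive class. The sketch below is not the official scoring script; the function name `f1_positive` and the toy labels are assumptions used only for illustration.

```python
def f1_positive(gold, pred, positive=1):
    """F1 for a single class, computed from parallel lists of gold and predicted labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up predictions): one true positive, one false positive,
# one false negative, so precision = recall = 0.5.
gold_labels = [1, 0, 0, 0, 1, 0]
pred_labels = [1, 0, 1, 0, 0, 0]
score = f1_positive(gold_labels, pred_labels)
```

Because only about 26% of the tweets are self-reports, a system that labels every tweet as non-relevant would reach roughly 74% accuracy but an F1 of 0 on the self-reports class, which is why the positive-class F1 is the more informative figure here.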
Contact information: Mohammed Ali Al-Garadi (m.a.al-garadi@emory.edu)
References: Al-Garadi, M. A., et al. (2020, August). Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient-Centered Outcomes. In International Conference on Artificial Intelligence in Medicine (pp. 100-110). Springer, Cham.