Task 8: Classification of self-reported breast cancer posts on Twitter

Breast cancer patients often discontinue long-term treatments such as hormone therapy, which increases the risk of cancer recurrence. These discontinuations may be caused by adverse patient-centered outcomes (PCOs) due to hormonal drug side effects or other factors. PCOs are not detectable through laboratory tests and are sparsely documented in electronic health records. Thus, there is a need to explore complementary sources of information for PCOs associated with breast cancer treatments. Social media is a promising resource, but extracting true PCOs from it first requires accurately detecting users who self-report being breast cancer patients. In this task, only about 26% of the tweets contain such self-reports (S); the remaining 74% are non-relevant (NR). Systems designed for this task need to automatically identify the tweets in the self-report (S) category.

  • Training data: 3815 tweets
  • Test data: 1204 tweets

Register your team here: https://forms.gle/1qs3rdNLDxAph88n6
Link to Codalab: Available Feb 1, 2021

Tweet ID | Tweet | Label
4191 | A hereditary genetic mutation gave me breast cancer at 44. My mom & grandma also have this mutation. Both had early onset cancer at the same age as me. I have to tk Tamoxifen for 10 yrs. Has created a whole host of new health problems like fatty liver & debilitating joint pain. | 1 (S)
3614 | “My breast cancer awareness bracelet just broke and I honestly want to cry. It’s one of the originals we bought, too.” | 0 (NR)

Evaluation Metric: F1 score for the S class
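
As an illustration of the metric, the following is a minimal sketch of how the F1 score for the S class can be computed from gold and predicted labels. It assumes labels are encoded as 1 (S) and 0 (NR), as in the example tweets; the function name f1_positive and the toy label lists are hypothetical and are not part of the official evaluation script.

```python
def f1_positive(gold, pred, positive=1):
    """Return the F1 score for the positive class (assumed here: 1 = S)."""
    pairs = list(zip(gold, pred))
    # True positives: S tweets predicted as S.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    # False positives: NR tweets predicted as S.
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    # False negatives: S tweets predicted as NR.
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not taken from the task data).
gold = [1, 0, 0, 1, 0, 0, 1, 0]
pred = [1, 0, 1, 1, 0, 0, 0, 0]

score = f1_positive(gold, pred)
print(round(score, 3))  # 0.667 (precision = recall = 2/3)

# A system that labels every tweet NR scores 0.0 on this metric.
all_nr_score = f1_positive(gold, [0] * len(gold))
print(all_nr_score)  # 0.0
```

Because only the S class is scored, a system that labels every tweet as NR receives an F1 of 0, even though it would be correct on roughly 74% of the tweets.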

Contact information: Mohammed Ali Al-Garadi (m.a.al-garadi@emory.edu)

References: Al-Garadi, M. A., et al. (2020, August). Automatic Breast Cancer Cohort Detection from Social Media for Studying Factors Affecting Patient-Centered Outcomes. In International Conference on Artificial Intelligence in Medicine (pp. 100-110). Springer, Cham.