An Analysis of a Twitter Corpus for Training a Medication Intake Classifier

Klein AZ, Sarker A, O’Connor K, Gonzalez-Hernandez G. An Analysis of a Twitter Corpus for Training a Medication Intake Classifier. AMIA Summits on Translational Science Proceedings. 2019;2019:102.

While social media has evolved into a useful resource for studying medication-related information, observational studies of medications have continued to rely on other sources of data. Towards advancing the use of social media data for medication-related observational studies, we analyze an annotated corpus of 27,941 tweets designed for training machine learning algorithms to automatically detect users’ medication intake. In particular, we assess how a baseline classifier trained on the general corpus—that is, on various types of medication—performs for specific types. For most types, the classifier performs significantly better than it does overall; however, for nervous system medications, it performs significantly worse. These results suggest that, while the general corpus may have utility for observational studies focusing on most types of medication, studying nervous system medications may benefit from training a classifier exclusively for this type. We will explore this data-level approach in future work.
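
The per-type assessment described above can be illustrated with a minimal sketch. This is not the authors' code: the label names ("intake", "no_intake"), the choice of F1 as the metric, the "anti-infective" type name, and the toy records are all assumptions made for illustration, and the significance testing mentioned in the abstract is omitted. The sketch only shows how one set of predictions from a classifier trained on the general corpus could be scored overall and separately for each medication type.

```python
from collections import defaultdict


def f1_score(gold, pred, positive="intake"):
    """F1 for the positive class; returns 0.0 when there are no true positives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_by_type(records, positive="intake"):
    """Score (medication_type, gold_label, predicted_label) records per type and overall."""
    groups = defaultdict(list)
    for med_type, gold, pred in records:
        groups[med_type].append((gold, pred))
        groups["overall"].append((gold, pred))
    return {
        name: f1_score([g for g, _ in pairs], [p for _, p in pairs], positive)
        for name, pairs in groups.items()
    }


# Hypothetical toy records, NOT data from the corpus.
records = [
    ("nervous system", "intake", "no_intake"),
    ("nervous system", "intake", "intake"),
    ("nervous system", "no_intake", "intake"),
    ("anti-infective", "intake", "intake"),
    ("anti-infective", "no_intake", "no_intake"),
]

scores = f1_by_type(records)
for name, score in sorted(scores.items()):
    print(f"{name}: F1 = {score:.2f}")
```

Comparing each type's score against the "overall" entry mirrors the comparison in the abstract; whether a difference is meaningful would still need a significance test on real data.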