Detecting Personal Medication Intake in Twitter: An Annotated Corpus and Baseline Classification System

Ari Z. Klein, Abeed Sarker, Masoud Rouhizadeh, Karen O’Connor, Graciela Gonzalez

Department of Biostatistics, Epidemiology and Informatics
Perelman School of Medicine
University of Pennsylvania
Philadelphia, PA, USA


Social media sites (e.g., Twitter) have been used for surveillance of drug safety at the population level, but studies that focus on the effects of medications on specific sets of individuals have had to rely on other sources of data. Mining social media data for this information would require the ability to distinguish indications of personal medication intake in these data. To that end, this paper presents an annotated corpus that can be used to train machine learning systems to determine whether a tweet that mentions a medication indicates that the person posting it has taken that medication (at a specific time). To demonstrate the utility of the corpus as a training set, we present baseline results of supervised classification.

Quick Downloads

Downloadable data
Annotation guidelines