Our research focus

Scientific Literature Mining

Published biomedical literature is complex, with domain-specific terminology and language structures. We use knowledge bases, metadata, discourse structures, and free text to derive knowledge from published scientific articles.

Social Media Mining

Social media contains enormous volumes of user-posted data, which present various challenges for NLP such as colloquial language, misspellings, and noise. We develop innovative solutions to health-related problems using social media big data, overcoming the challenges this domain presents.

Health Record Mining

Patient data, such as electronic health records, encapsulate crucial information about patient health and associations between medical entities. We use rule-based and machine learning techniques to perform optimized processing of health-related data, which practitioners can use in real time.

Current projects

Mining Social Media Postings for Mentions of Potential Adverse Drug Reactions

The overarching goal of this application is to deploy the infrastructure needed to explore the value of informal social network postings as a source of “signals” of potential adverse drug reactions soon after the drugs hit the market. We pay particular attention to the value such information might have for detecting adverse events earlier than currently possible, and for detecting effects not easily captured by traditional means. Despite the significant challenge of processing colloquial text, our prototype study in this direction showed promising performance in identifying adverse reactions mentioned in these postings, with significant correlations between the effects mentioned by the public and those documented for the drugs we studied.

Social Media Data for Monitoring Medication Use and Effects during Pregnancy

Pregnancy exposure registries are the primary sources of information about the safety of maternal medication use during pregnancy. Such registries enroll pregnant women voluntarily early in pregnancy, and follow them until the end of pregnancy or longer to systematically collect information on specific pregnancy outcomes. While the pregnancy registry model has distinct advantages over other study designs, registries face numerous challenges and limitations, such as low enrollment rates, high cost, and selection bias. The primary objectives of this study are to systematically assess whether social media (Twitter) can be used to discover cohorts of pregnant women, and to develop and deploy a natural language processing and machine learning pipeline for automatic collection of cohort information. The long-term goal of the project is to derive associations between medication use during pregnancy and fetal outcomes.

Prescription Medication Abuse Surveillance through Social Media

Prescription medication overdose is the fastest-growing drug-related problem in the USA. The growth of this problem calls for improved monitoring strategies to investigate the prevalence and patterns of abuse of specific medications. Our primary aims are to assess whether social media can serve as a resource for automatic monitoring of prescription medication abuse, and to devise an automatic classification technique that can identify user posts that potentially indicate abuse.

Phylogeography of Zoonotic Viruses

Phylogeography is an emerging discipline in public health that allows researchers to model the evolution and migration patterns of viruses, and it can be an especially useful tool for studying rapidly evolving viruses such as RNA viruses. The goal of this project is to improve phylogeography models for tracking evolutionary changes in viral genomes and their spread. Adding more precise geospatial metadata when building such models could enable health agencies to better target areas that represent the greatest public health risk. In addition, by improving the geospatial metadata linked to popular sequence databases, we will enrich other sciences beyond phylogeography that use this information, such as molecular epidemiology, population genetics, and environmental health.

more projects to come…

See the publications page for details of research tasks. See the Software and downloads page for associated software.