Our research focus

Scientific Literature Mining

Published biomedical literature is complex, with its domain-specific terminologies and language structures. We utilize knowledge bases, metadata, discourse structures, and free text to derive knowledge from published scientific articles.

Social Media Mining

Social media contains enormous volumes of user-posted data, which present various challenges for NLP such as colloquial language, misspellings, and noise. We develop innovative solutions to health-related problems using social media big data, overcoming the challenges presented by this domain.

Health Record Mining

Patient data, such as electronic health records, encapsulate crucial information about patient health and associations between medical entities. We use rule-based and machine learning techniques to perform optimized processing of health-related data, which can be used by practitioners in real time.

Current projects

A Case Study of Online Birth Club Forums

In recent years, do-it-yourself (DIY) medical movements and direct-to-consumer (DTC) health technologies have made available to the public information, products, and services that were previously sequestered in the “ivory tower” of science and medicine. While previous studies have examined online communities related to various diseases, there has been little research on online pregnancy forums, despite the fact that pregnant women are increasingly turning to technology to supplement maternal healthcare. The goal of this study is to explore the health content of online birth club forums using a topic modeling approach, and to compare these data with the typical symptoms that women experience by month, as outlined by the American College of Obstetricians and Gynecologists (ACOG). This study will illuminate how non-traditional data sources such as online pregnancy forums are transforming the patient-physician relationship in the realm of obstetrics.

Social-Media Based Medication Abuse Monitoring System (SMAMS)

Prescription medication (PM) abuse is a major epidemic in the United States, and monitoring and studying its characteristics requires the development of novel approaches. Social media encapsulates an abundance of data about PM abuse from different demographics, but extracting those data and converting them to knowledge requires advanced natural language processing and data-centric artificial intelligence systems. Our proposed social media mining framework will automate the conversion of big data to knowledge for PM abuse, providing crucial insights to toxicologists about targeted populations and enabling the future development of directed intervention strategies.

Mining Social Media Postings for Mentions of Potential Adverse Drug Reactions

The overarching goal of this application is to deploy the infrastructure needed to explore the value of informal social network postings as a source of “signals” of potential adverse drug reactions soon after the drugs hit the market. We pay particular attention to the value such information might have for detecting adverse events earlier than is currently possible, and for detecting effects not easily captured by traditional means. Despite the significant challenge of processing colloquial text, our prototype study in this direction showed promising performance in identifying adverse reactions mentioned in these postings, with significant correlations between the effects mentioned by the public and those documented for the drugs we studied.

Social Media Data for Monitoring Medication Use and Effects during Pregnancy

Pregnancy exposure registries are the primary sources of information about the safety of maternal medication use during pregnancy. Such registries enroll pregnant women on a voluntary basis early in pregnancy, and follow them until the end of pregnancy or longer to systematically collect information regarding specific pregnancy outcomes. While the pregnancy registry model has distinct advantages over other study designs, it faces numerous challenges and limitations, such as low enrollment rates, high cost, and selection bias. The primary objectives of this study are to systematically assess whether social media (Twitter) can be used to discover cohorts of pregnant women, and to develop and deploy a natural language processing and machine learning pipeline for automatic collection of cohort information. The long-term goal of the project is to derive associations between medication use during pregnancy and fetal outcomes.

Prescription Medication Abuse Surveillance through Social Media

Prescription medication overdose is the fastest-growing drug-related problem in the USA. The growth of this problem necessitates improved monitoring strategies for investigating the prevalence and patterns of abuse of specific medications. Our primary aims are to assess the possibility of utilizing social media as a resource for automatic monitoring of prescription medication abuse, and to devise an automatic classification technique that can identify user posts that potentially indicate abuse.

Phylogeography of Zoonotic Viruses

Phylogeography is an emerging discipline in public health that allows researchers to model the evolution and migration patterns of viruses, and can be an especially useful tool for studying rapidly evolving viruses such as RNA viruses. The goal of this project is to improve phylogeography models for tracking evolutionary changes in viral genomes and their spread. The addition of more precise geospatial metadata in building such models could enable health agencies to better target areas that represent the greatest public health risk. In addition, by improving the geospatial metadata linked to popular sequence databases, we will enrich other sciences beyond phylogeography that utilize this information, such as molecular epidemiology, population genetics, and environmental health.

More projects to come…

See the publications page for details of research tasks. See the Software and downloads page for associated software.