Advanced Methods for Big Data Analytics in Women’s Health

Session Proposal for the Pacific Symposium on Biocomputing 2021

Session Chair: Graciela Gonzalez-Hernandez PhD

Session Co-Chairs: Karin Verspoor PhD, Maricel G. Kann PhD, Su Golder PhD, Lisa Levine MD, Mary Regina Boland PhD, Natalia Villanueva-Rosales PhD, Karen O’Connor MS

1 – Introduction:  session theme and motivation

Recent advances in data science and digital epidemiology have unlocked an unprecedented amount of data for analysis, and uncovered previously unseen sex-specific patterns that point to marked differences in disease symptoms, progression and care that affect women of all ages. In 2016, the NIH published a guidance document [1] and changed its policy for reviewing proposals whereby accounting for “sex as a biological variable” became a required and scorable aspect of the research strategy, highlighting that “an over-reliance on male animals and cells may obscure understanding of key sex influences on health processes and outcomes”. Dr. Kathryn Rexrode, chief of the Division of Women’s Health at Brigham and Women’s Hospital, is quoted [7] as succinctly stating the enormity of the problem: “without the inclusion of women, all the way through from basic research to clinical research, we can’t be sure we really have the right answers for 51 percent of the population.”

Aside from X-linked inheritable diseases, where women are generally carriers rather than expressing the disease [2], there are various aspects of women’s health that challenge current methods. Recent research shows that variations in physiology may alter the pharmacokinetics or pharmacodynamics that determine drug dosing and effect for women, both in general and particularly during pregnancy [6], as hormonal and other biological differences may influence the impact of drugs, their effectiveness and their side effects. Over two-thirds of women receive prescription drugs while pregnant, with treatment and dosing strategies based on data from healthy male volunteers and non-pregnant women [5]. In addition, health processes unique to women, such as pregnancy and pregnancy loss, menstruation, and menopause, require differential approaches to data representation and analysis. Disorders related to pregnancy and menstruation (such as miscarriage and heavy bleeding, which have a significant impact on women’s health) have recently been found to be related to specific genetic mutations and are only beginning to be explored [3,4]. Furthermore, it has become clear through numerous recent studies that many diseases (cardiovascular disease, asthma, eating disorders, lung cancer, and autoimmune disorders, among others) impact women differently than men. Advanced data science methods specifically designed for exploring the influence that sex hormones and a woman’s physiology can have on the pathophysiology of these processes and diseases, and on their treatment, are essential to advance our understanding of key processes in women’s health. At the same time, the contrast could also shed light on the specific mechanisms that affect men.

Topics of interest to this session include novel text and data mining, machine learning, data integration, and other analytic methods applied to big data of any type (genetic, clinical, population, user-generated) that would enable further understanding of processes and diseases that are specific to women or differentially impact women. In harmony with the focus of PSB, the session emphasizes methodological advances and applications in data science, with particular attention to reproducibility and validation.

2 – Justification

We expect that the session will attract work by text mining and machine learning experts and bioinformaticians around the world who are currently working in collaboration with clinicians and bench scientists on women’s health. We seek to attract contributions tightly coupled to both the methodological and the biological/translational/clinical aspects for their significance and innovation, where one enhances and informs the other. By focusing on data science methods applied to women’s health, from the genetic, physiological, and population angles, we will provide a unique forum and spotlight on this topic, facilitating networking and interactions amongst all interested researchers. We anticipate an enthusiastic response to this session’s topic, which in the future could be expanded to methodologies addressing other biases in the data.

3 – Profiles of the organizers

Graciela Gonzalez-Hernandez, Ph.D. is an Associate Professor of Informatics in the Department of Biostatistics and Epidemiology at the Perelman School of Medicine, University of Pennsylvania. She is a recognized expert and leader in natural language processing (NLP) applied to bioinformatics, medical/clinical informatics, and public health informatics. At the University of Pennsylvania she leads the Health Language Processing Lab within the Institute of Biomedical Informatics. Her recent work is focused on NLP applications for public health monitoring and surveillance and is funded by R01s from the National Library of Medicine and the National Institute of Allergy and Infectious Diseases. She has over 90 publications in prestigious conferences and journals. She has chaired six PSB sessions and workshops.

Karin Verspoor, Ph.D. is a Professor in the School of Computing and Information Systems at the University of Melbourne, Australia. She is Director of Health Technologies for the Melbourne School of Engineering, Deputy Director of the Centre for Digital Transformation of Health, and Deputy Director of the ARC Training Centre in Cognitive Computing for Medical Technologies. Trained as a computational linguist, Karin’s current research primarily focuses on extracting information from clinical texts and the biomedical literature using machine learning methods to enable biological discovery and clinical decision support. She has additional extensive experience in health data semantics and analytics. Karin held previous posts as the Scientific Director of Health and Life Sciences at NICTA Victoria Research Laboratory, at the University of Colorado School of Medicine, and at Los Alamos National Laboratory.

Maricel G. Kann, Ph.D. is an Associate Professor at the University of Maryland, Baltimore County, where she has been since 2007. Dr. Kann’s research focuses on developing new computational methodologies to identify the role of individual cancer mutations and other disease mutations in disease mechanisms. She is one of the leading experts in the area of translational bioinformatics and has chaired and co-chaired several international conference sessions (PSB, AMIA, ISMB). She is an associate editor of the Annual Review of Biomedical Data Science, Journal of Computational Biology, and PLoS Computational Biology. She is also a former NIH/NLM study session member, and a scientific advisory board member of the PubMedCentral National Committee and of the UniProt consortium.

Su Golder, Ph.D. is an Associate Professor at the University of York in the UK. Su is a qualified information specialist with over 20 years’ experience. She has worked in many different settings and has a wide breadth of experience in systematic reviews of healthcare interventions. She has specialist expertise in systematic review methodology and systematic reviews of adverse effects and has taught in this field. Her PhD on optimising the retrieval of adverse effects data was funded by the MRC and has made an important contribution to the retrieval of information on adverse effects both nationally and internationally. Her current research is on the use of unpublished data, text mining and social media to maximise the efficiency and effectiveness of the retrieval of adverse effects data and has been funded by the National Institute for Health Research (NIHR). She has over 80 peer-reviewed journal publications on subjects including the treatment and prevalence of gestational diabetes, and medication intake and the risk of birth defects.

Lisa Levine, MD, is a board-certified Maternal-Fetal Medicine (MFM) specialist and an Assistant Professor of MFM within the Obstetrics & Gynecology Department at the University of Pennsylvania (Penn). She is a perinatal epidemiologist and received her Master of Science in Clinical Epidemiology (MSCE) at Penn. She has extensive experience with both clinical trials and cohort studies within Obstetrics. She recently completed a multi-arm randomized trial evaluating four different methods for induction of labor, recruiting 491 women in only two years. She is also the Principal Investigator of two prospective cohort studies evaluating the cardiovascular health of women after a pregnancy complicated by preeclampsia. In addition to these research studies, she has a strong clinical interest in medication exposure during pregnancy and teaches within the pharmaco-epidemiology course at the FDA regarding medication exposure in pregnancy and the difficulties of obtaining accurate and informative data on this topic. She is also a co-investigator on an R01 that explores social media data for pharmacovigilance, specifically for longitudinal case-control studies based on pregnancy outcomes.

Mary Regina Boland, MA, MPhil, PhD, FAMIA, is an Assistant Professor at the University of Pennsylvania, Philadelphia (since 2017). Dr. Boland’s research focuses on using Electronic Health Records coupled with data on the environment to study the ways that environment and pollution modulate risk for disease, specifically focusing on women’s health and adverse fetal outcomes. Dr. Boland is a Fellow of the American Medical Informatics Association and has over 45 papers studying informatics, environmental exposures and women’s health. Dr. Boland’s informatics contributions center on developing reproducible, data-driven approaches that link EHR data with external sources, using ontologies when needed.

Natalia Villanueva-Rosales, Ph.D. has been an Assistant Professor at the University of Texas at El Paso since 2013. Her work aims to improve the efficiency and effectiveness of discovery, integration, and trust of scientific data and models. Two distinctive areas of her research include data- and ontology-based knowledge negotiation and the creation of trust models for interdisciplinary research. Her approach links human and machine knowledge to address societally relevant problems that require interdisciplinary approaches supported by the development and application of data science, such as pharmacogenomics and the sustainability of water resources.

Karen O’Connor, MSc is a Staff Scientist in the Department of Biostatistics, Epidemiology and Informatics at the Perelman School of Medicine, University of Pennsylvania. She has extensive experience in the creation and annotation of corpora utilized in natural language processing models. Her current research is on improving the annotation process and corpus quality through the methodological development of annotation guidelines. She has overseen the creation of a multitude of annotated corpora used in health-related research, including the detection of adverse drug reaction mentions in social media, potential drug abuse or misuse mentions in Twitter, and topic classification of tweets mentioning a medication.

4 – Sample publications

The following publications support the significance of the topic and illustrate the kinds of papers that could be submitted to our session:

Publications available here

5 – Key dates

PSB 2021 Key Dates

PSB 2021 will be held at the Fairmont Orchid on the Big Island of Hawaii, January 3-7, 2021.

Registration available: August 1, 2020
Submission site available: August 1, 2020
Paper submissions due: August 3, 2020
Notification of paper acceptance: September 14, 2020
Camera-ready accepted paper deadline: October 1, 2020
Travel award applications due: October 1, 2020
Submission deadline: November 15, 2020

PSB has been able to offer partial travel support to many PSB attendees in the past. However, please note that no one is guaranteed travel support.

Regardless of submission to our session for consideration as a full paper, you can submit an abstract to PSB for poster presentation.

Poster presenters will be provided with an easel and a poster board 32″W x 40″H (80x100cm). One poster from each paid participant is accepted.

  1. NIH. Consideration of Sex as a Biological Variable in NIH-funded Research [Internet]. Available from: https://orwh.od.nih.gov/sites/orwh/files/docs/NOT-OD-15-102_Guidance.pdf
  2. https://www.sciencedirect.com/topics/neuroscience/x-linked-recessive-disorders
  3. Maybin JA, Boswell L, Young VJ, Duncan WC, Critchley HOD. Reduced Transforming Growth Factor-β Activity in the Endometrium of Women With Heavy Menstrual Bleeding. J Clin Endocrinol Metab. Endocrine Society; 2017 Apr 1;102(4):1299–1308. PMID: 28324043
  4. Husseini-Akram F, Haroun S, Altmäe S, Skjöldebrand-Sparre L, Åkerud H, Poromaa IS, Landgren B-M, Stavreus-Evers A. Hyaluronan-binding protein 2 (HABP2) gene variation in women with recurrent miscarriage. BMC Womens Health. BioMed Central Ltd.; 2018 Aug 24;18(1):143. PMID: 30143058
  5. Feghali M, Venkataramanan R, Caritis S. Pharmacokinetics of drugs in pregnancy. Semin Perinatol. W.B. Saunders; 2015 Nov 1;39(7):512–9. PMID: 26452316
  6. Buck Louis GM, Yeung E, Kannan K, et al. Patterns and Variability of Endocrine-disrupting Chemicals During Pregnancy: Implications for Understanding the Exposome of Normal Pregnancy. Epidemiology. 2019;30 Suppl 2(Suppl 2):S65–S75. doi:10.1097/EDE.0000000000001082
  7. 7 Major Gaps in Women’s Health Research [Internet]. Available from: https://health.usnews.com/health-care/patient-advice/slideshows/7-major-gaps-in-womens-health-research