Progress Report 2022 – SM4PV – SMM4H – RADS

Progress Report through December 2022.

The goal of the proposed work is to develop novel Natural Language Processing (NLP) methods to leverage Social Media (SM) data for specific pharmacovigilance (PV) efforts that are hindered by known drawbacks of spontaneous reporting systems (SRSs). We focus on methods to facilitate the use of SM data for exploring (a) factors affecting medication adherence and persistence among the general population (Aim 1), and (b) possible associations between medications taken during pregnancy and pregnancy outcomes (Aim 2). These are areas of significant impact for which SM data could meaningfully complement current PV efforts. In collaboration with domain experts, we propose to:

Specific Aim 1. Develop and evaluate NLP methods to identify non-adherence and non-persistence and related information from SM data. This includes methods to dynamically collect a cohort of SM users that stopped taking or switched medications, did not fill a prescription, or altered their treatment, and methods to extract information from the user’s timeline (publicly available postings over time) and conversation threads (postings by the user and others in reply to a posting of interest) relevant to (a) an expressed reason for these actions, (b) dosage/duration of treatment, (c) concomitant treatments, and (d) diagnosed health conditions.

In our study of medication non-adherence on social media, our preliminary work1 detected some expressions of non-adherence in tweets; we then expanded our binary classifier to detect any mention of a change in medication treatment in users’ posts, regardless of whether the change was recommended by a health care professional2. We trained a convolutional neural network (CNN) on annotated tweets mentioning medications and on WebMD reviews to find mentions of medication treatment changes. We improved this system using bidirectional encoder representations from transformers (BERT)-based contextual embeddings trained on 12,972 WebMD reviews annotated for medication change, achieving a 0.874 F1 score3. We also implemented a sequence labeler to extract the spans that mention the reason for the medication change. This system was trained on 2,837 WebMD reviews positive for medication change whose reason for change was coded and extracted. This baseline system achieved moderate performance, with an F1 score of 0.696.
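The span-extraction step can be illustrated with a minimal sketch of decoding a sequence labeler's BIO tags into "reason for change" spans. The tokens and tag names below are hypothetical toy examples; the deployed system uses transformer-based contextual embeddings.

```python
# Minimal sketch: grouping B-REASON/I-REASON tags from a sequence labeler
# into contiguous text spans. Tokens/tags here are illustrative only.

def decode_bio_spans(tokens, tags):
    """Group tokens tagged B-REASON/I-REASON into contiguous spans."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B-REASON":                 # a new span starts here
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I-REASON" and current:   # continue the open span
            current.append(token)
        else:                                 # an O tag closes any open span
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

tokens = "I stopped lipitor because of severe muscle pain".split()
tags = ["O", "O", "O", "O", "O", "B-REASON", "I-REASON", "I-REASON"]
print(decode_bio_spans(tokens, tags))  # ['severe muscle pain']
```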

We classified our collection of 324,459 WebMD medication reviews using our medication change detection system, allowing us to study the patient-reported reasons for medication change for a given class of medications. We chose statins for our pilot study3. In our manual review, the overwhelming majority of patients (90%) reported adverse events as the primary reason for discontinuing or switching their statin medication, with over half reporting more than one type of adverse event. We also found mentions of dechallenge and rechallenge reports in the reviews.

To identify reports of dechallenge and rechallenge for other medications, we crafted a set of regular expressions and deployed them on the reviews classified as positive for change. A subset (n=1,500) of the 15,000 posts identified was manually annotated for dechallenge/rechallenge mentions.
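The flavor of this approach can be sketched with two illustrative (hypothetical) patterns; the expressions actually deployed on the WebMD reviews were more extensive.

```python
import re

# Illustrative patterns for dechallenge ("stopped and the symptom resolved")
# and rechallenge ("restarted and it came back") reports. These are toy
# examples, not the production expressions.
DECHALLENGE = re.compile(
    r"\b(stopp?ed|quit|discontinued)\b.{0,60}\b(went away|resolved|subsided|better)\b",
    re.IGNORECASE,
)
RECHALLENGE = re.compile(
    r"\b(restarted|went back on)\b.{0,60}\b(came back|returned|same)\b",
    re.IGNORECASE,
)

def flag_review(text):
    return {
        "dechallenge": bool(DECHALLENGE.search(text)),
        "rechallenge": bool(RECHALLENGE.search(text)),
    }

review = ("I stopped the statin and the muscle pain went away; "
          "I restarted it and the pain came back.")
print(flag_review(review))  # {'dechallenge': True, 'rechallenge': True}
```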

During the SMM4H shared tasks, we challenged the community to detect the medication change mentions in our corpus and received good participation, with 29 teams registered in 2021 and xx in 2022.

Tweets are collected using keyword searches. To address the limitation of spelling variations when collecting tweets, we developed an unsupervised spelling variant generator. This generator, which improved upon our previously developed phonetic-based system4, uses dense vector models to find terms semantically close to an original term; lexically dissimilar terms are then filtered out5. We developed Kusuri, a binary classifier based on ensemble learning6, to identify true positive mentions of any medication in a SM post. To increase speed and performance, we updated Kusuri with a transformer-based classifier, BERT, trained with under-sampling to disambiguate tweets matched by a lexicon derived from RxNorm7, obtaining similar performance while processing the data 28 times faster. We also expanded our classifier into a sequence labeler to extract the spans of the medication mentions in tweets. We have collected over 27 million tweets using medication keywords and our generated variants. Kusuri has classified 17 million of these tweets as true positive medication mentions.
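The variant-generation idea can be sketched as follows: keep candidate terms that are close to a seed drug name in the embedding space *and* lexically similar (likely misspellings), while dropping semantically related but lexically distinct terms (e.g., synonyms). The vectors and thresholds below are made up for illustration; the real system uses dense vector models trained on large Twitter collections.

```python
import difflib

# Toy embedding table: "seroquil" is a misspelling (close in both spaces);
# "quetiapine" is a synonym (semantically close only); "banana" is unrelated.
toy_vectors = {
    "seroquel":   [0.90, 0.10, 0.20],
    "seroquil":   [0.88, 0.12, 0.21],
    "quetiapine": [0.85, 0.15, 0.25],
    "banana":     [0.10, 0.90, 0.30],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm

def spelling_variants(seed, vectors, sem_threshold=0.95, lex_threshold=0.8):
    """Keep terms that are both semantically and lexically close to the seed."""
    seed_vec = vectors[seed]
    variants = []
    for term, vec in vectors.items():
        if term == seed:
            continue
        semantically_close = cosine(seed_vec, vec) >= sem_threshold
        lexically_close = difflib.SequenceMatcher(None, seed, term).ratio() >= lex_threshold
        if semantically_close and lexically_close:
            variants.append(term)
    return variants

print(spelling_variants("seroquel", toy_vectors))  # ['seroquil']
```

The second (lexical) filter is what separates spelling variants from merely related vocabulary, which is the key design choice behind the generator.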

During the SMM4H 2020 shared tasks8, we ran a task to detect tweets mentioning medications in a corpus with the natural, highly imbalanced ratio between tweets that mention a medication and tweets that do not. The task was well attended, with 16 participating teams. This task was also run in BioCreative VII9, with xx teams participating.

Improving upon our previous efforts on the detection of adverse event mentions in SM10, we developed a deep learning pipeline (DeepADEMiner)11. We also combined annotations from our 2015 study on extracting ADRs and indications on Twitter and DailyStrength to develop a general-purpose tool, SEED, which extracts self-reported symptoms and diseases mentioned by users12. SEED uses multi-corpus training and deep learning to identify and extract symptom/disease mentions and normalizes them to UMLS terms. The system achieved an F1 score of 0.86 on DailyStrength and 0.72 on a Twitter corpus.

As a use case for SEED, we developed a pipeline to identify possible cases of Covid-19 in the US and UK13,14. Our pipeline was trained on an annotated dataset of 8,976 tweets that were collected based on a set of Covid-related keywords and filtered using a set of regular expressions. The annotations were used to train a deep neural network to detect self-reports of Covid-19, which achieved an F1 score of 0.76. As this system was developed early in the pandemic, we are updating the classifier, training it on a new dataset annotated with more stringent guidelines for classifying a patient with a Covid diagnosis. We have annotated a dataset of 10,000 tweets for the development of the new classifier. We used our SEED tool on the timelines extracted from 13,200 users (containing 21 million tweets) who self-reported Covid-19, as detected by our classifier, and extracted more than 1.3 million symptom mentions, most of which corresponded to the common symptoms reported by the CDC among hospitalized and non-hospitalized patients.
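The normalization step can be illustrated with a toy lexicon lookup that maps extracted symptom spans to UMLS concept identifiers. SEED itself identifies the spans with a deep-learning sequence labeler and uses a far larger vocabulary; the lookup table below is only a hand-picked example (the CUIs shown are real UMLS concepts).

```python
# Toy sketch of symptom normalization: map extracted spans to UMLS CUIs.
# The lexicon is illustrative; SEED's normalization is learned, not a
# hand-built table.
SYMPTOM_LEXICON = {
    "fever": "C0015967",
    "temperature": "C0015967",
    "cough": "C0010200",
    "coughing": "C0010200",
    "headache": "C0018681",
}

def normalize_symptoms(spans):
    """Return (span, CUI) pairs for spans found in the lexicon."""
    results = []
    for span in spans:
        cui = SYMPTOM_LEXICON.get(span.lower().strip())
        if cui:
            results.append((span, cui))
    return results

print(normalize_symptoms(["fever", "coughing", "tired"]))
# [('fever', 'C0015967'), ('coughing', 'C0010200')]
```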

In our continuing effort to encourage research in social media pharmacovigilance, we have presented adverse event (AE) mining tasks in English, including ADR classification, extraction, and normalization, at the SMM4H 2017–2022 shared tasks. The 2022 SMM4H attracted xx participants to the task.

Data collection

  • 26,437,566 tweets mentioning at least one monitored medication
  • 343,459 WebMD reviews
  • 21,000,000 tweets of users identified as self-reporting a Covid-19 infection

Data Annotation

  • 17,176,207 tweets automatically classified as containing a mention of a medication (Kusuri)
  • 324,459 WebMD medication reviews automatically classified for medication change (medChange)
  • Manual annotation of 2,458 WebMD reviews of statins, coded for reason for medication change
  • Manual annotation of 1,500 WebMD reviews for dechallenge/rechallenge mentions
  • Manual annotation of 18,976 tweets for Covid diagnosis classification

Specific Aim 2. Develop and evaluate NLP methods to identify medication use during pregnancy and pregnancy outcomes from SM data. Work under this aim includes the development and evaluation of NLP methods to dynamically collect a cohort of SM users who report a pregnancy, and methods to extract information from the user’s timeline to (a) distinguish when mention of a medication indicates possible intake of it, (b) infer the estimated pregnancy timeframe (beginning and end of pregnancy), and (c) extract or infer pregnancy outcomes from those postings (including at least live birth, fetal death, hemorrhage, miscarriage, low-birth weight, pre-term birth, and reported congenital malformations).

Our automated NLP pipeline has identified more than 450,000 users who have announced their pregnancy on Twitter, and we have collected more than 3 billion of their tweets15, forming our Social Media Pregnancy Cohort (SMPC). We have updated our pregnancy collection pipeline with a system of precise regular expressions, Pregex, to identify users reporting their own pregnancy and to estimate their pregnancy timeframe. We have developed a module that can be incorporated into our pipelines to filter out “reported speech”16 (e.g., quotations, news headlines), as well as a bot detection system for applications in health-related domains17.


We improved upon the baseline performance of our pipelines using state-of-the-art pretrained language models and deployed them at scale on our entire SMPC, giving us counts of users by specific outcome. A BERTweet classifier (F1 score = 0.82) identified 505 users who reported a birth defect18,19. Another BERTweet classifier (F1 score = 0.93) identified reports of prenatal, perinatal, or postnatal outcomes16,20 other than birth defects, including miscarriage (9,988 users), stillbirth (805 users), preterm birth or premature labor (3,242 users), low birthweight (1,072 users), and neonatal intensive care unit (NICU) admission (3,930 users). We developed high-precision regular expressions for detecting tweets that report “normal” pregnancy outcomes21: a birth weight ≥ 5 pounds and 8 ounces (indicating no low birth weight, no miscarriage, and no stillbirth), identifying 11,341 users, and a gestational age ≥ 37 weeks (indicating no miscarriage and no preterm birth), identifying 40,846 users.
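The "normal outcome" idea can be sketched with two simple patterns applying the thresholds from the study (≥ 5 lb 8 oz; ≥ 37 weeks). The deployed expressions covered many more surface forms than these illustrative ones.

```python
import re

# Hedged sketch: high-precision patterns for self-reported birth weight and
# gestational age. These are toy expressions, not the production ones.
WEIGHT = re.compile(
    r"\b(\d+)\s*(?:lbs?|pounds?)\s*,?\s*(\d+)?\s*(?:oz|ounces?)?", re.IGNORECASE
)
WEEKS = re.compile(r"\b(\d+)\s*weeks?\b", re.IGNORECASE)

def normal_birth_weight(text):
    """True if a reported weight is at least 5 lb 8 oz (88 ounces)."""
    m = WEIGHT.search(text)
    if not m:
        return False
    ounces = int(m.group(1)) * 16 + int(m.group(2) or 0)
    return ounces >= 5 * 16 + 8

def full_term(text):
    """True if a reported gestational age is at least 37 weeks."""
    m = WEEKS.search(text)
    return bool(m) and int(m.group(1)) >= 37

print(normal_birth_weight("she arrived at 7 lbs 2 oz"))  # True
print(full_term("born at 36 weeks"))                     # False
```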


Our ability to continue collecting users’ tweets on a long-term, ongoing basis after birth provides the opportunity to monitor outcomes beyond the postpartum period. To identify adverse childhood outcomes, we deployed a RoBERTa classifier (F1 score = 0.93) on a set of tweets matching keyword-based regular expressions to positively identify users who reported adverse childhood outcomes, including attention-deficit/hyperactivity disorder (ADHD) (855 users), autism spectrum disorders (ASD) (2,066 users), delayed speech (896 users), and asthma (1,533 users).


As a proof-of-concept study, we searched the tweets of users in our SMPC for mentions of beta blockers and their variants5 to identify a cohort of users who took or may have taken the medication during pregnancy22. We utilized several of our NLP tools to identify the timeframe of the pregnancy23 and the outcome of the pregnancy16,21,24 for users who were determined to have taken, or possibly have taken, the medication25,26. We identified 257 pregnancies during which a beta blocker may have been taken, along with the indication for taking the medication in 76.7% of the pregnancies and the maternal age27 in 86.4%. This study demonstrated the utility of Twitter as a potential source of cohorts for drug safety studies to complement traditional study methods.

We also performed a preliminary study examining the birth outcomes of users in our SMPC who reported receiving at least one dose of a Covid-19 vaccine. We retrieved posts indicating vaccination using precise regular expressions. We deployed our timeframe detection system23 to determine the start and end dates of the pregnancy and, for users who were vaccinated during periconception or during pregnancy, we deployed our adverse outcome detection system16,19,24 and our NLP system for detecting normal birthweight and full-term birth21. All data were manually validated, finding 11 reported adverse outcomes among the 45 pregnancies that had been completed at the time of the study.
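The timing logic can be sketched as a simple date-window check: given the estimated start and end of a pregnancy (from our timeframe detection system), classify when a reported vaccination occurred. The 30-day periconception window used here is an illustrative assumption, not the study's definition.

```python
from datetime import date, timedelta

# Sketch: classify a vaccination date relative to an estimated pregnancy
# window. The periconception window length is an assumed parameter.
def classify_vaccination(vax_date, preg_start, preg_end, periconception_days=30):
    if preg_start - timedelta(days=periconception_days) <= vax_date < preg_start:
        return "periconception"
    if preg_start <= vax_date <= preg_end:
        return "during pregnancy"
    return "outside window"

start, end = date(2021, 3, 1), date(2021, 11, 26)
print(classify_vaccination(date(2021, 2, 15), start, end))  # periconception
print(classify_vaccination(date(2021, 6, 10), start, end))  # during pregnancy
```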

We have created several community challenges during SMM4H, including shared tasks for medication intake, birth defect outcomes, adverse pregnancy outcomes, and exact age extraction.

Data collection

  • 534,870 users (4,171,331,105 tweets) reporting being pregnant on their Twitter accounts
  • 2,600 tweets mentioning a calcium channel blocker or a beta blocker from our medication collection
  • 24,105 tweets mentioning an adverse childhood outcome

Data Annotation

  • 2,600 tweets validated for medication intake and classified as to whether the intake occurred during the user’s pregnancy timeframe
  • Manual annotation of 4,017 tweets, classified for pregnancy announcement and timeframe
  • Manual annotation of 9,734 tweets for adverse childhood outcomes
  • Manual annotation of 2,451 users, validating pregnancy, pregnancy timeframe, and Covid-19 vaccine information, including timing of vaccination

Specific Aim 3. Develop and evaluate methods for automatic selection of control groups. Work under this aim addresses the challenge faced when information from SM is to be used for epidemiological studies. Access to longitudinal SM data (timelines) as proposed in Aims 1 and 2, enables case-control studies as a suitable epidemiological model, since the NLP methods can find persons with a condition (or another outcome variable) of interest within a larger cohort. However, finding a suitable control group among the vast number of other SM users not exhibiting the condition is a challenge. We propose a novel adaptation of biased topic modeling to find control subjects using information in the timelines. Evaluation hypothesis: identified users will be labeled by experts to be suitable control subjects to a moderate level of agreement (Kappa 0.41 – 0.6).

To create a cohort of users for a case-control study, we annotated 650 timelines of users who reported their pregnancy, reached full term, and whose baby was born with a normal birth weight, and 552 timelines of users with a reported pregnancy whose child was born with a birth defect. In each cohort we identified the age, race/ethnicity, location, and medication intake, where available. To facilitate the creation of cohorts for future studies, we are evaluating, extending, or developing tools to automatically detect the demographic features of Twitter users: age, gender, race/ethnicity, and location. We developed and deployed an automated NLP pipeline, ReportAGE (F1 score = 0.86)27, that extracts the exact age of Twitter users based on self-reports in their tweets, automatically detecting an age for more than 50% of the users in our SMPC. We have conducted two scoping reviews to identify viable systems for gender and age28 and for race/ethnicity detection. Our scoping review of the available methods for automatically identifying the race/ethnicity of Twitter users29 concluded that, at this point, none of the existing methods are suitable for our purposes.
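The self-report extraction behind an approach like ReportAGE can be illustrated with a couple of toy patterns; these two expressions are illustrative examples, and the deployed system combines such extraction with classification to rule out non-self reports.

```python
import re

# Illustrative patterns for self-reported exact age, in the spirit of
# ReportAGE. Toy examples only; the production system is far broader.
AGE_PATTERNS = [
    re.compile(r"\bI(?:'m| am)\s+(\d{1,2})\s*(?:years? old|yo)\b", re.IGNORECASE),
    re.compile(r"\bturn(?:ed|ing)?\s+(\d{1,2})\s+today\b", re.IGNORECASE),
]

def extract_age(tweet):
    """Return the first self-reported age found, or None."""
    for pattern in AGE_PATTERNS:
        m = pattern.search(tweet)
        if m:
            return int(m.group(1))
    return None

print(extract_age("Can't believe I'm 27 years old today"))  # 27
print(extract_age("I am old enough to remember that"))      # None
```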

Data Annotation

  • Manual annotation of the pregnancy timelines of 1,202 users, including demographic information of the mother and medication intake during pregnancy
  • Manual annotation of age mentions in 2,200 tweets

Research and Diversity Students (RADS)

Under our parent grant, we received supplemental funding to run a program aimed at providing research experience for diversity scholars in the field of health informatics. The program included seminars on all aspects of biomedical research, as well as annotation projects that gave the scholars hands-on experience with SM data, providing them valuable insights into the benefits and challenges of using SM data in research. Throughout the program, we were able to support a diverse group of scholars at various stages of their education, including 3 high school students, 3 undergraduate students, 4 master’s students, and 1 PhD student.

The culmination of the program involved each scholar driving a research project with the assistance of their mentor and other lab staff as needed. The scholars selected projects that aligned with the scope of the parent grant and took responsibility for identifying their data needs, collecting data, applying the appropriate ML/NLP methods, and analyzing the results of their study. Team meetings were held each week, where the students presented their progress to the group. Upon completion, each student wrote a manuscript summarizing their work. To date, three of the studies are under review for publication, and three others are nearing completion for journal submission.

Social Media Mining for Health Research and Applications (#SMM4H) Workshop and Shared Task

Since 2016, we have organized the Social Media Mining for Health Research and Applications (#SMM4H) workshop and shared tasks. Our vision for the workshop has been to provide a unique venue to bring together researchers interested in developing and sharing natural language processing (NLP) methods utilizing social media data for health informatics. The #SMM4H workshop, which includes oral presentations and poster sessions, has consistently attracted 50 to 75 participants submitting research from a diverse range of social media platforms (e.g., Twitter, Reddit, Facebook, online health forums), languages (e.g., English, Spanish, German, Dutch, Romanian, Nepali), and health domains (e.g., diabetes, depression, COVID-19, adverse drug events). We have held the workshop at various venues, including PSB (2016)30, AMIA (2017)31, EMNLP (2018)32, ACL (2019)33, COLING (2020 and 2022)34,35, and NAACL (2021)36. Since 2018, we have held workshops at conferences run by the Association for Computational Linguistics (ACL), a premier organization in Natural Language Processing (NLP) research, providing us the opportunity to expose and encourage participation in using advances in NLP to solve the significant challenges posed by using social media for health research. Our keynote speakers have included international researchers from academic institutions and industry: Raul Rodriguez-Esteban (Roche Pharmaceuticals, Switzerland), Mark Dredze (Johns Hopkins University, USA), and Fabio Rinaldi (Dalle Molle Institute for Artificial Intelligence, Switzerland).

Each of our workshops has been accompanied by shared tasks to address the NLP challenges inherent to utilizing social media data for health research, including informal, colloquial expressions, noise, and data sparsity8,30,31,37–39. With our shared tasks, we aim to advance the use of user-generated texts from social media for pharmacovigilance, epidemiology, patient-centered outcomes, and disease tracking, including the impact of and beliefs about diseases such as COVID-19. We began with three tasks in 201630 organized solely by our lab, expanding to 10 tasks in 2022, organized in collaboration with five other research labs from academia and industry. Participation has also steadily increased over the years, from 11 teams in 2016 to 54 teams from 28 countries at #SMM4H 2022, about 30% more than the prior iteration. Our tasks have included the identification, extraction, and normalization of adverse event mentions in tweets in English, Russian, and French, detection of adverse pregnancy outcomes, demographic information (age), and several tasks related to COVID-19. Table 1 lists all the tasks run by year and the data provided to participants. In addition to driving advances in research, shared tasks also provide gold-standard annotated corpora to the research community (Table 1). While the test sets are generally not made available, all training sets are freely available to participants at the time of the shared task, as well as to those who later attempt to advance solutions to the problems or use the data for their own research.

Table 1: Brief Description of the tasks and the annotated data provided for each task

Year Conference Task(s) Data (test/train) Training Data Link
2016 PSB TASK 1: Binary classification of ADRs 7,574/3,284 tweets http://diego.asu.edu/psb2016/task1data.html (DEAD LINK)
    Task 2: ADR Extraction 2,000 tweets http://diego.asu.edu/psb2016/task2data.html (DEAD LINK)
    Task 3: Normalization of ADR mentions 2,000 tweets http://diego.asu.edu/psb2016/task3data.html (DEAD LINK)
2017 AMIA TASK 1: Automatic classification of adverse drug reaction mentioning posts—binary classification 10,822/10,000 tweets http://diego.asu.edu/Publications/ADRClassify.html (DEAD LINK)
    TASK 2: Automatic classification of posts describing medication intake—three-class classification 8,000/5,000 tweets https://healthlanguageprocessing.org/wp-content/uploads/2017/05/download_binary_twitter_data.zip
2018 EMNLP TASK 1: Automatic detection of posts mentioning a drug name—binary classification 10,000/5,000 tweets https://healthlanguageprocessing.org/wp-content/uploads/2018/04/smm4h-emnlp-task1-trainingset1.zip
    TASK 2: Automatic classification of posts describing medication intake—three-class classification 17,000/8,000 tweets https://healthlanguageprocessing.org/wp-content/uploads/2018/04/smm4h-emnlp-task2-trainingsets.zip
    TASK 3: Automatic classification of adverse drug reaction mentioning posts—binary classification 25,000/8,000 tweets https://healthlanguageprocessing.org/wp-content/uploads/2018/04/task3_trainingset3_download_form.zip
    TASK 4 : Automatic detection of posts mentioning vaccination behavior—binary classification 8,180/1,664 tweets https://healthlanguageprocessing.org/wp-content/uploads/2018/05/smm4th-emnlp-task4-trainingset.zip
2019 ACL Task 1: Automatic classifications of adverse effects mentions in tweets 25,672/5,000 tweets  
    Task 2: Extraction of Adverse Effect mentions 2,367/1,000 tweets  
    Task 3: Normalization of adverse drug reaction mentions (ADR) 2,367/1,000 tweets  
    Task 4: Generalizable identification of personal health experience mentions 10,876/NR tweets  
2020 COLING Task 1: Automatic classification of tweets that mention medications 69,272/29,687 tweets  
    Task 2: Automatic classification of multilingual tweets that report adverse effects 25,672/5,000 tweets (English), 2,426/607 tweets (French), 7,612/1,903 tweets (Russian)  
    Task 3: Automatic extraction and normalization of adverse effects in English tweets 2,376/1,000 tweets  
    Task 4: Automatic characterization of chatter related to prescription medication abuse in tweets 13,172/3,271 tweets  
    Task 5: Automatic classification of tweets reporting a birth defect pregnancy outcome 18,397/4,602 tweets  
2021 NAACL Task 1 : Classification, extraction and normalization of adverse effect (AE) mentions in English tweets 18,000/10,000 tweets  
    Task 2 : Classification of Russian tweets for detecting presence of adverse effect (AE) mentions 11,610/1000 tweets  
    Task 3: Classification of changes in medication treatment in tweets Tweets: 7,470/2,360; WebMD Reviews: 11,675/1,297  
    Task 4 : Classification of tweets self-reporting adverse pregnancy outcomes. 6,487/10,000 tweets  
    Task 5 : Classification of tweets self-reporting potential cases of COVID-19. 7,181/10,000 tweets  
    Task 6 : Classification of COVID19 tweets containing symptoms 9,567/6,500 tweets  
    Task 7 : Identification of professions and occupations (ProfNER) in Spanish tweets  8,000/2,000 tweets  
    Task 8 : Classification of self-reported breast cancer posts on Twitter 3815/1204 tweets  
2022 COLING Task 1 – Classification, detection and normalization of Adverse Events (AE) mentions in tweets (in English) 18,000/10,000 tweets  
    Task 2 – Classification of stance and premise in tweets about health mandates related to COVID-19 (in English) 4,269/2,000 tweets  
    Task 3 – Classification of changes in medication treatments in tweets and WebMD reviews (in English) Tweets: 7,470/2,360; WebMD Reviews: 11,675/1,297  
    Task 4 – Classification of tweets self-reporting exact age (in English) 11,000/10,000 tweets  
    Task 5 – Classification of tweets containing self-reported COVID-19 symptoms (in Spanish) 13,630/6,851 tweets  
    Task 6 – Classification of tweets which indicate self-reported COVID-19 vaccination status (in English) 16,477/5,923 tweets  
    Task 7 – Classification of self-reported intimate partner violence on Twitter (in English) 5,057/1,291 tweets  
    Task 8 – Classification of self-reported chronic stress on Twitter (in English) 3,356/839 tweets  
    Task 9 – Classification of Reddit posts self-reporting exact age (in English) 10,000/2,000 posts  
    Task 10 – Detection of disease mentions in tweets – SocialDisNER (in Spanish) 8,000/2,000 tweets  


Resources


  1. Onishi, T., Weissenbacher, D., Klein, A., O’Connor, K. & Gonzalez, G. Dealing with medication non-adherence expressions in twitter. in Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task 32–33 (2018).
  2. Weissenbacher, D. et al. Active neural networks to detect mentions of changes to medication treatment in social media. J. Am. Med. Inform. Assoc. 28, 2551–2561 (2021).
  3. Golder, S. et al. Patient-Reported Reasons for Switching or Discontinuing Statin Therapy: A Mixed Methods Study Using Social Media. Drug Saf. 45, 971–981 (2022).
  4. Pimpalkhute, P., Patki, A., Nikfarjam, A. & Gonzalez, G. Phonetic spelling filter for keyword selection in drug mention mining from social media. AMIA Jt. Summits Transl. Sci. Proc. AMIA Jt. Summits Transl. Sci. 2014, 90–5 (2014).
  5. Sarker, A. & Gonzalez-Hernandez, G. An unsupervised and customizable misspelling generator for mining noisy health-related text sources. J. Biomed. Inform. 88, 98–107 (2018).
  6. Weissenbacher, D. et al. Deep neural networks ensemble for detecting medication mentions in tweets. J. Am. Med. Inform. Assoc. 26, 1618–1626 (2019).
  7. Weissenbacher, D., Rawal, S., Magge, A. & Gonzalez-Hernandez, G. Addressing Extreme Imbalance for Detecting Medications Mentioned in Twitter User Timelines. in 19th Annual Conference on Artificial Intelligence in Medicine (AIME, 2021). doi:10.1101/2021.02.09.21251453.
  8. Klein, A. et al. Overview of the fifth social media mining for health applications (# smm4h) shared tasks at coling 2020. in Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task 27–36 (2020).
  9. BioCreative – Latest 3 News Items. https://biocreative.bioinformatics.udel.edu/.
  10. Nikfarjam, A., Sarker, A., O’Connor, K., Ginn, R. & Gonzalez, G. Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features. J. Am. Med. Inform. Assoc. JAMIA 22, 671–681 (2015).
  11. Magge, A. et al. DeepADEMiner: a deep learning pharmacovigilance pipeline for extraction and normalization of adverse drug event mentions on Twitter. J. Am. Med. Inform. Assoc. 28, 2184–2192 (2021).
  12. Magge, A., O’ Connor, K., Scotch, M. & Gonzalez-Hernandez, G. SEED: Symptom Extraction from English Social Media Posts using Deep Learning and Transfer Learning. MedRxiv Prepr. Serv. Health Sci. (2021) doi:10.1101/2021.02.09.21251454.
  13. Klein, A. Z. et al. Toward Using Twitter for Tracking COVID-19: A Natural Language Processing Pipeline and Exploratory Data Set. J. Med. Internet Res. 23, e25314 (2021).
  14. Golder, S. et al. A chronological and geographical analysis of personal reports of COVID-19 on Twitter from the UK. Digit. Health 8, 20552076221097508 (2022).
  15. Chandrashekar, P. B., Magge, A., Sarker, A. & Gonzalez, G. Social media mining for identification and exploration of health-related information from pregnant women. (2017).
  16. Klein, A. Z., Cai, H., Weissenbacher, D., Levine, L. D. & Gonzalez-Hernandez, G. A natural language processing pipeline to advance the use of Twitter data for digital epidemiology of adverse pregnancy outcomes. J. Biomed. Inform. X 100076 (2020) doi:10.1016/j.yjbinx.2020.100076.
  17. Davoudi, A., Klein, A. Z., Sarker, A. & Gonzalez-Hernandez, G. Towards Automatic Bot Detection in Twitter for Health-related Tasks. AMIA Summits Transl. Sci. Proc. 2020, 136–141 (2020).
  18. Klein, A. Z., Sarker, A., Weissenbacher, D. & Gonzalez-Hernandez, G. Towards scaling Twitter for digital epidemiology of birth defects. Npj Digit. Med. 2, 96 (2019).
  19. Klein, A. Z., Sarker, A., Cai, H., Weissenbacher, D. & Gonzalez-Hernandez, G. Social media mining for birth defects research: A rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter. J. Biomed. Inform. 87, 68–78 (2018).
  20. Klein, A. Z. & Gonzalez-Hernandez, G. An annotated data set for identifying women reporting adverse pregnancy outcomes on Twitter. Data Brief 32, 106249 (2020).
  21. Klein, A. Z., Gebreyesus, A. & Gonzalez-Hernandez, G. Automatically Identifying Comparator Groups on Twitter for Digital Epidemiology of Pregnancy Outcomes. in AMIA Joint Summits on Translational Science 317–325 (2020).
  22. Klein, A. Z., O’Connor, K., Levine, L. D. & Gonzalez-Hernandez, G. Using Twitter Data for Cohort Studies of Drug Safety in Pregnancy: Proof-of-concept With β-Blockers. JMIR Form. Res. 6, e36771 (2022).
  23. Rouhizadeh, M., Magge, A., Klein, A., Sarker, A. & Gonzalez, G. A Rule-based Approach to Determining Pregnancy Timeframe from Contextual Social Media Postings. in Proceedings of the 2018 International Conference on Digital Health – DH ’18 16–20 (ACM Press, 2018). doi:10.1145/3194658.3194679.
  24. Klein, A. Z., Sarker, A., Weissenbacher, D. & Gonzalez-Hernandez, G. Towards scaling Twitter for digital epidemiology of birth defects. Npj Digit. Med. 2, 1–9 (2019).
  25. Klein, A. Z. et al. Detecting Personal Medication Intake in Twitter: An Annotated Corpus and Baseline Classification System. Proc. BioNLP 2017 Workshop 136–142 (2017).
  26. Klein, A. & Gonzalez-Hernandez, G. Data Specific Training for Detecting Reports of Medication Intake on Twitter. in AMIA 2021 Virtual Informatics Summit (AMIA, March 24, 2021).
  27. Klein, A. Z., Magge, A. & Gonzalez-Hernandez, G. ReportAGE: Automatically extracting the exact age of Twitter users based on self-reports in tweets. PLOS ONE 17, e0262087 (2022).
  28. O’Connor, K., Golder, S., Weissenbacher, D., Klein, A. & Gonzalez-Hernandez, G. Scoping Review of Methods and Annotated Datasets Used to Predict Gender and Age of Twitter Users. 2022.12.06.22283170 Preprint at https://doi.org/10.1101/2022.12.06.22283170 (2022).
  29. Golder, S., Stevens, R., O’Connor, K., James, R. & Gonzalez-Hernandez, G. Who is Tweeting? A Scoping Review of Methods to Establish Race and Ethnicity from Twitter Datasets. Preprint at https://doi.org/10.31235/osf.io/wru5q (2021).
  30. Sarker, A., Nikfarjam, A. & Gonzalez, G. Social Media Mining Shared Task Workshop. in Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing vol. 21 581–92 (2016).
  31. Sarker, A. & Gonzalez-Hernandez, G. Overview of the Second Social Media Mining for Health (SMM4H) Shared Tasks at AMIA 2017. in Proceedings of the Second Workshop on Social Media Mining for Health Applications (SMM4H) (2017).
  32. Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task. (Association for Computational Linguistics, 2018).
  33. Weissenbacher, D., Klein, A., Magge, A. & Gonzalez-Hernandez, G. Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task. (Association for Computational Linguistics, 2019).
  34. Proceedings of the Fifth Social Media Mining for Health Applications Workshop & Shared Task. (Association for Computational Linguistics, 2020).
  35. Proceedings of The Seventh Workshop on Social Media Mining for Health Applications, Workshop & Shared Task. (Association for Computational Linguistics, 2022).
  36. Proceedings of the Sixth Social Media Mining for Health (#SMM4H) Workshop and Shared Task. (Association for Computational Linguistics, 2021).
  37. Magge, A. et al. Overview of the sixth social media mining for health applications (# smm4h) shared tasks at naacl 2021. in Proceedings of the Sixth Social Media Mining for Health (# SMM4H) Workshop and Shared Task 21–32 (2021).
  38. Weissenbacher, D., Sarker, A., Paul, M. J. & Gonzalez-Hernandez, G. Overview of the Third Social Media Mining for Health (SMM4H) Shared Tasks at EMNLP 2018. in Proceedings of the 2018 EMNLP Workshop SMM4H: The 3rd Social Media Mining for Health Applications Workshop & Shared Task 13–16 (Association for Computational Linguistics, 2018). doi:10.18653/v1/W18-5904.
  39. Weissenbacher, D. et al. Overview of the Fourth Social Media Mining for Health (SMM4H) Shared Tasks at ACL 2019. in Proceedings of the Fourth Social Media Mining for Health Applications (#SMM4H) Workshop & Shared Task 21–30 (Association for Computational Linguistics, 2019). doi:10.18653/v1/W19-3203.