Davy Weissenbacher

About Me…

Research Associate in the HLP Center, University of Pennsylvania
The Perelman School of Medicine
University of Pennsylvania, USA

Research Interest

My main interest is the quality of the annotations produced by NLP systems. My PhD thesis shows that, to date, no NLP system can automatically produce perfect annotations. Consequently, it is important to design NLP systems around inference models that can handle uncertain information. During my first postdoc I worked closely with users from different domains, which was an opportunity to evaluate the usability of current NLP approaches from the user’s point of view. There seems to be a certain threshold beyond which users regard the output of an NLP system as reliable, and current systems have not yet reached it. This is particularly true of systems that produce semantic information (e.g. anaphora resolution or semantic frame extraction); their use can even be obtrusive if they present noisy and distracting information to the user. More recently I have been working on structured prediction with graphical models and constrained conditional models. These machine learning techniques jointly predict the values of several random variables along with their relations. In this expressive framework, linguistic constraints are easily stated and integrated into the inference model to rule out solutions that are likely but inadequate. I am currently applying these techniques to the task of geographical relation extraction from medical texts in support of phylogeography studies.

Education

  • PhD Thesis in Natural Language Processing, Paris XIII (Defended on November 20, 2008)
  • Master’s in Artificial Intelligence (DEA), Paris XIII (2003)
  • Master’s in Logic (M1), Paris I (2002)
  • DEUST in Science Computing (L2), Paris VI (2000)
  • Licence in Logic (L3), Paris I (1999)
  • Licence in Philosophy (L3), Paris I (1999)

Projects Participation

Publications

Refereed International Journals (13)
  • Weissenbacher D., O’Connor K., Hiraki A. T., Kim J.D., Gonzalez-Hernandez G. 2020. “An empirical evaluation of electronic annotation tools for Twitter data”. Genomics & Informatics 18(2):e24
  • D. Weissenbacher, A. Sarker, A. Klein, K. O’Connor, A. Magge Ranganatha, G. Gonzalez-Hernandez. 2019. “Deep Neural Networks Ensemble for Detecting Medication Mentions in Tweets”. Journal of the American Medical Informatics Association, 26(12) pp. 1618-1626
  • A. Klein, A. Sarker, D. Weissenbacher, G. Gonzalez-Hernandez. 2019. “Towards scaling social media for digital epidemiology of birth defects: automatically detecting pregnancy outcomes on Twitter”. npj Digital Medicine, 2(96)
  • M. Scotch, T. Tahsin, D. Weissenbacher, K. O’Connor, A. Magge, M. Vaiente, M. Suchard, G. Gonzalez-Hernandez. 2019. “Incorporating sampling uncertainty in the geospatial assignment of taxa for virus phylogeography”. Virus evolution, 5(1): vey043
  • A. Klein, A. Sarker, H. Cai, D. Weissenbacher, G. Gonzalez-Hernandez. 2018. “Social media mining for birth defects research: a rule-based, bootstrapping approach to collecting data for rare health-related events on Twitter”. Journal of Biomedical Informatics, 87, pp. 68-78
  • S. Golder, S. Chiuve, D. Weissenbacher, A. Klein, K. O’Connor, M. Bland, M. Malin, M. Bhattacharya, L. Scarazzini, G. Gonzalez-Hernandez. 2018. “Pharmacoepidemiologic evaluation of birth defects from health-related postings in social media during pregnancy”. Drug Safety, 42(3), pp. 389-400
  • A. Magge, D. Weissenbacher, A. Sarker, M. Scotch, G. Gonzalez-Hernandez. 2018. “Deep neural networks and distant supervision for geographic location mention extraction”. Bioinformatics, 34(13): i565-i573 (ISMB’18)
  • T. Tahsin, D. Weissenbacher, K. O’Connor, A. Magge, M. Scotch, G. Gonzalez-Hernandez. 2017. “GeoBoost: accelerating research involving the geospatial metadata of virus GenBank records”. Bioinformatics Application Note, btx799 [short paper]
  • Davy Weissenbacher, Tasnia Tahsin, Demetrius Jones-Shargani, Daniel Magee, Matteo Vaiente, Graciela Gonzalez, Matthew Scotch. 2017. “Named Entity Linking of Geospatial and Host Metadata in GenBank for Advancing Biomedical Research”, Database: The Journal of Biological Database and Curation (.pdf)
  • Tasnia Tahsin, Davy Weissenbacher, Robert Rivera, Rachel Beard, Mari Firago, Garrick Wallstrom, Matthew Scotch, Graciela Gonzalez. 2016. “A high-precision rule-based extraction system for expanding geospatial metadata in GenBank records”, Journal of the American Medical Informatics Association (JAMIA) (.pdf)
  • Davy Weissenbacher, Tahsin Tasnia, Beard Rachel, Firago Mari, Rivera Robert, Scotch Matthew, Gonzalez Graciela. 2015. “Knowledge-driven geospatial location resolution for phylogeographic models of virus migration”, Bioinformatics, 31(12): i348-i356 (ISMB/ECCB’15) (.pdf)
  • Davy Weissenbacher and Adeline Nazarenko. 2011. “Comprendre les effets des erreurs d’annotations des plates-formes de TAL”, Traitement Automatique des Langues. varia 52-1 pp. 161-185
  • Sophia Ananiadou, Paul Thompson, James Thomas, Tingting Mu, Sandy Oliver, Mark Rickinson, Yutaka Sasaki, Davy Weissenbacher and John McNaught. 2010. “Supporting the Education Evidence Portal via Text Mining”, Philosophical Transactions of the Royal Society A. Royal Society, Vol. 368, No. 1925, pp. 3829-3844 (.pdf)
Refereed International Conferences (13)
  • Scotch Matthew, Tahsin Tasnia, Weissenbacher Davy, O’Connor Karen, Magge Arjun, Suchard Marc, Gonzalez Graciela. 2018. “Incorporating Observation Error in the Geospatial Assignment of Taxa for Virus Phylogeography”. AMIA’18 Informatics Summit [short paper]
  • Tahsin Tasnia, Weissenbacher Davy, O’Connor Karen, Magge Arjun, Scotch Matthew, Gonzalez-Hernandez Graciela. 2017. “GeoBoost: accelerating research involving the geospatial metadata of virus GenBank records”. Bioinformatics Application Note, btx799 (.pdf) [short paper]
  • Weissenbacher Davy, Abeed Sarker, Tasnia Tahsin, Gonzalez Graciela, Matthew Scotch. 2016. “Extracting geographic locations from the literature for virus phylogeography using supervised and distant supervision methods”. In Proceedings of AMIA Joint Summits on Translational Science, 2017 (.pdf) [long paper]
  • Weissenbacher Davy, Johnson Travis, Laura Wojtulewicz, Dueck Amylou, Locke Dona, Caselli Richard and Gonzalez Graciela. 2016. “Automatic Prediction of Linguistic Decline in Writings of Patients with Degenerative Dementia”. 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics (.pdf) [long paper]
  • Weissenbacher Davy, Tahsin Tasnia, Beard Rachel, Firago Mari, Rivera Robert, Scotch Matthew and Gonzalez Graciela. 2015. “Detection and Disambiguation of Geospatial Locations for Phylogeography”. 13th Annual Rocky Mountain Bioinformatics Conference.
  • Weissenbacher Davy, Tahsin Tasnia, Beard Rachel, Firago Mari, Rivera Robert, Scotch Matthew and Gonzalez Graciela. 2015. “Knowledge-driven geospatial location resolution for phylogeographic models of virus migration”. In Proceedings of International Conference on Intelligent Systems for Molecular Biology (ISMB/ECCB’15) (.pdf) [long paper, Acceptance rate: 17.4%]
  • Weissenbacher Davy and Raymond Christian. 2015. “Tree-Structured Named Entities Extraction from Competing Speech Transcriptions”. In Proceedings of International Conference on Application of Natural Language to Information Systems (NLDB’15) (.pdf) [long paper, Acceptance rate: 18%]
  • Scotch Matthew, Rivera Robert, Tahsin Tasnia, Beard Rachel, Firago Mari, Weissenbacher Davy, Wallstrom Garrick and Graciela Gonzalez. 2014. “A Pipeline for Virus Phylogeography that Accounts for Geospatial Observation Error”. 12th Annual Rocky Mountain Bioinformatics Conference.
  • Weissenbacher Davy and Sasaki Yutaka. 2013. “Which Factors Contributes to Resolving Coreference Chains with Bayesian Networks?”. In Proceedings of Conference on Intelligent Text Processing and Computational Linguistics (CICLing’13) (.pdf) [long paper]
  • Sasaki Yutaka and Weissenbacher Davy. 2013. “Large-Scale Hierarchical Text Classification for LSHTC3 Data”. Annual Meeting of the Association for Natural Language Processing [short paper]
  • Weissenbacher Davy and Nazarenko Adeline. 2007. “A Bayesian approach combining surface clues and linguistic knowledge: Application to the anaphora resolution problem”. In Proceedings of the Recent Advances in Natural Language Processing (RANLP’07) (.pdf) [Poster]
  • Weissenbacher Davy and Nazarenko Adeline. 2007. “A bayesian classifier for the recognition of the impersonal occurrences of the it pronoun”. In Proceedings of the 6th Discourse Anaphora and Anaphor Resolution Colloquium (DAARC’07), pp. 145-150 (.pdf) [long paper]
  • Weissenbacher Davy. 2006. “Bayesian Network, a model for NLP?”. In Proceedings of EACL’06, Trento, Italy, pp. 195-198 (.pdf) [Poster, Acceptance rate: 39%] Corpus (.tar.gz)
Refereed International Workshops (8)
  • Sharma A., Weissenbacher D., Baral C. and Gonzalez G. 2015. “Generating Semantic Graphs from Image Descriptions for Alzheimer’s Disease Detection”. 3rd Coherence of Discourse Workshop
  • Sarker A., Nikfarjam A., Weissenbacher D., Gonzalez G. 2015. “DIEGOLab: An Approach for Message-level Sentiment Classification in Twitter”. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), pp. 510-514.
  • Tahsin T., Beard R., Rivera R., Lauder R., Weissenbacher D., Wallstrom G., Scotch M., Gonzalez G. 2014. “Natural Language Processing Methods for Enhancing Geographic Metadata for Phylogeography of Zoonotic Viruses”. In Proceedings of the 2014 Workshop on Biomedical Natural Language Processing (BioNLP 2014), pp. 1-9
  • Sasaki Y., Weissenbacher D. 2012. “TTI’s System for the LSHTC3 Challenge”. Proceedings of LSHTC3: ECML/PKDD – PASCAL Discovery Challenge Workshop on Large-Scale Hierarchical Classification [long paper, To be published]
  • Sasaki Y., Ishihara K., Yamamoto Y., Weissenbacher D. 2010. “TTI’s Systems for 2010 i2b2/VA Challenge”. i2b2 Workshop 2010 [long paper]
  • Aubin S., Deriviere J., Hamon T., Nazarenko A., Poibeau T., Weissenbacher D. 2006. “A robust linguistic infrastructure for efficient web content analysis: the ALVIS project”. Symposium on Digital Semantic Content across Cultures.
  • Weissenbacher, Davy. 2005. “A Bayesian Network for the resolution of non-anaphoric pronoun it”. Workshop on Bayesian Methods for NLP, Neural Information Processing Systems (NIPS’05) (.pdf)
  • Erick Alphonse, Sophie Aubin, Philippe Bessières, Gilles Bisson, Thierry Hamon, Sandrine Lagarrigue, Adeline Nazarenko, Alain-Pierre Manine, Claire Nédellec, Mohamed Ould Abdel Vetah, Thierry Poibeau and Davy Weissenbacher. 2004. “Event-based Information Extraction for the Biomedical Domain: the Caderige Project”, Proceedings of the International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA) pp. 43-49 (.pdf)
Refereed National Conferences (4)
  • Weissenbacher D., Pieri E., Ananiadou S., Rea B., Vis F., Lin Y., Procter R., Halfpenny P. 2009. “ASSIST: un moteur de recherche spécialisé pour l’analyse des cadres d’expériences”. Proceedings of Traitement Automatique des Langues Naturelles (TALN’09). (.pdf) [Demonstration]
  • Weissenbacher, Davy and Nazarenko, Adeline. 2007. “Identifier les pronoms anaphoriques et trouver leurs antécédents: l’intérêt de la classification bayésienne”. Proceedings of Traitement Automatique des Langues Naturelles (TALN’07). (.pdf) [long paper]
  • Alphonse E., Aubin S., Bessières P., Bisson G., Hamon T., Lagarrigue S., Nazarenko A., Manine A-P., Nédellec C., Ould Abdel Vetah M., Poibeau T., Weissenbacher D. 2004. “Extraction d’Information appliqué au domaine biomédical”. Proceedings of CIFT pp. 7-20 (.pdf) [long paper]
  • Weissenbacher, Davy. 2004. “La relation de synonymie en génomique”. In Proceedings of RECITAL, Fes, Morocco, pp. 298-303 (.pdf) [Poster]
Refereed National Journals (2)
  • Weissenbacher, Davy. 2007. “Les réseaux bayésiens: un formalisme adapté au traitement automatique des langues?”. Revue d’Intelligence Artificielle (RIA), numéro spécial “Modèles Graphiques Probabilistes”, pp. 371-389 [Acceptance rate: 50%]
  • Aubin S., Deriviere J., Hamon T., Nazarenko A., Poibeau T., Weissenbacher D. 2007. “Une infrastructure pour l’annotation linguistique de documents issus du web: le projet ALVIS”. Revue des Nouvelles Technologies de l’Information (RNTI).
Other Communications
  • Rea B., Weissenbacher D., Sasaki Y., Thomas J., and Ananiadou S., “ASSIST: Education Evidence Portal”, UK e-Science All Hands Meeting 2009, Oxford, 7-9 Dec. 2009.
  • Weissenbacher D., Rea B., Ananiadou S., “Text Mining: beyond the CAQDAS tools?” Paper presented at the panel on Innovations in Methods in Media and Communication Studies at the Media, Communication and Cultural Studies Association (MeCCSA) 2009, Bradford [Short paper]
  • Weissenbacher D., Rea B., Ananiadou S., “Are the CAQDAS and the Text Mining Software Competitors?” Fourth International Conference on Interdisciplinary Social Sciences 2009, Athens [Abstract]
  • Ananiadou S., Weissenbacher D., Rea B., Pieri E., Lin Y., Vis F., Procter R., Halfpenny P., “Supporting Frame Analysis using Text Mining”. 5th International Conference on e-Social Science 2009, Cologne [long paper]
Theses
  • Weissenbacher, Davy. 2008. “Effects of imperfect annotations on Natural Language Processing systems, an applicative case study: the pronominal anaphora resolution”. PhD Thesis, Paris XIII. Under the supervision of Professors Christophe Fouqueré and Adeline Nazarenko. (Thesis.pdf, Abstract.pdf)
  • Weissenbacher, Davy. 2003. “Etude et reconnaissance automatique des relations de synonymie et de renommage dans les textes de génomique”. Master’s Thesis, Paris XIII (.doc)
Technical Reports
  • Project ASSIST:
    • D. Weissenbacher et al. 2009. Final report on ASSIST (.doc)
  • Project ALVIS:
    • A. Nazarenko et al. 2007. Final report on NLP analysis and normalization. Deliverable D5.3 ALVIS
    • A. Nazarenko et al. 2007. Complete document processing prototype. Deliverable D5.4 ALVIS
    • J. Deriviére et al. 2006. Report on NLP normalization options for IR (platform design). Deliverable D5.2 ALVIS
    • E. Alphonse et al. 2005. Report on method and language for the production of the augmented document representations. Deliverable D5.1 ALVIS
    • C. Nédellec et al. 2006. Prototype and documents for learning and integration of named entities and terminology. Deliverable D6.3 ALVIS
    • E. Alphonse et al. 2004. Requirements for integration of WP6 results into WP5 normalization and representation tasks and into WP9 query refinement task. Deliverable D6.2 ALVIS

 

Teaching

2019-2020
  • BMIN 522 – AI III: Natural Language Processing for Biomedical Informatics, M1 students (4th year). Course & practical work: NLP & Foundations of Machine Learning (4.5h)
2015-2016
  • Biomedical Informatics, M1 students (4th year). Course & practical work: Foundations of Biomedical Informatics Methods II, NLP & Database Modules (29h)
  • Biomedical Informatics, M2/PhD students. Course: Software Engineering, Problem Solving in Biomedical Informatics (29h)
2014-2015
  • Biomedical Informatics, M2/PhD students. Course: Natural Language Processing Methods in Biomedical Text Mining (9h, co-taught with Prof. Graciela Gonzalez)
  • Biomedical Informatics, M1 students (4th year). Course & practical work: Foundations of Biomedical Informatics Methods II, NLP Module (13h)
2010-2011
  • Licence Physics, L3 students (3rd year). Course & tutorial classes: Programming in Java (10h)
2006-2007
  • Master Mathematics-Computing Science, M1 students (4th year). Course & tutorial classes: Programming in C under Linux (18h)
  • Licence Science and Communication, L3 students (3rd year). Course & tutorial classes: Knowledge representation (39h)
  • Licence Computing Science, L2 students (2nd year). Tutorial classes & practical work: Programming in Caml (39h)
2005-2006
  • Master Mathematics-Computing Science, M1 students (4th year). Course & tutorial classes: Programming in C under Linux (18h)
  • Licence Science and Communication, L3 students (3rd year). Course & tutorial classes: Knowledge representation (39h)
  • Master Computing Science, M1 students (4th year). Supervision of project management (8h)
2004-2005
  • Licence Mathematics, L1 students (1st year). Tutorial classes & practical work: Imperative programming in C (30h); supervision of multiple C programming projects (19.5h)
  • Licence Mathematics, L1 students (1st year). Drawing up a business plan (19.5h)
2003-2004
  • DEUG MIAS, L1 students (1st year). Tutorial classes & practical work: Imperative programming in C (69h); supervision of multiple C programming projects

Other Activities

  • 2006 Presentation on Bayesian Networks for the RISC-CNRS continuing education program
  • 2006 Co-supervision with Adeline Nazarenko of Ayat Bouchouareb, a Master’s student (5th year)
  • 2006 Reviewer for the FinTAL conference
  • 2008 Reviewer for the Coling conference
  • 2015 Reviewer for the EMNLP conference
  • 2016 Reviewer for Social Media Mining for Public Health Monitoring and Surveillance Workshop, Pacific Symposium on Biocomputing conference

Software

  • Zodo: a biomedical informatics system to improve virus location data for phylogeography (website, wiki)
  • Zoophy: a public health informatics application for phylogeography of zoonotic RNA viruses (website, wiki)
  • WipeFinder: an enhanced search engine for the English Wikipedia (a short description)
  • Bayaphora: an anaphora resolver inferring from a Bayesian Network (Bitbucket, coming soon…)
  • ASSIST: a search engine specialized for Social Sciences (video clip)