PSB 2027 Session
NLP Methods for Embedding Real‑World Clinical Knowledge in LLMs: Towards Responsible, Transformative Medical AI

Call For Papers
Large language models (LLMs) are rapidly reshaping biomedical natural language processing and clinical AI, expanding the field from text mining and concept extraction to generation, reasoning, and interactive decision support. This session invites original research papers and demonstrations on NLP methods for embedding, updating, evaluating, and governing real-world clinical knowledge in LLMs for biomedical and healthcare applications. We are particularly interested in work that grounds LLMs in knowledge from clinical guidelines, ontologies, knowledge graphs, EHRs, registries, patient-reported outcomes, and patient narratives, while rigorously addressing safety, fairness, robustness, and real-world utility.
We welcome submissions on knowledge-aware model architectures and training strategies; retrieval-augmented, neuro-symbolic, and ontology-informed methods; approaches for integrating patient-centered and multimodal data as knowledge sources; benchmarks and evaluation frameworks for medical knowledge, calibration, and longitudinal or guideline-based reasoning; and methods for monitoring, updating, and governing deployed LLMs in clinical settings. We also encourage papers describing responsible implementations of knowledge-grounded LLMs in decision support, patient-facing applications, operational workflows, and public health, especially those that surface practical trade-offs, deployment lessons, and socio-technical considerations for accountable medical AI.
Organizing Team:
Session chair: Graciela Gonzalez-Hernandez, PhD (Cedars-Sinai) – Professor and Vice Chair of Computational Biomedicine and Director of the Health AI PhD Program at Cedars-Sinai, where she leads the Health Language Processing Lab. Her research focuses on clinical and biomedical natural language processing, with an emphasis on safety‑critical applications of machine learning and, more recently, LLM‑based methods for clinical reasoning, patient‑generated data, and real‑world evidence. She has a long track record of community leadership, including organizing multiple PSB sessions and workshops, as well as other national and international meetings at the intersection of NLP, clinical informatics, and public health. Cedars-Sinai fully supports her role as session chair and her commitment to organizing this proposed PSB 2027 session.
Co-organizers: Davy Weissenbacher, PhD (Cedars-Sinai), Carl Berdahl, MD (Cedars-Sinai), Timothy Daskivich, MD (Cedars-Sinai), Sumeet Chugh, MD (Cedars-Sinai), Ian M. Campbell, MD, PhD (Children’s Hospital of Philadelphia), Hongfang Liu, PhD (UTHealth Houston), Abeed Sarker, PhD (Emory University), Wendy W. Chapman, PhD and Brian Chapman, PhD (UT Southwestern Medical Center)
The organizers bring complementary expertise across clinical NLP, emergency and cardiovascular medicine, surgical oncology and outcomes research, biomedical informatics, and translational implementation of AI tools in health systems, positioning the session to bridge methodological innovation with real‑world evaluation and deployment. All co-organizers have the support of their respective institutions.
Motivation
This session responds to an inflection point for biomedical NLP and clinical AI, where LLMs have shifted the field from text mining and concept extraction to generation, reasoning, and interactive decision support in safety‑critical settings. It is motivated by the need to understand how medical knowledge from guidelines, ontologies, EHRs, registries, and patient narratives is actually represented, updated, and tested within these models, and by growing concern about hallucinations, outdated or biased content, and opaque reasoning in high‑stakes use cases. By centering methods for integrating and validating real‑world clinical and patient‑centered knowledge in LLMs, and by foregrounding safety, fairness, and governance alongside technical innovation, the session aims to help the community articulate what “good” medical knowledge looks like inside an LLM and how it should be measured, constrained, and evolved over time.
Scope and topics of interest
The session will solicit original research papers and demonstrations on (but not limited to) the following themes, all centered on LLMs, medical knowledge, and benchmarking. Submissions should make a primary contribution to how medical knowledge is represented, evaluated, updated, or governed in LLMs, rather than to general clinical prediction or generic application of AI in medicine:
- Representing medical knowledge in LLMs
  - Architectures and methods for injecting structured medical knowledge (e.g., terminologies, ontologies, knowledge graphs, institutional order sets, pathways, and clinical guidelines) into LLMs via pretraining, retrieval‑augmented generation, adapters, or hybrid neuro‑symbolic approaches.
  - Techniques for aligning LLMs with institution‑ or context‑specific clinical knowledge while maintaining generalization, data protection, and robustness across sites and populations, including ontology‑ and knowledge‑graph‑based approaches.
  - Methods for editing, constraining, or continually refreshing LLM medical knowledge post‑deployment (for example, knowledge‑editing and policy‑ or guideline‑driven updates that respond to emerging evidence).
  - Analyses of how different knowledge‑representation choices (symbolic, distributional, hybrid) affect downstream clinical reasoning, uncertainty handling, and explainability in LLM behavior.
- Real‑world and patient‑centered data as knowledge sources
  - Use of EHR text, longitudinal registries, multimodal clinical data, and real‑world evidence to refine and ground LLM medical knowledge, including cross‑institutional and cross‑population settings.
  - Methods to represent and incorporate patient perspectives and lived experiences, such as patient‑generated content, patient‑reported outcomes, and community narratives, into LLMs for shared decision‑making and patient‑facing applications.
  - Embedding or integrating multimodal medical data and knowledge (e.g., images, videos, waveforms, time‑series) into or alongside LLMs, including alignment of non‑text representations with clinical concepts, tasks, and workflows.
- Benchmarks and evaluation of LLM medical knowledge
  - Design of tasks, datasets, and benchmarks that directly test LLM medical knowledge, including but not limited to guideline adherence, temporal and longitudinal reasoning, phenotype discovery, and calibration under uncertainty, with emphasis on generalization across institutions and populations.
  - Evaluation methodologies for LLMs’ medical knowledge (e.g., multi‑center evaluation protocols, robustness under distribution shift, human‑AI team evaluation, assessment of reliability in high‑stakes or ambiguous clinical scenarios).
  - Frameworks for detecting and mitigating hallucinations, outdated or conflicting knowledge, biased recommendations, and unsafe behaviors, including protocols that combine expert review, simulation with real‑world data, and prospective or quasi‑prospective studies.
- Safety, governance, and knowledge‑aware implementations
  - Approaches for monitoring, updating, and governing medical knowledge within deployed LLM systems, including fairness and robustness across populations, institutions, and care settings, and alignment with regulatory, institutional, or payer policies.
  - Knowledge‑aware LLM applications and deployment experiences (e.g., decision support, patient‑facing tools, operational and public health use cases) that explicitly analyze how medical knowledge is represented, constrained, and updated in practice, surfacing practical challenges, trade‑offs, and lessons learned.
  - Socio‑technical and organizational designs that support safe and accountable use of LLM medical knowledge, such as governance structures, oversight and escalation processes, auditing and monitoring strategies, and training for clinicians, trainees, and other stakeholders.
Organizing team experience
The organizing team has a strong track record of successful PSB leadership and community-building in biomedical NLP and clinical informatics. Graciela Gonzalez-Hernandez has chaired and co-chaired multiple PSB sessions and workshops, including the PSB 2021 session “Advanced Methods for Big Data Analytics in Women’s Health” and the PSB 2019 “Text Mining and Visualization for Precision Medicine” workshop, demonstrating the ability to attract high-quality submissions and run well-attended, thematically coherent sessions. Davy Weissenbacher, Hongfang Liu, and Abeed Sarker have also served as organizers or co-organizers for prior PSB sessions and workshops, bringing additional experience with the PSB community and format. Together, the organizing team combines long-standing PSB leadership with complementary strengths in clinical practice, real-world AI implementation, patient-generated data, and learning health systems, strengths that align closely with the goals of this proposed session.
Proposed format and anticipated outcomes
The session will follow the standard PSB structure, with the top 6–8 peer-reviewed full papers presented as oral talks, complemented by a short invited talk or a focused panel with implementation experts (e.g., health system AI, clinical leadership, regulatory-facing roles). A closing moderated discussion will surface gaps in current benchmarks, datasets, and evaluation practices for medical knowledge in LLMs and outline priorities for shared resources and cross-institutional collaborations (e.g., shared tasks, working groups). Expected outcomes include a curated set of reference papers on medical knowledge integration and validation in LLMs and a clearer community view of what “good” evaluation looks like for medical LLM knowledge, helping catalyze responsibly transformative, knowledge-grounded LLMs for real-world biomedical and clinical use.
