Academic Journal
Answering real-world clinical questions using large language model, retrieval-augmented generation, and agentic systems
| Title: | Answering real-world clinical questions using large language model, retrieval-augmented generation, and agentic systems |
|---|---|
| Authors: | Yen Sia Low, Michael L Jackson, Rebecca J Hyde, Robert E Brown, Neil M Sanghavi, Julian D Baldwin, C William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V Nene, Morgan Pike, Courtney J Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R Zipursky, Christina Dinh, Philip Ballentine, Dan C Derieg, Vladimir Polony, Rehan N Chawdry, Jordan Davies, Brigham B Hyde, Nigam H Shah, Saurabh Gombar |
| Source: | Digital Health (Digit Health), Vol 11 (2025) |
| Publisher information: | SAGE Publications, 2025. |
| Publication year: | 2025 |
| Subject terms: | Computer applications to medicine. Medical informatics, R858-859.7, Original Research Article |
| Description: | Objective: The practice of evidence-based medicine can be challenging when relevant data are lacking or difficult to contextualize for a specific patient. Large language models (LLMs) could potentially address both challenges by summarizing published literature or generating new studies using real-world data. Materials and Methods: We submitted 50 clinical questions to five LLM-based systems: OpenEvidence, which uses an LLM for retrieval-augmented generation (RAG); ChatRWD, which uses an LLM as an interface to a data extraction and analysis pipeline; and three general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini 1.5 Pro). Nine independent physicians evaluated the answers for relevance, quality of supporting evidence, and actionability (i.e., sufficient to justify or change clinical practice). Results: General-purpose LLMs rarely produced relevant, evidence-based answers (2–10% of questions). In contrast, the RAG-based and agentic LLM systems produced relevant, evidence-based answers for 24% (OpenEvidence) and 58% (ChatRWD) of questions, respectively. OpenEvidence produced actionable results for 48% of questions with existing evidence, compared to 37% for ChatRWD and [...] Discussion: Special-purpose LLM systems greatly outperformed general-purpose LLMs in producing answers to clinical questions. The retrieval-augmented generation-based LLM (OpenEvidence) performed well when existing data were available, while only the agentic ChatRWD was able to provide actionable answers when preexisting studies were lacking. Conclusion: Synergistic systems combining RAG-based evidence summarization and agentic generation of novel evidence could improve the availability of pertinent evidence for patient care. |
| Document type: | Article; Other literature type |
| Language: | English |
| ISSN: | 2055-2076 |
| DOI: | 10.1177/20552076251348850 |
| Access link: | https://doaj.org/article/e32cc1f79b9645f3899c1190f2646bf0 |
| Rights: | URL: https://journals.sagepub.com/page/policies/text-and-data-mining-license URL: http://creativecommons.org/licenses/by-nc-nd/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial use, reproduction and distribution of the work as published without adaptation or alteration, without further permission provided the original work is attributed as specified on the SAGE and Open Access page (http://us.sagepub.com/en-us/nam/open-access-at-sage). |
| Accession number: | edsair.doi.dedup.....8686bcd51a9a499d7a4b763945ca835c |
| Database: | OpenAIRE |