[en] Background/Objectives: Artificial intelligence (AI), particularly large language models (LLMs), has demonstrated versatility in various applications but faces challenges in specialized domains like neurology. This study evaluates a specialized LLM’s capability and trustworthiness in complex neurological diagnosis, comparing its performance with that of neurologists in simulated clinical settings. Methods: We deployed GPT-4 Turbo (OpenAI, San Francisco, CA, US) through Neura (Sciense, New York, NY, US), an AI infrastructure with a dual-database architecture integrating “long-term memory” and “short-term memory” components on a curated neurological corpus. Five representative clinical scenarios were presented to 13 neurologists and the AI system. Participants formulated differential diagnoses based on initial presentations, followed by definitive diagnoses after receiving conclusive clinical information. Two senior academic neurologists blindly evaluated all responses, while an independent investigator assessed the verifiability of AI-generated information. Results: The AI achieved a significantly higher normalized score (86.17%) than neurologists (55.11%, p < 0.001). For differential diagnosis questions, the AI scored 85% versus 46.15% for neurologists, and for final diagnosis, 88.24% versus 70.93%. The AI obtained the maximum score in 15 of its 20 evaluations and responded in under 30 s, versus an average of 9 min for neurologists. All AI-provided references were classified as relevant, with no hallucinatory content detected. Conclusions: A specialized LLM demonstrated superior diagnostic performance compared to practicing neurologists across complex clinical challenges. This indicates that appropriately harnessed LLMs with curated knowledge bases can achieve domain-specific relevance in complex clinical disciplines, suggesting potential for AI as a time-efficient asset in clinical practice.
Disciplines :
Neurology
Author, co-author :
Barrit, Sami ; Neurosurgery, Université Libre de Bruxelles, 1070 Brussels, Belgium ; Neurosurgery, CHU Tivoli, 7110 La Louvière, Belgium ; Neurodynamics Laboratory, Department of Neurosurgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA ; Sciense, New York, NY 10013, USA
Torcida, Nathan ; Sciense, New York, NY 10013, USA ; Neurology, Université Libre de Bruxelles, 1050 Brussels, Belgium
Mazeraud, Aurelien ; Anesthésie-Réanimation, GHU Paris, Pôle Neuro, 75014 Paris, France ; Neurosciences, Université de Paris, 75006 Paris, France
Boulogne, Sebastien ; Neurophysiology and Epileptology, Université de Lyon, 69007 Lyon, France
Benoit, Jeanne ; Neurology, CHU de Nice, Université Côte d’Azur, UMR2CA, 06000 Nice, France
Maarouf, Adil ; Neurology, La Timone Hospital, AP-HM, 13385 Marseille, France ; Department of Neurology, Maladie Inflammatoire du Cerveau et de la Moelle Epinière (MICeME), Aix Marseille Université (AMU), CNRS, CRMBM, 13385 Marseille, France
Maldonado Slootjes, Sofia ; Department of Neurology, Universitair Ziekenhuis Brussel (UZ Brussel), 1090 Brussels, Belgium ; NEUR Research Group, Vrije Universiteit Brussel (VUB), 1090 Brussels, Belgium
Redon, Sylvain ; Evaluation and Treatment of Pain, FHU INOVPAIN, La Timone Hospital, AP-HM, 13385 Marseille, France
Robin, Alexis ; Neurology, CHU Grenoble, 38700 Grenoble, France
Hadidane, Sofiene ; Cabinets de Neurologie d’Allauch et Plan de Cuques, 13190 Allauch, France
Harlay, Vincent ; Neuro-Oncology, AMU, La Timone Hospital, AP-HM, 13005 Marseille, France
Tota, Vito ; Université de Mons - UMONS > Faculté de Médecine et de Pharmacie > Service de Neurosciences ; Neurology, CHU Helora, 7000 Mons, Belgium
Madec, Tanguy ; Neurology, Hospital of Noumea, 98800 Nouméa, France
Niset, Alexandre ; Sciense, New York, NY 10013, USA ; Emergency Medicine, Université Catholique de Louvain, 1348 Louvain-la-Neuve, Belgium ; Pediatric Intensive Care Unit, Cliniques Universitaires Saint-Luc, 1200 Brussels, Belgium
Al Barajraji, Mejdeddine ; Sciense, New York, NY 10013, USA ; Département des Neurosciences Cliniques, Centre Hospitalier Universitaire Vaudois (CHUV), 1005 Lausanne, Switzerland
Madsen, Joseph R. ; Neurodynamics Laboratory, Department of Neurosurgery, Boston Children’s Hospital, Harvard Medical School, Boston, MA 02115, USA
El Hadwe, Salim ; Neurosurgery, Université Libre de Bruxelles, 1070 Brussels, Belgium ; Sciense, New York, NY 10013, USA ; Clinical Neuroscience, University of Cambridge, Cambridge CB2 1TN, UK
Massager, Nicolas ; Université de Mons - UMONS > Faculté de Médecine et de Pharmacie > Service du Doyen de la Faculté de Médecine et Pharmacie ; Neurosurgery, Université Libre de Bruxelles, 1070 Brussels, Belgium ; Neurosurgery, CHU Tivoli, 7110 La Louvière, Belgium
Lagarde, Stanislas ; AMU, INSERM, Institut Neuroscience des Systèmes (INS), 13005 Marseille, France ; APHM, Timone Hospital, Epileptology and Cerebral Rhythmology, 13005 Marseille, France
Carron, Romain ; Sciense, New York, NY 10013, USA ; AMU, INSERM, Institut Neuroscience des Systèmes (INS), 13005 Marseille, France ; Stereotactic and Functional Neurosurgery, La Timone Hospital, AP-HM, 13385 Marseille, France