Artificial intelligence; Bilateral vocal fold paralysis; ChatGPT; Decision-making; Laryngology; Llama; Otorhinolaryngology; Speech and Hearing; LPN and LVN
Abstract :
[en] OBJECTIVES: The development of artificial intelligence-powered language models, such as Chatbot Generative Pre-trained Transformer (ChatGPT) or Large Language Model Meta AI (Llama), is emerging in medicine. Patients and practitioners have full access to chatbots that may provide medical information. The aim of this study was to explore the performance and accuracy of ChatGPT and Llama in treatment decision-making for bilateral vocal fold paralysis (BVFP).
METHODS: Data from 20 clinical cases, treated between 2018 and 2023, were retrospectively collected from four tertiary laryngology centers in Europe. The cases were defined as the most common or most challenging scenarios regarding BVFP treatment. The treatment proposals were discussed in the centers' local multidisciplinary teams (MDT). Each case was presented to ChatGPT-4.0 and Llama Chat-2.0, and potential treatment strategies were requested. The Artificial Intelligence Performance Instrument (AIPI) treatment subscore was used to compare the performance of both chatbots with the MDT treatment proposal.
RESULTS: The most common etiology of BVFP was thyroid surgery. A form of partial arytenoidectomy with or without posterior transverse cordotomy was the MDT proposal for most cases. The accuracy of both chatbots was very low regarding their treatment proposals, with a maximum AIPI treatment score in 5% of the cases. In most cases, even harmful assertions were made, including the suggestion of vocal fold medialisation to treat patients with stridor and dyspnea. ChatGPT-4.0 performed significantly better in suggesting the correct treatment as part of the treatment proposal (50%) compared to Llama Chat-2.0 (15%).
CONCLUSION: ChatGPT and Llama were judged inaccurate in proposing the correct treatment for BVFP. ChatGPT significantly outperformed Llama. Treatment decision-making for a complex condition such as BVFP is clearly beyond the chatbots' expertise. This study highlights the complexity and heterogeneity of BVFP treatment, and the need for further guidelines dedicated to the management of BVFP.
Disciplines :
Otolaryngology
Author, co-author :
Dronkers, Emilie A C; National Centre for Airway Reconstruction, Imperial College Healthcare NHS Trust, London, UK. Electronic address: emiliedronkers@gmail.com
Geneid, Ahmed; Department of Otolaryngology and Phoniatrics-Head and Neck Surgery, Helsinki University Hospital and University of Helsinki, Helsinki, Finland
Al Yaghchi, Chadwan; National Centre for Airway Reconstruction, Imperial College Healthcare NHS Trust, London, UK
Lechien, Jérome ; Université de Mons - UMONS > Faculté de Psychologie et des Sciences de l'Education > Service de Métrologie et Sciences du langage ; Université de Mons - UMONS > Faculté de Médecine et de Pharmacie > Service de Chirurgie
Language :
English
Title :
Evaluating the Potential of AI Chatbots in Treatment Decision-making for Acquired Bilateral Vocal Fold Paralysis in Adults.