Evaluating Large Language Models on American Board of Anesthesiology-style Anesthesiology Questions: Accuracy, Domain Consistency, and Clinical Implications.
J Cardiothorac Vasc Anesth 2025-06-15
Large Language Models Take on Cardiothoracic Surgery: A Comparative Analysis of the Performance of Four Models on American Board of Thoracic Surgery Exam Questions in 2023.
Cureus 2024-08-22
Performance of Publicly Available Large Language Models on Internal Medicine Board-style Questions.
PLOS Digit Health 2024-09-17
Evaluating Large Language Models in Dental Anesthesiology: A Comparative Analysis of ChatGPT-4, Claude 3 Opus, and Gemini 1.0 on the Japanese Dental Society of Anesthesiology Board Certification Exam.
Cureus 2024-10-29
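This study evaluated three large language models (LLMs), ChatGPT-4, Gemini 1.0, and Claude 3 Opus, on questions from the Japanese Dental Society of Anesthesiology board certification examination. ChatGPT-4 answered 51.2% of questions correctly, Claude 3 Opus 47.4%, and Gemini 1.0 only 30.3%. Although the first two performed better in certain domains, current accuracy remains insufficient to support clinical use. The study notes that better availability of high-quality information and improved prompt design are needed to make LLMs more useful in dental anesthesiology.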
Harnessing advanced large language models in otolaryngology board examinations: an investigation using python and application programming interfaces.
Eur Arch Otorhinolaryngol 2025-04-25
Evaluating the Accuracy and Reliability of Large Language Models (ChatGPT, Claude, DeepSeek, Gemini, Grok, and Le Chat) in Answering Item-Analyzed Multiple-Choice Questions on Blood Physiology.
Cureus 2025-05-09
Comparison of Large Language Models' Performance on 600 Nuclear Medicine Technology Board Examination-Style Questions.
J Nucl Med Technol 2025-05-09
Evaluating Large Language Models for Enhancing Radiology Specialty Examination: A Comparative Study with Human Performance.
Acad Radiol 2025-05-28
Evaluating and leveraging large language models in clinical pharmacology and therapeutics assessment: From exam takers to exam shapers.
Br J Clin Pharmacol 2025-06-10
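Recent research found that large language models such as ChatGPT-4 Omni performed on par with, or even better than, medical students on the CPT and European prescribing examinations, particularly in knowledge and prescribing skills. These models could also flag ambiguously worded questions, so they are suited not only to serve as teaching tools but also to help improve the quality of exam questions.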
The applications of ChatGPT and other large language models in anesthesiology and critical care: a systematic review.
Can J Anaesth 2025-06-16