Evaluating performance of large language models for atrial fibrillation management using different prompting strategies and languages.
使用不同提示策略與語言評估大型語言模型於心房顫動管理的表現
Sci Rep 2025-05-30
Performance Evaluation of Large Language Models in Cervical Cancer Management Based on a Standardized Questionnaire: Comparative Study.
基於標準化問卷的子宮頸癌管理中大型語言模型的性能評估:比較研究。
J Med Internet Res 2025-02-05
Evaluation of the Performance of Three Large Language Models in Clinical Decision Support: A Comparative Study Based on Actual Cases.
三種大型語言模型在臨床決策支持中的表現評估:基於實際案例的比較研究。
J Med Syst 2025-02-13
Evaluating Large Language Models in Cardiovascular Antithrombotic Care: Performance, Accuracy, and Implications for Clinical Practice.
心血管抗血栓治療中大型語言模型的評估:表現、準確性及其對臨床實務的影響
Can J Cardiol 2025-04-16
這項研究發現,Claude 3 Opus 在心血管抗凝治療案例的準確度勝過其他大型語言模型和臨床醫師,正確率達85%。部分LLMs表現媲美甚至超越有經驗醫師,但免費版模型有時會給出不佳或不安全的建議。所有LLMs在生活型態和飲食建議上表現穩定。研究提醒,醫療決策時應謹慎選用並驗證LLMs。
PubMedDOI
Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
大型語言模型在回答臨床研究問題時的準確性:系統性回顧與網絡統合分析
J Med Internet Res 2025-04-30
The actual performance of large language models in providing liver cirrhosis-related information: A comparative study.
大型語言模型在提供肝硬化相關資訊時的實際表現:一項比較研究
Int J Med Inform 2025-05-07
Large language model comparisons between English and Chinese query performance for cardiovascular prevention.
英語與中文查詢在心血管預防領域中大型語言模型表現之比較
Commun Med (Lond) 2025-05-16
Large language model evaluation in autoimmune disease clinical questions comparing ChatGPT 4o, Claude 3.5 Sonnet and Gemini 1.5 pro.
自體免疫疾病臨床問題中大型語言模型的評估:比較 ChatGPT 4o、Claude 3.5 Sonnet 與 Gemini 1.5 pro
Sci Rep 2025-05-21