Evaluating large language and large reasoning models as decision support tools in emergency internal medicine.
Comput Biol Med 2025-05-13
Evaluating the use of large language models to provide clinical recommendations in the Emergency Department.
Nat Commun 2024-10-08
Comparative evaluation and performance of large language models on expert level critical care questions: a benchmark study.
Crit Care 2025-02-10
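This study evaluated five large language models (LLMs) in critical care medicine on 1,181 multiple-choice questions. GPT-4o achieved the highest accuracy at 93.3%, followed by Llama 3.1 70B (87.5%) and Mistral Large 2407 (87.9%). All models scored above random guessing and above human physicians, although GPT-3.5-turbo was not significantly better than physicians. Despite the high accuracy, the models still made errors and need careful evaluation. GPT-4o is costly, which raises concerns about energy consumption. Overall, LLMs show potential in critical care medicine but require ongoing evaluation to ensure responsible use.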
Evaluation of the Performance of Three Large Language Models in Clinical Decision Support: A Comparative Study Based on Actual Cases.
J Med Syst 2025-02-13
Evaluating Large Language Models in Cardiovascular Antithrombotic Care: Performance, Accuracy, and Implications for Clinical Practice.
Can J Cardiol 2025-04-16
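This study found that Claude 3 Opus was more accurate than the other large language models and than clinicians on cardiovascular anticoagulation cases, with 85% of answers correct. Some LLMs matched or even exceeded experienced physicians, but free-tier models sometimes gave poor or unsafe recommendations. All LLMs performed consistently on lifestyle and dietary advice. The study cautions that LLMs should be selected carefully and validated before they are used in medical decision-making.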
Evaluating large language model workflows in clinical decision support for triage and referral and diagnosis.
NPJ Digit Med 2025-05-09
Performance evaluation of large language models in pediatric nephrology clinical decision support: a comprehensive assessment.
Pediatr Nephrol 2025-06-03
Comparative analysis of large language models in clinical diagnosis: performance evaluation across common and complex medical cases.
JAMIA Open 2025-06-13