MedFrenchmark, a Small Set for Benchmarking Generative LLMs in Medical French.
Stud Health Technol Inform 2024-08-23
Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation.
JMIR Med Inform 2024-07-02
MedExpQA: Multilingual benchmarking of Large Language Models for Medical Question Answering.
Artif Intell Med 2024-08-09