A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options.
ArXiv 2025-01-13
A Comprehensive Analysis of a Social Intelligence Dataset and Response Tendencies Between Large Language Models (LLMs) and Humans.
Sensors (Basel) 2025-01-25
APBench and benchmarking large language model performance in fundamental astrodynamics problems for space engineering.
Sci Rep 2025-03-06
Achieving GPT-4o level performance in astronomy with a specialized 8B-parameter large language model.
Sci Rep 2025-04-21
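AstroSage-Llama-3.1-8B is an AI model built specifically for astronomy and trained on a large body of astronomy-related data. On astronomy tests it outperforms other models in its size class and is competitive with GPT-4o. It is now freely available for research and educational use.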
LLM-CGM: A Benchmark for Large Language Model-Enabled Querying of Continuous Glucose Monitoring Data for Conversational Diabetes Management.
Pac Symp Biocomput 2025-04-29
Comparison of Large Language Models' Performance on 600 Nuclear Medicine Technology Board Examination-Style Questions.
J Nucl Med Technol 2025-05-09
3DBench: A scalable benchmark for object and scene-level instruction-tuning of 3D large language models.
Neural Netw 2025-05-17
A recent evaluation on the performance of LLMs on radiation oncology physics using questions of randomly shuffled options.
Front Oncol 2025-06-09