Argus: Leveraging Multiview Images for Improved 3-D Scene Understanding With Large Language Models.
IEEE Trans Neural Netw Learn Syst 2025-06-25
MGFusion: a multimodal large language model-guided information perception for infrared and visible image fusion.
Front Neurorobot 2025-01-07
LLMER: Crafting Interactive Extended Reality Worlds with JSON Data Generated by Large Language Models.
IEEE Trans Vis Comput Graph 2025-03-10
3DBench: A scalable benchmark for object and scene-level instruction-tuning of 3D large language models.
Neural Netw 2025-05-17
A Multimodal Large Language Model Framework for Intelligent Perception and Decision-Making in Smart Manufacturing.
Sensors (Basel) 2025-05-28
When language and vision meet road safety: Leveraging multimodal large language models for video-based traffic accident analysis.
Accid Anal Prev 2025-06-05
CAT+: Investigating and Enhancing Audio-visual Understanding in Large Language Models.
IEEE Trans Pattern Anal Mach Intell 2025-06-25