Out-of-distribution generalization via composition: A lens through induction heads in Transformers.
Proc Natl Acad Sci U S A 2025-02-07
2-D Transformer: Extending Large Language Models to Long-Context With Few Memory.
IEEE Trans Neural Netw Learn Syst 2025-03-21
Behavioral Dynamics Analysis in Language Education: Generative State Transitions and Attention Mechanisms.
Behav Sci (Basel) 2025-03-28