An Off-Policy Trust Region Policy Optimization Method With Monotonic Improvement Guarantee for Deep Reinforcement Learning.
IEEE Trans Neural Netw Learn Syst 2022-05-03
Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation.
Sensors (Basel) 2022-04-01
Asynchronous Deep Double Dueling Q-learning for trading-signal execution in limit order book markets.
Front Artif Intell 2023-10-31