A. Rupam Mahmood
Title
Cited by
Year
An Emphatic Approach to the Problem of Off-policy Temporal-Difference Learning
RS Sutton, AR Mahmood, M White
Journal of Machine Learning Research 17, 2016
Cited by: 157; Year: 2016
Weighted importance sampling for off-policy learning with linear function approximation
AR Mahmood, H van Hasselt, RS Sutton
Advances in Neural Information Processing Systems 27, 2014
Cited by: 106; Year: 2014
True Online Temporal-Difference Learning
H van Seijen, AR Mahmood, PM Pilarski, MC Machado, RS Sutton
Journal of Machine Learning Research 17, 2016
Cited by: 79; Year: 2016
Benchmarking Reinforcement Learning Algorithms on Real-World Robots
AR Mahmood, D Korenkevych, G Vasan, W Ma, J Bergstra
Proceedings of the 2nd Annual Conference on Robot Learning (CoRL), 2018
Cited by: 68; Year: 2018
Tuning-free step-size adaptation
AR Mahmood, RS Sutton, T Degris, PM Pilarski
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International …, 2012
Cited by: 45; Year: 2012
Off-policy TD(λ) with a true online equivalence
H van Hasselt, AR Mahmood, RS Sutton
Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence …, 2014
Cited by: 37; Year: 2014
A new Q(λ) with interim forward view and Monte Carlo equivalence
RS Sutton, AR Mahmood, D Precup, H van Hasselt
Cited by: 36; Year: 2014
Multi-step Off-policy Learning Without Importance Sampling Ratios
AR Mahmood, H Yu, RS Sutton
arXiv preprint arXiv:1702.03006, 2017
Cited by: 32; Year: 2017
Setting up a reinforcement learning task with a real-world robot
AR Mahmood, D Korenkevych, BJ Komer, J Bergstra
2018 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2018
Cited by: 28; Year: 2018
Off-policy learning based on weighted importance sampling with linear computational complexity
AR Mahmood, RS Sutton
Proceedings of the 31st Conference on Uncertainty in Artificial Intelligence …, 2015
Cited by: 26; Year: 2015
Representation Search through Generate and Test
AR Mahmood, RS Sutton
Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence, 2013
Cited by: 26; Year: 2013
Emphatic temporal-difference learning
AR Mahmood, H Yu, M White, RS Sutton
European Workshop on Reinforcement Learning, 2015
Cited by: 25; Year: 2015
On generalized Bellman equations and temporal-difference learning
H Yu, AR Mahmood, RS Sutton
The Journal of Machine Learning Research 19 (1), 1864-1912, 2018
Cited by: 19; Year: 2018
Autoregressive Policies for Continuous Control Deep Reinforcement Learning
D Korenkevych, AR Mahmood, G Vasan, J Bergstra
Proceedings of the 28th International Joint Conference on Artificial …, 2019
Cited by: 12; Year: 2019
Incremental Off-policy Reinforcement Learning Algorithms
A Mahmood
University of Alberta, 2017
Cited by: 12; Year: 2017
Structure Learning of Causal Bayesian Networks: A Survey
A Mahmood
Department of Computing Science, University of Alberta, Edmonton, Canada …, 2011
Cited by: 7; Year: 2011
Automatic step-size adaptation in incremental supervised learning
A Mahmood
University of Alberta, 2010
Cited by: 7; Year: 2010
Heteroscedastic Uncertainty for Robust Generative Latent Dynamics
O Limoyo, B Chan, F Marić, B Wagstaff, AR Mahmood, J Kelly
IEEE Robotics and Automation Letters 5 (4), 6654-6661, 2020
Cited by: 1; Year: 2020
An Empirical Evaluation of True Online TD(λ)
H van Seijen, AR Mahmood, PM Pilarski, RS Sutton
arXiv preprint arXiv:1507.00353, 2015
Cited by: 1; Year: 2015
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences
A Chan, H Silva, S Lim, T Kozuno, AR Mahmood, M White
arXiv preprint arXiv:2107.08285, 2021
Year: 2021