Philip Thomas
Publications (title, authors, venue and year, citations)

1. Value function approximation in reinforcement learning using the Fourier basis
   G Konidaris, S Osentoski, P Thomas
   Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011
   Cited by: 233

2. Data-efficient off-policy policy evaluation for reinforcement learning
   P Thomas, E Brunskill
   International Conference on Machine Learning, 2139-2148, 2016
   Cited by: 113

3. High-confidence off-policy evaluation
   PS Thomas, G Theocharous, M Ghavamzadeh
   Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015
   Cited by: 89

4. High confidence policy improvement
   P Thomas, G Theocharous, M Ghavamzadeh
   International Conference on Machine Learning, 2380-2388, 2015
   Cited by: 64

5. Increasing the action gap: New operators for reinforcement learning
   MG Bellemare, G Ostrovski, A Guez, PS Thomas, R Munos
   Thirtieth AAAI Conference on Artificial Intelligence, 2016
   Cited by: 57

6. Bias in natural actor-critic algorithms
   P Thomas
   International Conference on Machine Learning, 441-448, 2014
   Cited by: 52

7. Personalized ad recommendation systems for life-time value optimization with guarantees
   G Theocharous, PS Thomas, M Ghavamzadeh
   Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015
   Cited by: 49

8. Application of the actor-critic architecture to functional electrical stimulation control of a human arm
   PS Thomas, A van den Bogert, K Jagodnik, M Branicky
   Twenty-First IAAI Conference, 2009
   Cited by: 30

9. Safe reinforcement learning
   PS Thomas
   University of Massachusetts Libraries, 2015
   Cited by: 29

10. Proximal reinforcement learning: A new theory of sequential decision making in primal-dual spaces
    S Mahadevan, B Liu, P Thomas, W Dabney, S Giguere, N Jacek, I Gemp, ...
    arXiv preprint arXiv:1405.6757, 2014
    Cited by: 29

11. TD_gamma: Re-evaluating Complex Backups in Temporal Difference Learning
    G Konidaris, S Niekum, PS Thomas
    Advances in Neural Information Processing Systems, 2402-2410, 2011
    Cited by: 20

12. Projected natural actor-critic
    PS Thomas, WC Dabney, S Giguere, S Mahadevan
    Advances in Neural Information Processing Systems, 2337-2345, 2013
    Cited by: 16

13. Motor primitive discovery
    PS Thomas, AG Barto
    2012 IEEE International Conference on Development and Learning and …, 2012
    Cited by: 16

14. Conjugate Markov Decision Processes
    P Thomas, A Barto
    International Conference on Machine Learning, 137-144, 2011
    Cited by: 16

15. Importance Sampling for Fair Policy Selection
    S Doroudi, PS Thomas, E Brunskill
    Grantee Submission, 2017
    Cited by: 13

16. Using options and covariance testing for long horizon off-policy policy evaluation
    Z Guo, PS Thomas, E Brunskill
    Advances in Neural Information Processing Systems, 2492-2501, 2017
    Cited by: 13

17. A proportional derivative FES controller for planar arm movement
    K Jagodnik, A Van Den Bogert
    target 1 (2), 10, 2007
    Cited by: 13

18. Policy gradient coagent networks
    PS Thomas
    Advances in Neural Information Processing Systems, 1944-1952, 2011
    Cited by: 12

19. Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards
    KM Jagodnik, PS Thomas, AJ van den Bogert, MS Branicky, RF Kirsch
    IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (10 …, 2017
    Cited by: 11

20. Policy evaluation using the Ω-return
    PS Thomas, S Niekum, G Theocharous, G Konidaris
    Advances in Neural Information Processing Systems, 334-342, 2015
    Cited by: 11