There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering, which forms a non-stationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering and therefore requires prior knowledge of recurrent states, which are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available.
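To illustrate the core steering idea from the abstract, the following Python snippet is a minimal sketch (not the paper's w-steering or Q-steering algorithms) of how a non-stationary mixture of deterministic stationary base policies can reach vector returns that neither base policy attains alone. The two-objective single-state MDP, the `REWARDS` matrix, and the `steering_average_return` helper are all hypothetical constructions for this example.

```python
import numpy as np

# Hypothetical two-objective MDP with a single state, which is therefore
# trivially recurrent: it is revisited on every step. Two deterministic
# stationary base policies exist: pi0 always takes action 0, pi1 always
# takes action 1. Rewards are vectors (objective 1, objective 2).
REWARDS = np.array([[1.0, 0.0],   # action 0: favours objective 1
                    [0.0, 1.0]])  # action 1: favours objective 2

def steering_average_return(weight, horizon=10_000, seed=0):
    """Average vector return of a non-stationary policy that follows base
    policy pi1 on roughly a `weight` fraction of revisits to the recurrent
    state (here: every step) and pi0 otherwise."""
    rng = np.random.default_rng(seed)
    total = np.zeros(2)
    for _ in range(horizon):
        action = 1 if rng.random() < weight else 0  # pick a base policy
        total += REWARDS[action]
    return total / horizon

for w in (0.0, 0.5, 1.0):
    print(w, steering_average_return(w))
# The w=0.5 mixture earns roughly (0.5, 0.5): a Pareto-optimal trade-off
# that neither deterministic stationary base policy can achieve on its own.
```

In this toy setting the recurrent state is revisited every step, so switching between base policies is always safe; the abstract's point is that w-steering needs such revisit guarantees supplied in advance, while Q-steering does not.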
Volume: 9457
Pagination: 596-608
Location: Canberra, A.C.T.
Start date: 2015-11-30
End date: 2015-12-04
ISSN: 0302-9743
eISSN: 1611-3349
ISBN-13: 9783319263496
Language: eng
Publication classification: E Conference publication, E1.1 Full written paper - refereed
Copyright notice: 2015, Springer International Publishing Switzerland
Editor/Contributor(s): Pfahringer B, Renz J
Title of proceedings: AI 2015 : Proceedings of the 28th Australasian Joint Conference on Artificial Intelligence 2015
Event: Australasian Joint Conference on Artificial Intelligence (28th : 2015 : Canberra, A.C.T.)