Deakin University
File(s) under permanent embargo

Reinforcement learning of Pareto-optimal multiobjective policies using steering

conference contribution
posted on 2015-01-01, 00:00, authored by P Vamplew, R Issabekov, Richard Dazeley, C Foale
There has been little research into multiobjective reinforcement learning (MORL) algorithms using stochastic or non-stationary policies, even though such policies may Pareto-dominate deterministic stationary policies. One approach is steering, which forms a non-stationary combination of deterministic stationary base policies. This paper presents two new steering algorithms designed for the task of learning Pareto-optimal policies. The first algorithm (w-steering) is a direct adaptation of previous approaches to steering, and therefore requires prior knowledge of recurrent states that are guaranteed to be revisited. The second algorithm (Q-steering) eliminates this requirement. Empirical results show that both algorithms perform well when given knowledge of recurrent states, but that Q-steering provides substantial performance improvements over w-steering when this knowledge is not available.
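The claim that a non-stationary combination of base policies can Pareto-dominate a deterministic stationary policy can be illustrated with a small sketch. It is not the paper's w-steering or Q-steering algorithm. It assumes a simplified episodic setting in which each base policy yields a fixed, known return vector per episode, and it switches greedily between base policies so that the running mean return approaches a target vector. The names `dominates` and `steer`, the return vectors, and the target are all hypothetical.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a Pareto-dominates b when every objective is maximised."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def steer(base_returns: Sequence[Vector], target: Vector, episodes: int) -> Vector:
    """Greedy episode-level switching between base policies (an assumption,
    not the paper's method): in each episode, run the base policy whose
    return vector brings the running mean return closest to the target."""
    total = [0.0] * len(target)
    for n in range(1, episodes + 1):
        def squared_distance(ret: Vector) -> float:
            # Squared distance from target if this policy ran in episode n.
            return sum(((s + x) / n - g) ** 2
                       for s, x, g in zip(total, ret, target))

        chosen = min(base_returns, key=squared_distance)
        total = [s + x for s, x in zip(total, chosen)]
    return tuple(s / episodes for s in total)


# Hypothetical per-episode return vectors over two objectives.
policy_a = (10.0, 0.0)   # deterministic base policy favouring objective 1
policy_b = (0.0, 10.0)   # deterministic base policy favouring objective 2
policy_c = (4.0, 4.0)    # deterministic "compromise" policy

mean_return = steer([policy_a, policy_b], target=(5.0, 5.0), episodes=1000)

print(dominates(policy_a, policy_c))     # neither base policy alone dominates c
print(dominates(policy_b, policy_c))
print(dominates(mean_return, policy_c))  # the non-stationary mixture does
```

In this toy setting the mixture's average return lies on the line segment between the two base policies' returns, which is why it can dominate a deterministic policy that sits below that segment.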

Event

Australian Computer Society. Conference (28th : 2015 : Canberra, A.C.T.)

Volume

9457

Series

Australian Computer Society Conference

Pagination

596 - 608

Publisher

Springer

Location

Canberra, A.C.T.

Place of publication

Cham, Switzerland

Start date

2015-11-30

End date

2015-12-04

ISSN

0302-9743

eISSN

1611-3349

ISBN-13

9783319263496

Language

eng

Publication classification

E Conference publication; E1.1 Full written paper - refereed

Copyright notice

2015, Springer International Publishing Switzerland

Editor/Contributor(s)

B Pfahringer, J Renz

Title of proceedings

AI 2015 : Proceedings of the 28th Australasian Joint Conference on Artificial Intelligence 2015