Deakin University

Multi-UAV pursuit-evasion gaming based on PSO-M3DDPG schemes

journal contribution
posted on 2024-07-11, 05:48 authored by Y Zhang, M Ding, J Zhang, Q Yang, G Shi, M Lu, Frank Jiang
Abstract

The sample data used by reinforcement learning algorithms are often sparse and unstable, which makes training prone to converging to local optima. The Mini-Max-Multi-agent Deep Deterministic Policy Gradient (M3DDPG) algorithm is a multi-agent reinforcement learning algorithm that introduces the minimax theorem into the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, but it also suffers from unstable convergence caused by sparse sample data and randomization. Particle Swarm Optimisation (PSO), unlike traditional reinforcement learning methods, constructs independent populations of policy networks to generate sample data, which are then used to train the reinforcement learning algorithm. PSO optimizes and updates the policy population according to a fitness function, with the aim of improving the efficiency and convergence speed with which the algorithm learns from the sample data. To address the multi-agent pursuit-evasion problem, we propose the PSO-M3DDPG algorithm, which combines PSO with M3DDPG. Simulation experiments show that the improved algorithm achieves better training results and faster convergence, validating its effectiveness.
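As a rough illustration of the PSO component described in the abstract, the following is a minimal global-best PSO sketch in Python. It assumes that each particle is the flattened parameter vector of one policy network and that fitness is a scalar to maximize (for instance, an episode return). The function name `pso_optimize`, the coefficient values, and the toy fitness are hypothetical: the abstract does not give the paper's fitness function, its hyperparameters, or how the PSO population passes sample data to M3DDPG.

```python
import random
from typing import Callable, List

Vector = List[float]

def pso_optimize(
    fitness: Callable[[Vector], float],
    dim: int,
    n_particles: int = 20,
    iterations: int = 100,
    w: float = 0.7,      # inertia weight (assumed value, not from the paper)
    c1: float = 1.5,     # cognitive coefficient (assumed value)
    c2: float = 1.5,     # social coefficient (assumed value)
    bound: float = 1.0,  # particles are initialized uniformly in [-bound, bound]
    seed: int = 0,
) -> Vector:
    """Global-best PSO that maximizes `fitness` over flat parameter vectors.

    Hypothetical sketch: each particle stands in for one policy network's
    flattened weights; in an RL setting `fitness` might be an episode return.
    """
    rng = random.Random(seed)
    positions = [[rng.uniform(-bound, bound) for _ in range(dim)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_fit = [fitness(p) for p in positions]
    g = max(range(n_particles), key=lambda i: personal_fit[i])
    global_best, global_fit = personal_best[g][:], personal_fit[g]

    for _ in range(iterations):
        for i in range(n_particles):
            # Velocity update: inertia + pull toward personal and global bests.
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                velocities[i][d] = (
                    w * velocities[i][d]
                    + c1 * r1 * (personal_best[i][d] - positions[i][d])
                    + c2 * r2 * (global_best[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            f = fitness(positions[i])
            # A new global best is always also a new personal best.
            if f > personal_fit[i]:
                personal_best[i], personal_fit[i] = positions[i][:], f
                if f > global_fit:
                    global_best, global_fit = positions[i][:], f
    return global_best
```

For example, with a toy fitness that rewards closeness to a fixed target vector, `pso_optimize(lambda x: -sum((a - b) ** 2 for a, b in zip(x, [0.5, -0.3, 0.2])), dim=3)` should return a vector close to the target. In the setting the abstract describes, the fitness would instead come from evaluating each policy in the pursuit-evasion environment, which is far more expensive than this toy function.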

Journal

Complex and Intelligent Systems

Pagination

1-17

Location

Berlin, Germany

ISSN

2199-4536

eISSN

2198-6053

Language

eng

Publication classification

C1.1 Refereed article in a scholarly journal

Publisher

Springer
