Decentralized partially observable Markov decision processes (Dec-POMDPs) are a framework for modeling decision-making in multi-agent systems where each agent has only limited information about the environment. The traditional approach to solving Dec-POMDPs relies on centralized training for decentralized execution, which can be computationally expensive. Sequential central planning offers a more scalable alternative by letting a central planner reason about sequential-move statistics rather than simultaneous-move ones. The approach leverages Bellman's principle of optimality and rests on three key results: planning can be carried out over sequential-move statistics, the epsilon-optimal value functions remain piecewise linear and convex, and the complexity of the backup operators drops from double exponential to polynomial. This paradigm makes it possible to apply single-agent methods, such as the SARSA algorithm, while preserving convergence guarantees. Experiments show that the approach outperforms epsilon-optimal simultaneous-move solvers, making it a promising direction for efficient planning and reinforcement learning in multi-agent systems.
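To make the single-agent connection concrete, the sketch below shows a generic tabular SARSA update of the kind such a paradigm can reuse. It is a minimal illustration, not the sequential central planning solver itself: the `env` interface (`reset`, `step`, `actions`) is a hypothetical stand-in for whatever sequential-move problem the central planner constructs.

```python
import random
from collections import defaultdict

def sarsa(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular SARSA: on-policy TD control with an epsilon-greedy policy.

    `env` is assumed to expose `reset() -> state`,
    `step(action) -> (next_state, reward, done)`, and a discrete
    `actions` list; this interface is illustrative, not the Dec-POMDP API.
    """
    Q = defaultdict(float)  # Q[(state, action)] -> value estimate

    def epsilon_greedy(state):
        # Explore with probability epsilon, otherwise act greedily on Q.
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(episodes):
        state = env.reset()
        action = epsilon_greedy(state)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = epsilon_greedy(next_state)
            # On-policy TD target: uses the action actually taken next.
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            state, action = next_state, next_action
    return Q
```

The only ingredient that matters here is the on-policy update Q(s,a) ← Q(s,a) + α[r + γ Q(s',a') − Q(s,a)]; the paradigm's claim is that this kind of single-agent update can be applied to sequential-move statistics while preserving convergence guarantees.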