POPGym: Benchmarking Partially Observable Reinforcement Learning

Abstract

Real-world applications of Reinforcement Learning (RL) are often partially observable, thus requiring memory. Despite this, partial observability is still largely ignored by contemporary RL benchmarks and libraries. We introduce Partially Observable Process Gym (POPGym), a two-part library containing (1) a diverse collection of 14 partially observable environments, each with multiple difficulties, and (2) implementations of 13 memory model baselines – the most in a single RL library. Existing partially observable benchmarks tend to fixate on 3D visual navigation, which is computationally expensive and represents only one of many possible POMDP types. In contrast, POPGym environments are diverse, produce smaller observations, use less memory, and often converge within two hours of training on a consumer-grade GPU. We implement our high-level memory API and memory baselines on top of the popular RLlib framework, providing plug-and-play compatibility with various training algorithms, exploration strategies, and distributed training paradigms. Using POPGym, we execute the largest comparison across RL memory models to date.
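The sketch below illustrates how a POPGym environment might be driven through a standard Gym-style loop. It is a minimal, hedged example: the environment ID "popgym-RepeatPreviousEasy-v0" and the assumption that importing popgym registers its environments are illustrative, and newer releases may use the Gymnasium five-tuple step API instead of the classic four-tuple shown here.

    # Minimal usage sketch (assumptions noted in comments); not the library's canonical example.
    import gym
    import popgym  # assumed to register its environments with Gym on import

    # Hypothetical environment ID chosen for illustration.
    env = gym.make("popgym-RepeatPreviousEasy-v0")

    obs = env.reset()
    done, episode_return = False, 0.0
    while not done:
        # Random policy for demonstration; a memory model would select actions here.
        action = env.action_space.sample()
        obs, reward, done, info = env.step(action)
        episode_return += reward
    print("episode return:", episode_return)

In practice, the memory baselines would be plugged in through RLlib rather than a hand-written loop; this snippet only shows the environment side of the interface.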

Publication
In International Conference on Learning Representations (ICLR)
Steven Morad
PhD Candidate

Steven studies how long-term memory can improve decision making in reinforcement learning. He focuses on arranging collections of memories into graph structures, which he queries using graph neural networks. His research aims to improve the reasoning capabilities of robots, allowing them to solve human-level tasks and to learn from and correct mistakes in real time.

Ryan Kortvelesy
PhD Candidate

Ryan’s work focuses on multi-agent reinforcement learning. He is interested in the credit assignment problem, new graph neural network architectures and explainability (applying symbolic regression to multi-agent systems).

Matteo Bettini
PhD Candidate

Matteo’s research is focused on studying heterogeneity and resilience in multi-agent and multi-robot systems.

Amanda Prorok
Professor

Amanda's research focuses on multi-agent and multi-robot systems. Her mission is to find new ways of coordinating artificially intelligent agents (e.g., robots, vehicles, machines) to achieve common goals in shared physical and virtual spaces.