USENIX Security 2025

CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization

Certified-radius-maximizing policy training for robust deep reinforcement learning agents.

Derui Wang¹,², Kristen Moore¹,², Diksha Goel¹,², Minjune Kim¹,², Gang Li³, Yang Li³, Robin Doss³, Minhui Xue¹,², Bo Li⁴, Seyit Camtepe¹,², Liming Zhu¹

¹ CSIRO's Data61, Australia; ² Cyber Security Cooperative Research Centre, Australia; ³ Deakin University, Australia; ⁴ University of Chicago, USA

Figure: CAMP overview. CAMP optimizes policies with certified-radius-aware objectives to improve the robustness-return trade-off.

Highlights

CAMP improves certified robustness for deep reinforcement learning by directly optimizing a training objective connected to certified radius.

Certified-radius maximization: Trains policies with a surrogate loss derived from local certified radii.
Policy imitation: Stabilizes certified-radius-aware optimization during reinforcement learning training.
Better trade-off: Improves certified expected return, reaching up to twice the certified return of baselines.

Abstract

We introduce Certified-rAdius-Maximizing Policy (CAMP) training for certifiably robust deep reinforcement learning agents. CAMP improves the robustness-return trade-off by optimizing policies with a surrogate loss derived from certified-radius maximization.

The key insight is that the global certified radius can be derived from local certified radii based on training-time statistics. CAMP uses this relationship during training and introduces policy imitation to stabilize optimization.

Method

Local-to-global certification

CAMP links local certified radii observed during training to the global certified radius, allowing robustness to be optimized through a tractable policy-training objective.
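
To make the local-to-global idea concrete, the following is a minimal sketch and not CAMP's actual derivation, which this page does not spell out. It assumes a randomized-smoothing-style local radius (the well-known Gaussian-smoothing bound, with `p_top` a lower bound on the top action's probability and `p_runner_up` an upper bound on the runner-up's). It also assumes a deliberately simple aggregate that takes the smallest local radius along a trajectory. The function names, the `sigma` parameter, and the min-aggregation are all illustrative assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard Gaussian, used for its inverse CDF


def local_certified_radius(p_top, p_runner_up, sigma, eps=1e-6):
    """Assumed form: L2 radius from Gaussian smoothing at a single state."""
    # Clip probabilities so inv_cdf never receives exactly 0 or 1.
    p_top = min(max(p_top, eps), 1.0 - eps)
    p_runner_up = min(max(p_runner_up, eps), 1.0 - eps)
    if p_top <= p_runner_up:
        return 0.0  # no certificate when the top action is not ahead
    gap = _STD_NORMAL.inv_cdf(p_top) - _STD_NORMAL.inv_cdf(p_runner_up)
    return 0.5 * sigma * gap


def conservative_global_radius(local_radii):
    """Hypothetical aggregate: smallest local radius seen along a trajectory."""
    return min(local_radii) if local_radii else 0.0


# Example: three visited states with different action-probability gaps.
radii = [
    local_certified_radius(0.90, 0.10, sigma=1.0),
    local_certified_radius(0.70, 0.30, sigma=1.0),
    local_certified_radius(0.99, 0.01, sigma=1.0),
]
trajectory_radius = conservative_global_radius(radii)
```

Under these assumptions, a larger gap between the top and runner-up actions yields a larger local radius, which is why a training objective that widens this gap can raise the certified radius.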

Stable robust policy learning

Policy imitation provides a stabilizing signal, helping CAMP train agents that preserve utility while improving provable robustness.
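
As an illustration of how an imitation term might stabilize a radius-oriented objective, the sketch below adds a cross-entropy penalty against a reference policy's action probabilities. This is an assumption made for exposition: the page does not specify CAMP's imitation loss, and `imitation_loss`, `combined_loss`, `radius_loss`, and `weight` are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def imitation_loss(student_logits, reference_probs):
    """Cross-entropy of the trained policy against a reference policy (assumed form)."""
    student_probs = softmax(student_logits)
    return -sum(
        q * math.log(max(p, 1e-12))  # floor avoids log(0) on underflow
        for q, p in zip(reference_probs, student_probs)
        if q > 0.0
    )


def combined_loss(radius_loss, student_logits, reference_probs, weight=1.0):
    """Hypothetical total: radius-oriented surrogate plus weighted imitation term."""
    return radius_loss + weight * imitation_loss(student_logits, reference_probs)


# Example: the reference policy prefers action 0; the student is still undecided.
total = combined_loss(0.5, [0.0, 0.0], [1.0, 0.0], weight=2.0)
```

The weight controls how strongly the trained policy is pulled toward the reference behavior; setting it to zero recovers the radius-oriented term alone.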

BibTeX

@inproceedings{wang2025camp,
  title={CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization},
  author={Wang, Derui and Moore, Kristen and Goel, Diksha and Kim, Minjune and Li, Gang and Li, Yang and Doss, Robin and Xue, Minhui and Li, Bo and Camtepe, Seyit and Zhu, Liming},
  booktitle={USENIX Security Symposium},
  year={2025}
}