USENIX Security 2025

CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization

Certified-radius-maximizing policy training for robust deep reinforcement learning agents.

Derui Wang^1,2, Kristen Moore^1,2, Diksha Goel^1,2, Minjune Kim^1,2, Gang Li³, Yang Li³, Robin Doss³, Minhui Xue^1,2, Bo Li⁴, Seyit Camtepe^1,2, Liming Zhu¹

¹CSIRO's Data61, Australia ²Cyber Security Cooperative Research Centre, Australia ³Deakin University, Australia ⁴University of Chicago, USA

Paper Code Zenodo

CAMP robust reinforcement learning overview — CAMP optimizes policies with certified-radius-aware objectives to improve the robustness-return trade-off.

Highlights

CAMP improves certified robustness for deep reinforcement learning by directly optimizing a training objective connected to certified radius.

Certified-radius maximization Trains policies with a surrogate loss derived from local certified radii.

Policy imitation Stabilizes certified-radius-aware optimization during reinforcement learning training.

Better trade-off Improves certified expected return, reaching up to twice the certified return of baselines.

Abstract

We introduce Certified-rAdius-Maximizing Policy (CAMP) training for certifiably robust deep reinforcement learning agents. CAMP improves the robustness-return trade-off by optimizing policies with a surrogate loss derived from certified-radius maximization.

The key insight is that the global certified radius can be derived from local certified radii based on training-time statistics. CAMP uses this relationship during training and introduces policy imitation to stabilize optimization.

Method

Local-to-global certification

CAMP links local certified radii observed during training to the global certified radius, allowing robustness to be optimized through a tractable policy-training objective.

Stable robust policy learning

Policy imitation provides a stabilizing signal, helping CAMP train agents that preserve utility while improving provable robustness.

BibTeX

@inproceedings{wang2025camp,
  title={CAMP in the Odyssey: Provably Robust Reinforcement Learning with Certified Radius Maximization},
  author={Wang, Derui and Moore, Kristen and Goel, Diksha and Kim, Minjune and Li, Gang and Li, Yang and Doss, Robin and Xue, Minhui and Li, Bo and Camtepe, Seyit and Zhu, Liming},
  booktitle={USENIX Security Symposium},
  year={2025}
}