MultiRobustBench

MultiRobustBench is a standardized benchmark for evaluating adversarial robustness against multiple attacks. It currently evaluates and ranks models on a set of 9 attacks (L1, L2, Linf, Elastic, L1 JPEG, Linf JPEG, ReColor, StAdv, and LPIPS), each at 20 attack strengths. We provide 2 leaderboards for the CIFAR-10 dataset: one ranked by average competitiveness ratio (CRind-avg in the paper), which measures average multiattack robustness, and one ranked by worst-case competitiveness ratio (CRind-worst in the paper), which measures worst-case multiattack robustness. Users can toggle between the 2 leaderboards via the "Leaderboard selection" menu. The leaderboards also report the stability constant (SC) computed on this set of attacks. Higher CR indicates better performance, while lower SC indicates better performance (although SC is best used only to compare defenses that share the same training threat model).
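To make the two ranking criteria concrete, here is a minimal sketch of how an average and a worst-case CR could be aggregated from per-attack accuracies. It assumes that the per-attack CR is the defense's robust accuracy divided by the accuracy of a reference model specialized for that attack, expressed as a percentage; this page does not state that definition, so treat it as an assumption and consult the paper for the exact formula. The function names and all accuracy values below are hypothetical and are not part of any MultiRobustBench API.

```python
from statistics import mean


def competitiveness_ratios(defense_acc, reference_acc):
    """Per-attack CR in percent: defense accuracy over reference accuracy.

    ASSUMPTION: the reference is the accuracy of a model specialized for
    each individual attack; the page itself does not define CR.
    """
    ratios = {}
    for attack, ref in reference_acc.items():
        if ref <= 0:
            raise ValueError(f"reference accuracy for {attack!r} must be positive")
        ratios[attack] = 100.0 * defense_acc[attack] / ref
    return ratios


def summarize(ratios):
    """Return (average CR, worst-case CR) over all attacks."""
    values = list(ratios.values())
    return mean(values), min(values)


# Hypothetical robust accuracies (percent) on three of the attack types.
defense_acc = {"Linf": 40.0, "L2": 60.0, "StAdv": 10.0}
reference_acc = {"Linf": 50.0, "L2": 80.0, "StAdv": 50.0}

ratios = competitiveness_ratios(defense_acc, reference_acc)
cr_avg, cr_worst = summarize(ratios)
print(f"average CR: {cr_avg:.2f}, worst-case CR: {cr_worst:.2f}")
```

In this made-up example the defense looks reasonable on average but is dominated by its weakest attack (StAdv), which is why a worst-case ranking can order defenses differently from an average ranking.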

MultiRobustBench offers the following additional features:

Contribute to MultiRobustBench: To add a new defense or attack to the MultiRobustBench leaderboard, please follow the steps listed here.

Rank | Defense | Clean Acc | CR | SC | % Attacks Seen | PetaFLOPs | Extra Real Data | Architecture