MultiRobustBench is a standardized benchmark for evaluating adversarial robustness against multiple attacks. It currently evaluates and ranks models based on performance against a set of 9 attack types (L1, L2, Linf, Elastic, L1 JPEG, Linf JPEG, ReColor, StAdv, and LPIPS) at 20 different attack strengths. We provide 2 leaderboards for the CIFAR-10 dataset: one ranked by average competitiveness ratio (CRind-avg in the paper), which measures average-case multiattack robustness, and the other ranked by worst-case competitiveness ratio (CRind-worst in the paper), which measures worst-case multiattack robustness. Users can toggle between these 2 leaderboards via the "Leaderboard selection" menu. Our leaderboards also report the stability constant (SC) computed on this set of attacks. Note that higher CR indicates better performance, while lower SC indicates better performance (although SC is best used only when comparing defenses trained with the same threat model).
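For intuition, here is a minimal sketch of how the two rankings could be computed, assuming `acc_d[a]` holds a defense's robust accuracy against attack `a` (aggregated over the 20 attack strengths) and `baseline[a]` holds the accuracy of a model adversarially trained on attack `a` alone. The names and data layout are illustrative assumptions, not the benchmark's actual code.

```python
import numpy as np

ATTACKS = ["L1", "L2", "Linf", "Elastic", "L1-JPEG",
           "Linf-JPEG", "ReColor", "StAdv", "LPIPS"]

def competitiveness_ratios(acc_d, baseline):
    """Per-attack CR: defense accuracy relative to the model
    adversarially trained on that attack individually."""
    return {a: acc_d[a] / baseline[a] for a in ATTACKS}

def cr_avg(acc_d, baseline):
    """Average-case multiattack robustness (higher is better)."""
    crs = competitiveness_ratios(acc_d, baseline)
    return float(np.mean(list(crs.values())))

def cr_worst(acc_d, baseline):
    """Worst-case multiattack robustness (higher is better)."""
    crs = competitiveness_ratios(acc_d, baseline)
    return float(min(crs.values()))
```

Under this sketch, ranking defenses by `cr_avg` produces the average-case leaderboard, and ranking by `cr_worst` produces the worst-case one.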
MultiRobustBench offers the following additional features:

- Performance visualizations: Users can visualize the performance of each defense by clicking the button next to the entry of the defense of interest. These visualizations include a plot of the defense's accuracy compared to training on each attack individually with adversarial training, defense accuracy as perturbation size increases for a selected attack type, a comparison of CR-in (CR computed on seen attacks) and CR-out (CR computed on unseen attacks) scores, and CR computed across each individual attack type.
- Contribute to MultiRobustBench: To add a new defense or attack to the MultiRobustBench leaderboard, please follow the steps described here.
| Rank | Defense | Clean Acc | CR | SC | % Attacks Seen | PetaFLOPs | Extra Real Data | Architecture |
|---|---|---|---|---|---|---|---|---|