Authors:
(1) Seokil Ham, KAIST;
(2) Jungwuk Park, KAIST;
(3) Dong-Jun Han, Purdue University;
(4) Jaekyun Moon, KAIST.
3. Proposed NEO-KD Algorithm and 3.1 Problem Setup: Adversarial Training in Multi-Exit Networks
4. Experiments and 4.1 Experimental Setup
4.2. Main Experimental Results
4.3. Ablation Studies and Discussions
5. Conclusion, Acknowledgement and References
B. Clean Test Accuracy and C. Adversarial Training via Average Attack
E. Discussions on Performance Degradation at Later Exits
F. Comparison with Recent Defense Methods for Single-Exit Networks
G. Comparison with SKD and ARD and H. Implementations of Stronger Attacker Algorithms
As the anytime prediction results in the main manuscript show, the adversarial test accuracy of later exits is sometimes lower than that of earlier exits. This phenomenon can be explained as follows. In our experiments, we generally observed that adversarial examples targeting later exits have a higher sum of losses over all exits than adversarial examples targeting earlier exits. As a result, the max-average and average attacks mainly focus on attacking the later exits, which leads to low adversarial test accuracy at those exits. The performance of later exits can be improved by adopting the ensemble strategy, as done in the main manuscript for the budgeted prediction setup.
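The selection effect described above can be illustrated with a small sketch. It assumes that the max-average attack crafts one candidate adversarial example per exit and keeps the candidate with the largest loss averaged over all exits, and that the ensemble averages the class probabilities of the exits up to the current one. The function names and all numbers below are hypothetical and chosen only for illustration; they are not results or code from the paper.

```python
def select_max_average(loss_matrix):
    """Pick the candidate with the largest mean loss over all exits.

    loss_matrix[k][j] is the loss at exit j for the adversarial example
    crafted against exit k (hypothetical values, not measured results).
    """
    mean_losses = [sum(row) / len(row) for row in loss_matrix]
    best = max(range(len(mean_losses)), key=mean_losses.__getitem__)
    return best, mean_losses


def ensemble_prediction(exit_probs, upto):
    """Average the class probabilities of exits 0..upto and return the argmax.

    This is an assumed, simple form of ensembling; the exact strategy used
    in the main manuscript may differ.
    """
    used = exit_probs[: upto + 1]
    num_classes = len(used[0])
    avg = [sum(p[c] for p in used) / len(used) for c in range(num_classes)]
    return max(range(num_classes), key=avg.__getitem__)


# Toy loss matrix for a 3-exit network: the candidate targeting the last
# exit (row 2) has the largest total loss, mirroring the observation above.
loss_matrix = [
    [2.0, 0.6, 0.4],  # candidate targeting exit 0
    [0.9, 2.1, 0.8],  # candidate targeting exit 1
    [1.0, 1.4, 2.4],  # candidate targeting exit 2
]
chosen, mean_losses = select_max_average(loss_matrix)

# Toy class probabilities per exit under that attack (true class is 0).
exit_probs = [
    [0.70, 0.30],  # exit 0: still correct
    [0.60, 0.40],  # exit 1: still correct
    [0.35, 0.65],  # exit 2: fooled when used alone
]
last_exit_alone = max(range(2), key=exit_probs[2].__getitem__)
last_exit_ensemble = ensemble_prediction(exit_probs, upto=2)
```

In this toy setting the max-average selection keeps the candidate crafted against the last exit, so the last exit is misclassified when used alone, while averaging with the earlier exits recovers the correct class.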
This paper is available on arXiv under a CC 4.0 license.