Significant and fundamental flaws in methodology, analysis, and conclusions #13

@carlini

This framework is designed to "systematically evaluate the existing adversarial attack and defense methods". The research community would be well served by such an analysis. When new defenses are proposed, authors must choose which set of attacks to apply in order to perform an evaluation. A systematic evaluation of which attacks have been most effective in the past could help inform the decision of which attacks should be tried in the future. Similarly, when designing new attacks, a comprehensive review of defenses could help researchers decide which defenses to test against.

Unfortunately, the analysis performed in the DeepSec paper is fundamentally flawed and does not achieve any of these goals. It neither accurately measures the power of attacks nor measures the efficacy of defenses. I have filed a number of issues that summarize the many ways in which the paper is misleading in its methodology and analysis. (Almost all of the conclusions are misleading as a result of these other flaws. I do not comment on the conclusions, but I expect they will need to be completely rewritten once true results are obtained.)

The issues raised are ordered roughly by importance:

#1 Attacks are not run on defenses in an all-pairs manner
#2 Paper uses averages instead of the minimum for security analysis
#3 FGSM implementation is incorrect
#4 PGD adversarial training is implemented incorrectly
#5 Computing the average over different threat models is meaningless
#6 Comparing attack effectiveness is done incorrectly
#7 Epsilon values studied are too large to be meaningful
#8 Detection defenses set per-attack thresholds
#9 Attack success rate decreases with distortion bound
#10 Reporting success rate of unbounded attacks is meaningless
#11 Paper does not report attack success rate for targeted adversarial examples
#12 Discrepancies between tables, text, and code
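
To make the first two items concrete, here is a minimal sketch of why an all-pairs evaluation and a worst-case (minimum) summary matter. The defense names, attack names, and accuracy numbers are hypothetical and chosen only for illustration; they are not taken from the DeepSec paper or its code.

```python
# Hypothetical robust accuracy (fraction of adversarial examples still
# classified correctly) for each defense under each attack.
# These numbers are made up for illustration only.
robust_accuracy = {
    "defense_a": {"attack_1": 0.90, "attack_2": 0.85, "attack_3": 0.05},
    "defense_b": {"attack_1": 0.60, "attack_2": 0.55, "attack_3": 0.50},
}
attacks = ["attack_1", "attack_2", "attack_3"]


def missing_pairs(table, attack_names):
    """Return (defense, attack) pairs that were never evaluated."""
    return [
        (defense, attack)
        for defense, row in table.items()
        for attack in attack_names
        if attack not in row
    ]


def average_case(per_attack):
    """Mean robust accuracy over attacks."""
    return sum(per_attack.values()) / len(per_attack)


def worst_case(per_attack):
    """Robust accuracy against the strongest attack tried."""
    return min(per_attack.values())


# An all-pairs evaluation should leave no gaps in the table.
gaps = missing_pairs(robust_accuracy, attacks)

# Summarize each defense as (average, minimum).
summary = {
    name: (average_case(row), worst_case(row))
    for name, row in robust_accuracy.items()
}

for name, (avg, worst) in summary.items():
    print(f"{name}: average={avg:.2f} worst_case={worst:.2f}")
```

With these made-up numbers, the hypothetical `defense_a` has the higher average but is almost completely broken by one attack, while `defense_b` has the higher minimum. An adversary is free to pick the strongest attack, so the minimum is the quantity that reflects security.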
