From Detection to Deception: Are AI-Generated Image Detectors Adversarially Robust? Tsai, Y., Xu, R., Mao, C., & Yang, J. CVPR 2024 Responsible Generative AI Workshop, 2024.
Generative models are revolutionizing industries by synthesizing high-quality images, yet they pose societal risks as they are exploited at scale for generating disinformation, propaganda, scams, and phishing attacks. Recent work has developed detectors with remarkable accuracy in identifying images generated by current models, but the robustness of these detectors remains to be explored. This paper investigates the robustness of these detectors against adversarial perturbations designed to elude detection. We observe that an end-to-end adversarial attack on the entire detection pipeline is ineffective due to the long stochastic process of diffusion models. Instead, we create intermediate guidance for the attack at the model internals. Empirical results on both black-box and white-box attacks demonstrate the importance of our proposed intermediate supervision when constructing the attack. Our results show that our approach can fool the detector, reducing the detection accuracy by up to 69 points in the black-box setting and 91 to 100 points in the white-box setting. In addition, our attack transfers well to generated images from unknown models, including StyleGAN. Our work suggests that existing AI-generated image detectors are easily deceived by adversarial perturbations, highlighting the need for more robust detectors.
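To make the idea of "intermediate guidance at the model internals" concrete, the following is a minimal, hypothetical sketch (not the authors' released code): a PGD-style perturbation that, instead of back-propagating through the full detection pipeline, steers an intermediate feature representation of the generated image toward that of a real reference image. The `feature_extractor`, step sizes, and budget are illustrative assumptions.

```python
# Hypothetical sketch, not the paper's implementation: an L_inf-bounded PGD attack
# supervised at an intermediate feature layer rather than the end-to-end detector.
import torch
import torch.nn.functional as F


def intermediate_guided_attack(x_gen, x_real, feature_extractor,
                               eps=8 / 255, alpha=1 / 255, steps=40):
    """Perturb a generated image x_gen (N,C,H,W in [0,1]) so that its
    intermediate features move toward those of a real reference x_real."""
    with torch.no_grad():
        target_feat = feature_extractor(x_real)          # guidance from model internals

    delta = torch.zeros_like(x_gen, requires_grad=True)
    for _ in range(steps):
        feat = feature_extractor(x_gen + delta)
        loss = F.mse_loss(feat, target_feat)             # intermediate supervision
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()           # step toward "real" features
            delta.clamp_(-eps, eps)                      # enforce the L_inf budget
            delta.add_(x_gen).clamp_(0, 1).sub_(x_gen)   # keep perturbed pixels valid
        delta.grad.zero_()
    return (x_gen + delta).detach()
```

Under these assumptions, `feature_extractor` could be any frozen encoder exposing an internal activation; the perturbed output would then be passed to the detector under evaluation to measure the drop in detection accuracy.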
@article{tsai2024detection,
  title = {From Detection to Deception: Are AI-Generated Image Detectors Adversarially Robust?},
  author = {Tsai, Yun-Yun and Xu, Ruize and Mao, Chengzhi and Yang, Junfeng},
  journal = {CVPR 2024 Responsible Generative AI Workshop},
  year = {2024},
  url_Paper = {https://drive.google.com/file/d/13-Z0OBPEVs4OizMGW-hKQHvienPQACGP/view},
  abstract = {Generative models are revolutionizing industries by synthesizing high-quality images, yet they pose societal risks as they are exploited at scale for generating disinformation, propaganda, scams, and phishing attacks. Recent work has developed detectors with remarkable accuracy in identifying images generated by current models, but the robustness of these detectors remains to be explored. This paper investigates the robustness of these detectors against adversarial perturbations designed to elude detection. We observe that an end-to-end adversarial attack on the entire detection pipeline is ineffective due to the long stochastic process of diffusion models. Instead, we create intermediate guidance for the attack at the model internals. Empirical results on both black-box and white-box attacks demonstrate the importance of our proposed intermediate supervision when constructing the attack. Our results show that our approach can fool the detector, reducing the detection accuracy by up to 69 points in the black-box setting and 91 to 100 points in the white-box setting. In addition, our attack transfers well to generated images from unknown models, including StyleGAN. Our work suggests that existing AI-generated image detectors are easily deceived by adversarial perturbations, highlighting the need for more robust detectors.}
}
