Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Abstract

Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, the effective integration of T2I models into fundamental image classification tasks remains an open question. A prevalent strategy to bolster image classification performance is to augment the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. Our analysis reveals that these methods struggle to produce images that are both faithful (in terms of foreground objects) and diverse (in terms of background contexts) for domain-specific concepts. To tackle this challenge, we introduce an inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes. Our empirical results demonstrate that Diff-Mix achieves a better balance between faithfulness and diversity, leading to a marked improvement in performance across diverse image classification scenarios, including few-shot, conventional, and long-tail classification on domain-specific datasets.
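To make the core idea concrete, below is a minimal sketch of inter-class image translation for augmentation, in the spirit of Diff-Mix. It uses the off-the-shelf Stable Diffusion img2img pipeline from Hugging Face diffusers; the checkpoint, prompt template, edit strength, and soft-label rule here are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch: translate a training image from its source class toward
# another class with a diffusion img2img pass. Partial denoising keeps the
# source background context while the prompt steers the foreground object,
# which is the faithfulness/diversity trade-off Diff-Mix targets.
# All hyperparameters below are assumptions for illustration.
import random
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def diff_mix_sample(image: Image.Image, source_class: str,
                    class_names: list[str], strength: float = 0.7):
    """Translate an image of `source_class` toward a random target class."""
    target_class = random.choice([c for c in class_names if c != source_class])
    # strength < 1 denoises only partially, preserving the source layout
    # and background while the prompt rewrites the foreground object.
    edited = pipe(
        prompt=f"a photo of a {target_class}",
        image=image.resize((512, 512)),
        strength=strength,
        guidance_scale=7.5,
    ).images[0]
    # A soft label weighted by the edit strength is one plausible way to
    # supervise the mixed sample (an assumption, not from the paper).
    return edited, target_class, strength

# Usage: augment an image of class "cardinal" toward another bird class.
# img = Image.open("cardinal_001.jpg").convert("RGB")
# aug, tgt, s = diff_mix_sample(img, "cardinal", ["cardinal", "blue jay", "sparrow"])
```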

Publication
In CVPR 2024

Citation:

@misc{wang2024enhance,
      title={Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model}, 
      author={Zhicai Wang and Longhui Wei and Tan Wang and Heyu Chen and Yanbin Hao and Xiang Wang and Xiangnan He and Qi Tian},
      year={2024},
      eprint={2403.19600},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Authors
Zhicai Wang (王志才)
Yanbin Hao (郝艳宾), Associate Researcher
Xiang Wang (王翔), Professor
Xiangnan He (何向南), Professor