Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, the effective integration of T2I models into fundamental image classification tasks remains an open question. A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. Our analysis reveals that these methods struggle to produce images that are both faithful (in terms of foreground objects) and diverse (in terms of background contexts) for domain-specific concepts. To tackle this challenge, we introduce an innovative inter-class data augmentation method known as Diff-Mix, which enriches the dataset by performing image translations between classes. Our empirical results demonstrate that Diff-Mix achieves a better balance between faithfulness and diversity, leading to a marked improvement in performance across diverse image classification scenarios, including few-shot, conventional, and long-tail classifications for domain-specific datasets.
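As a rough illustration of the inter-class translation idea described above (a sketch, not the authors' Diff-Mix implementation), the snippet below edits a training image of one class toward another class's text prompt with a generic diffusion image-to-image pipeline and assigns a soft label mixed by the edit strength. The checkpoint name, the strength value, and the strength-based label-mixing rule are all assumptions made for illustration.

# Minimal sketch of inter-class diffusion-based augmentation (illustrative only;
# not the official Diff-Mix code). Assumes the `diffusers` library and a generic
# Stable Diffusion image-to-image checkpoint.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def inter_class_translate(image_path, source_class, target_class, strength=0.7):
    """Edit an image of `source_class` toward `target_class` and return the
    synthetic image together with a soft (mixed) label. The strength-based
    label mix is an assumed heuristic, not the paper's exact rule."""
    src = Image.open(image_path).convert("RGB").resize((512, 512))
    prompt = f"a photo of a {target_class}"
    out = pipe(prompt=prompt, image=src, strength=strength, guidance_scale=7.5).images[0]
    # Soft label: the stronger the edit, the more the sample counts toward the target class.
    soft_label = {source_class: 1.0 - strength, target_class: strength}
    return out, soft_label

# Example usage on a fine-grained bird dataset: translate a "Cardinal" image toward "Blue Jay".
aug_img, label = inter_class_translate("cardinal_001.jpg", "Cardinal", "Blue Jay")

In this sketch, varying the strength parameter trades off faithfulness to the source image against the diversity introduced by the target-class prompt, which mirrors the faithfulness/diversity balance discussed in the abstract.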
Citation:
@misc{wang2024enhance,
      title={Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model},
      author={Zhicai Wang and Longhui Wei and Tan Wang and Heyu Chen and Yanbin Hao and Xiang Wang and Xiangnan He and Qi Tian},
      year={2024},
      eprint={2403.19600},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}