CIA: Controllable Image Augmentation Framework Based on Stable Diffusion

[en] Computer vision tasks such as object detection and segmentation rely on the availability of extensive, accurately annotated datasets. In this work, we present CIA, a modular pipeline for (1) generating synthetic images for dataset augmentation using Stable Diffusion, (2) filtering out low-quality samples using defined quality metrics, and (3) forcing the presence of specific patterns in generated images using accurate prompting and ControlNet. To show how CIA can be used to search for an optimal augmentation pipeline for training data, we study human object detection in a data-constrained scenario, using YOLOv8n on the COCO and Flickr30k datasets. We recorded a significant improvement using CIA-generated images, approaching the performance obtained when doubling the number of real images in the dataset. Our findings suggest that our modular framework can significantly enhance object detection systems and enable future research on data-constrained scenarios. The framework is available at: github.com/multitel-ai/CIA.
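
The filtering and augmentation steps described in the abstract can be illustrated with a minimal sketch. This is not the CIA framework's API: the function names `filter_by_quality` and `augment`, the `keep_ratio` and `synthetic_per_real` parameters, and the toy scores are all hypothetical, and the image generation step (Stable Diffusion with ControlNet) is replaced by pre-scored placeholder samples so the example stays self-contained.

```python
from typing import Any, Callable, List, Sequence


def filter_by_quality(
    samples: Sequence[Any],
    score_fn: Callable[[Any], float],
    keep_ratio: float = 0.5,
) -> List[Any]:
    """Keep the best-scoring fraction of synthetic samples.

    Assumes a higher score means better quality; for metrics where
    lower is better, pass a score_fn that negates the metric.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not samples:
        return []
    ranked = sorted(samples, key=score_fn, reverse=True)
    n_keep = max(1, round(len(ranked) * keep_ratio))
    return ranked[:n_keep]


def augment(
    real: Sequence[Any],
    synthetic: Sequence[Any],
    synthetic_per_real: float = 1.0,
) -> List[Any]:
    """Extend the real set with at most synthetic_per_real * len(real) synthetic samples."""
    budget = int(len(real) * synthetic_per_real)
    return list(real) + list(synthetic[:budget])


# Hypothetical data: (image_id, quality_score) pairs stand in for generated images.
synthetic = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7)]
kept = filter_by_quality(synthetic, score_fn=lambda s: s[1], keep_ratio=0.5)
train_set = augment(["r1", "r2"], kept, synthetic_per_real=1.0)
```

Under these assumptions, searching for an augmentation pipeline amounts to varying the quality metric, `keep_ratio`, and `synthetic_per_real`, then comparing detector performance across the resulting training sets.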

Disciplines :

Computer science

Author, co-author :

Benkedadra, Mohamed ; Université de Mons - UMONS > Faculté Polytechnique > Service Informatique, Logiciel et Intelligence artificielle

Rimez, Dany ; UCLouvain, Louvain-la-Neuve, Belgium

Godelaine, Tiffanie ; UCLouvain, Louvain-la-Neuve, Belgium

Chidambaram, Natarajan ; Université de Mons - UMONS > Faculté des Sciences > Service de Génie Logiciel

Khosroshahi, Hamed Razavi ; Université libre de Bruxelles, Brussels, Belgium

Tellez, Horacio ; Multitel, Mons, Belgium

Mancas, Matei ; Université de Mons - UMONS > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle

Macq, Benoit ; UCLouvain, Louvain-la-Neuve, Belgium

Mahmoudi, Sidi ; Université de Mons - UMONS > Faculté Polytechnique > Service Informatique, Logiciel et Intelligence artificielle

Language :

English

Title :

CIA: Controllable Image Augmentation Framework Based on Stable Diffusion

Publication date :

07 August 2024

Event name :

IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2024

Event organizer :

IEEE

Event place :

San Jose, United States

Event date :

07 August 2024

Audience :

International

Journal title :

IEEE Access

ISSN :

2169-3536

Publisher :

Institute of Electrical and Electronics Engineers, United States - New Jersey

Pages :

600-606

Peer review/Selection committee :

Peer Reviewed verified by ORBi

Additional URL :

http://xplorestaging.ieee.org/ielx8/10707774/10707775/10707852.pdf?arnumber=10707852

Research unit :

F105 - Information, Signal et Intelligence artificielle (Information, Signal and Artificial Intelligence)
F114 - Informatique, Logiciel et Intelligence artificielle

Research institute :

Infortech
Numediart
R450 - Institut NUMEDIART pour les Technologies des Arts Numériques

Name of the research project :

5443 - ARIAC BY DIGITALWALLONIA4.AI - Applications et Recherche pour une Intelligence Artificielle de Confiance - Région wallonne

Available on ORBi UMONS :

since 21 October 2024
