Towards Human Performance on Sketch-Based Image Retrieval

Keywords :

CNN; Sketch-based image retrieval; Triplet networks; Batch sizes; Embeddings; Human performance; Image database; Large scale; Model sharing; Normalisation; State of the art; Human-Computer Interaction; Computer Networks and Communications; Computer Vision and Pattern Recognition; Software

Abstract :

Sketch-based image retrieval (SBIR) solutions are attracting increasing interest in the field of computer vision. These solutions provide an intuitive and powerful tool for retrieving images from large-scale image databases. In this paper, we conduct a comprehensive study of classic triplet CNN training pipelines in the SBIR context. We study the impact of embedding normalization, model sharing, margin selection, batch size, the hard mining selection strategy, and the evolution of the number of hard triplets during training, and we propose several avenues for improvement. We also propose dropout column, an adaptation of dropout for triplet networks and similar pipelines. In addition, we introduce a novel approach to building state-of-the-art SBIR solutions that can be used on low-power systems. The whole study is conducted on The Sketchy Database, a large-scale SBIR database. We carry out a series of experiments and show that adopting a few simple modifications significantly enhances existing SBIR pipelines (faster training and higher accuracy). Our study enables us to propose an enhanced pipeline that outperforms the previous state of the art on the Sketchy Database by a significant margin (a recall of 53.92% compared to 46.2% at k = 1) and nearly reaches human performance (54.27%) on a large-scale benchmark.
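To make the training ingredients named in the abstract more concrete, the following is a minimal sketch, not the authors' pipeline. It shows a triplet margin loss computed on L2-normalized embeddings, a count of "hard" triplets, and the recall@k metric used to report results. The sketch rests on several assumptions: squared Euclidean distance, an illustrative margin of 0.2, "hard" meaning a triplet that still violates the margin (non-zero loss), and recall@k counting a sketch query as correct when its single ground-truth photo ranks among the k nearest gallery photos. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm (embedding normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchors: Sequence[Vector], positives: Sequence[Vector],
                 negatives: Sequence[Vector], margin: float = 0.2) -> Tuple[float, int]:
    """Mean hinge loss max(0, d(a, p) - d(a, n) + margin) over a batch.

    Also returns the number of triplets with a non-zero loss, i.e. triplets
    that still violate the margin (assumed definition of a "hard" triplet).
    """
    losses = []
    for a, p, n in zip(anchors, positives, negatives):
        a, p, n = l2_normalize(a), l2_normalize(p), l2_normalize(n)
        losses.append(max(0.0, sq_dist(a, p) - sq_dist(a, n) + margin))
    hard = sum(1 for loss in losses if loss > 0)
    return sum(losses) / len(losses), hard


def recall_at_k(queries: Sequence[Vector], gallery: Sequence[Vector],
                targets: Sequence[int], k: int = 1) -> float:
    """Fraction of queries whose target gallery index is among the k nearest."""
    gallery = [l2_normalize(g) for g in gallery]
    hits = 0
    for q, t in zip(queries, targets):
        q = l2_normalize(q)
        ranked = sorted(range(len(gallery)), key=lambda i: sq_dist(q, gallery[i]))
        if t in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy usage with 2-D embeddings (illustrative values only).
loss, hard = triplet_loss([[1, 0], [0, 1]], [[2, 0.1], [1, 0]], [[0, 1], [0, 2]])
print(loss, hard)
gallery = [[1, 0], [0, 1], [-1, 0]]
queries = [[0.9, 0.1], [0.1, 0.9]]
print(recall_at_k(queries, gallery, [0, 2], k=1))
```

In this toy batch, only the second triplet has a negative that coincides with its anchor after normalization, so it is the only one counted as hard. Logging this count per batch is one simple way to follow how the number of hard triplets evolves during training. A real pipeline would compute the same quantities on framework tensors over the full Sketchy gallery.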

Disciplines :

Computer science

Author, co-author :

Seddati, Omar ; Université de Mons - UMONS > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle

Dupont, Stéphane ; Université de Mons - UMONS > Faculté des Sciences > Service d'Intelligence Artificielle

Mahmoudi, Saïd ; Université de Mons - UMONS > Faculté Polytechnique > Informatique, Logiciel et Intelligence artificielle

Dutoit, Thierry ; Université de Mons - UMONS

Language :

English

Title :

Towards Human Performance on Sketch-Based Image Retrieval

Publication date :

14 September 2022

Event name :

International Conference on Content-based Multimedia Indexing

Event place :

Graz, Austria

Event date :

14-09-2022 to 16-09-2022

By request :

Yes

Audience :

International

Main work title :

Proceedings of the 19th International Conference on Content-based Multimedia Indexing, CBMI 2022

Publisher :

Association for Computing Machinery

ISBN/EAN :

978-1-4503-9720-9

Peer review/Selection committee :

Peer reviewed

Additional URL :

https://dl.acm.org/doi/pdf/10.1145/3549555.3549582

Research unit :

S841 - Artificial Intelligence

Research institute :

Numediart

Available on ORBi UMONS :

since 10 November 2022
