Article (Scientific journals)
DeepRare: Generic Unsupervised Visual Attention Models
Kong, Phutphalla; Mancas, Matei; Gosselin, Bernard et al.
2022 — In Electronics, 11 (11), p. 1696
Peer reviewed
 

Files


Full Text
electronics-11-01696-with-cover.pdf
Author postprint (5.35 MB) Creative Commons License - Attribution
Details



Keywords :
deep features; eye tracking; odd one out; rarity; saliency; visibility; visual attention prediction; Control and Systems Engineering; Signal Processing; Hardware and Architecture; Computer Networks and Communications; Electrical and Electronic Engineering
Abstract :
[en] Visual attention selects data considered “interesting” by humans, and it is modeled in engineering by feature-engineered methods that find contrasted, surprising, or unusual image data. Deep learning drastically improved the models' efficiency on the main benchmark datasets. However, Deep Neural Network-based (DNN-based) models are counterintuitive: surprising or unusual data are by definition difficult to learn because of their low occurrence probability. In practice, DNN-based models mainly learn top-down features such as faces, text, people, or animals, which usually attract human attention, but they are poor at extracting surprising or unusual data from images. In this article, we propose a new family of visual attention models called DeepRare, and in particular DeepRare2021 (DR21), which combines the power of DNN feature extraction with the genericity of feature-engineered algorithms. This algorithm is an evolution of a previous version, DeepRare2019 (DR19), based on the same common framework. DR21 (1) does not need any additional training beyond the default ImageNet training, (2) is fast even on CPU, and (3) is tested on four very different eye-tracking datasets, showing that DR21 is generic and is always among the top models on all datasets and metrics, while no other model exhibits such regularity and genericity. Finally, DR21 (4) is tested with several network architectures such as VGG16 (V16), VGG19 (V19), and MobileNetV2 (MN2), and (5) it provides explanation and transparency on which parts of the image are the most surprising at different levels, despite the use of a DNN-based feature extractor.
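The core idea the abstract describes — scoring image locations by how rare their (DNN-extracted) feature values are — can be illustrated with a minimal sketch. This is not the authors' DeepRare implementation; it is a simplified rarity measure under the assumption that rarity is computed as the self-information of quantized per-channel activations and fused by summation (function names and the bin count are illustrative):

```python
import numpy as np

def channel_rarity(feature_map, n_bins=8):
    """Rarity of each location = self-information -log2 p(bin) of its
    quantized activation value: rare values get high rarity scores."""
    fm = np.asarray(feature_map, dtype=float)
    lo, hi = fm.min(), fm.max()
    if hi - lo < 1e-12:                      # constant map: nothing is rare
        return np.zeros_like(fm)
    # Quantize activations into n_bins histogram bins
    bins = np.minimum(((fm - lo) / (hi - lo) * n_bins).astype(int), n_bins - 1)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    p = counts / counts.sum()
    return -np.log2(np.maximum(p[bins], 1e-12))

def saliency_from_features(feature_maps):
    """Fuse per-channel rarity maps (e.g. from a VGG16 layer) into one
    normalized saliency map by summation."""
    acc = sum(channel_rarity(fm) for fm in feature_maps)
    acc = acc - acc.min()
    if acc.max() > 0:
        acc = acc / acc.max()
    return acc
```

For example, a feature map that is zero everywhere except one location assigns that location the maximum rarity, matching the "odd one out" intuition in the keywords. The actual DR21 model applies this kind of rarity analysis across multiple DNN layers and fuses the results hierarchically.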
Disciplines :
Computer science
Author, co-author :
Kong, Phutphalla ;  Institute of Technology of Cambodia (ITC), Russian Conf. Blvd, Phnom Penh, Cambodia ; Numediart Institute, University of Mons (UMONS), Mons, Belgium
Mancas, Matei  ;  Université de Mons - UMONS
Gosselin, Bernard  ;  Université de Mons - UMONS
Po, Kimtho;  Institute of Technology of Cambodia (ITC), Russian Conf. Blvd, Phnom Penh, Cambodia
Language :
English
Title :
DeepRare: Generic Unsupervised Visual Attention Models
Publication date :
June 2022
Journal title :
Electronics
ISSN :
2079-9292
eISSN :
2079-9292
Publisher :
MDPI
Volume :
11
Issue :
11
Pages :
1696
Peer reviewed :
Peer reviewed
Research institute :
R450 - Institut NUMEDIART pour les Technologies des Arts Numériques
Funding text :
Funding: This research was funded by ARES-CCD (program AI 2014-2019) by Belgian university cooperation.
Available on ORBi UMONS :
since 23 January 2023
