Keywords :
Entire system; Stereo cameras; Synchronization method; Synchronization systems; Video synchronizations; Software; Signal Processing; Electrical and Electronic Engineering
Abstract :
[en] Stereo vision is essential for many applications. Currently, the synchronization of the streams coming from two cameras is mostly done in hardware. A software-based synchronization method would reduce the cost, weight and size of the entire system and allow for more flexibility when building such systems. With this goal in mind, we present here a comparison of different deep learning-based systems and show that some are efficient and generalizable enough for such a task. This study paves the way to a production-ready software-based video synchronization system.
Disciplines :
Computer science
Author, co-author :
Boizard, Nicolas; University of Mons, ISIA Lab, Mons, Belgium
Haddad, Kevin El; University of Mons, ISIA Lab, Mons, Belgium ; Big Projects, Mons, Belgium
Ravet, Thierry ; Université de Mons - UMONS > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle
Cresson, Francois; University of Mons, ISIA Lab, Mons, Belgium
Dutoit, Thierry ; Université de Mons - UMONS > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle
Language :
English
Title :
Deep Learning-Based Stereo Camera Multi-Video Synchronization
Publication date :
08 June 2023
Event name :
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Event place :
Rhodes Island, Greece
Event date :
04-06-2023 to 10-06-2023
Audience :
International
Journal title :
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN :
1520-6149
Publisher :
Institute of Electrical and Electronics Engineers Inc.