Keywords :
Entire system; Stereo cameras; Synchronization method; Synchronization systems; Video synchronizations; Software; Signal Processing; Electrical and Electronic Engineering
Abstract :
[en] Stereo vision is essential for many applications. Currently, the synchronization of the streams coming from two cameras is mostly done in hardware. A software-based synchronization method would reduce the cost, weight and size of the entire system and allow for more flexibility when building such systems. With this goal in mind, we present here a comparison of different deep learning-based systems and show that some are efficient and generalizable enough for such a task. This study paves the way to a production-ready software-based video synchronization system.
Disciplines :
Computer science
Author, co-author :
Boizard, Nicolas; University of Mons, ISIA Lab, Mons, Belgium
Haddad, Kevin El; University of Mons, ISIA Lab, Mons, Belgium ; Big Projects, Mons, Belgium
Ravet, Thierry ; Université de Mons - UMONS > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle
Cresson, Francois; University of Mons, ISIA Lab, Mons, Belgium
Dutoit, Thierry ; Université de Mons - UMONS > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle
Language :
English
Title :
Deep Learning-Based Stereo Camera Multi-Video Synchronization
Publication date :
08 June 2023
Event name :
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Event place :
Rhodes Island, Greece
Event date :
04-06-2023 to 10-06-2023
Audience :
International
Journal title :
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN :
1520-6149
Publisher :
Institute of Electrical and Electronics Engineers Inc.