Soccer captioning: dataset, transformer-based model, and triple-level evaluation

Dupont, Stéphane ; Université de Mons > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle ; Université de Mons > Faculté des Sciences > Service d'Intelligence Artificielle

Language :

English

Title :

Soccer captioning: dataset, transformer-based model, and triple-level evaluation

Publication date :

2022

Journal title :

Procedia Computer Science

eISSN :

1877-0509

Publisher :

Elsevier, Amsterdam, Netherlands

Volume :

210

Issue :

Pages :

104-111

Peer reviewed :

Peer reviewed

Research unit :

F105 - Information, Signal et Intelligence artificielle
F151 - Mathématique et Recherche opérationnelle
S841 - MAIA - Service d'Intelligence Artificielle

Research institute :

R300 - Institut de Recherche en Technologies de l'Information et Sciences de l'Informatique
R450 - Institut NUMEDIART pour les Technologies des Arts Numériques

Available on ORBi UMONS :

since 10 February 2022
