In this paper, we present a methodology for linguistic feature extraction, focusing in particular on the automatic syllabification of words in multiple languages, designed to be compatible with a forced-alignment tool, the Montreal Forced Aligner (MFA). Our method covers the extraction of phonetic transcriptions from text, stress marks, and a unified automatic syllabification in both the textual and phonetic domains. The system was built with open-source components and resources. Through an ablation study, we demonstrate the efficacy of our approach in automatically syllabifying words in several languages (English, French, and Spanish). Additionally, we apply the technique to the transcriptions of the CMU ARCTIC dataset, generating valuable annotations available online\footnote{\url{https://github.com/noetits/MUST_P-SRL}} that are well suited to speech representation learning, speech unit discovery, and the disentanglement of speech factors in several speech-related fields.
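The paper's own pipeline is not reproduced in this record, but the kind of phonetic-domain syllabification it describes can be illustrated with a minimal rule-based sketch using onset maximization over a sonority-style legality constraint. The vowel inventory and the set of legal onset clusters below are illustrative assumptions for English ARPAbet-like phones, not the resources used in MUST&P-SRL:

```python
# Minimal sketch of rule-based phonetic syllabification via onset
# maximization. VOWELS and LEGAL_ONSETS are illustrative assumptions,
# not the paper's actual open-source resources.

VOWELS = {"AA", "AE", "AH", "AO", "EH", "ER", "IH", "IY", "UW", "EY", "OW"}
# Hypothetical set of consonant clusters treated as legal syllable onsets.
LEGAL_ONSETS = {(), ("B",), ("S",), ("T",), ("K",), ("P",), ("R",), ("L",),
                ("S", "T"), ("T", "R"), ("S", "T", "R"), ("P", "L")}

def syllabify(phones):
    """Split a flat phone sequence into syllables (onset maximization)."""
    # Each vowel is a syllable nucleus.
    nuclei = [i for i, p in enumerate(phones) if p in VOWELS]
    if not nuclei:
        return [phones]
    syllables, start = [], 0
    for prev, nxt in zip(nuclei, nuclei[1:]):
        cluster = phones[prev + 1 : nxt]  # consonants between two nuclei
        # Give the following syllable the longest legal onset; the rest
        # stays in the coda of the preceding syllable.
        split = 0
        for k in range(len(cluster) + 1):
            if tuple(cluster[k:]) in LEGAL_ONSETS:
                split = k
                break
        boundary = prev + 1 + split
        syllables.append(phones[start:boundary])
        start = boundary
    syllables.append(phones[start:])
    return syllables

# "abstract" → AB.STRACT
print(syllabify(["AE", "B", "S", "T", "R", "AE", "K", "T"]))
```

A full multilingual system would replace the hand-listed onsets with language-specific resources and keep text-domain and phonetic-domain boundaries consistent, as the paper's unified approach requires.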
Disciplines :
Electrical & electronics engineering
Author, co-author :
Tits, Noé ; Université de Mons - UMONS > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle
Language :
English
Title :
MUST&P-SRL: Multi-lingual and Unified Syllabification in Text and Phonetic Domains for Speech Representation Learning
Publication date :
2023
Event name :
2023 Conference on Empirical Methods in Natural Language Processing
Event organizer :
Association for Computational Linguistics
Event place :
Singapore
Event date :
6-10 December 2023
Audience :
International
Main work title :
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track