Towards a Voice Conversion System Based on Frame Selection

This paper addresses the conversion of a given speaker's voice (the source speaker) into another identified voice (the target speaker). We assume that a large amount of speech samples is available from both the source and target voices, at least part of which is parallel. The proposed system is built on a mapping function between source and target spectral envelopes, followed by a frame selection algorithm that produces the final spectral envelopes. Converted speech is produced by a basic LP analysis of the source and LP synthesis using the converted spectral envelopes. We compared three types of conversion: without mapping; with mapping, using the excitation of the source speaker; and with mapping, using the excitation of the target speaker. Results show that the combination of mapping and frame selection provides the best results, and underline the value of working on methods to convert the LP excitation.
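The frame selection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the cost functions or the search procedure, so everything below is an assumption: a squared-distance target cost between each mapped envelope and each candidate target frame, a squared-distance concatenation cost between consecutive selected frames, and a dynamic-programming search over the two. The names `select_frames`, `sq_dist`, and `concat_weight` are hypothetical and do not come from the paper.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def select_frames(mapped, database, concat_weight=1.0):
    """Choose one database frame per mapped frame by dynamic programming.

    mapped:   non-empty list of mapped spectral envelope vectors (assumed input).
    database: list of target-speaker envelope vectors to select from.
    Returns the list of selected database indices, one per mapped frame.
    """
    n = len(database)
    # Cost of ending at each candidate after the first frame (target cost only).
    cost = [sq_dist(mapped[0], cand) for cand in database]
    back = []  # back[t][j]: best predecessor of candidate j at frame t + 1

    for frame in mapped[1:]:
        new_cost, pointers = [], []
        for j in range(n):
            # Cheapest way to reach candidate j from any previous candidate i.
            step = [cost[i] + concat_weight * sq_dist(database[i], database[j])
                    for i in range(n)]
            best_i = min(range(n), key=step.__getitem__)
            new_cost.append(step[best_i] + sq_dist(frame, database[j]))
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    j = min(range(n), key=cost.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return path[::-1]
```

With `concat_weight=0` the search reduces to a nearest-neighbour lookup per frame; larger values favour smoother sequences of selected frames at the expense of closeness to the mapped envelopes.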

Disciplines :

Electrical & electronics engineering

Author, co-author :

Dutoit, Thierry ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Holzapfel, A.

Jottrand, Matthieu

Moinet, Alexis ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Perez, Javier

Stylianou, Y.

Language :

English

Publication date :

16 April 2007

Event name :

ICASSP 2007 - International Conference on Acoustics, Speech and Signal Processing

Event place :

Honolulu, United States - Hawaii

Event date :

2007

Research unit :

F105 - Information, Signal et Intelligence artificielle

Research institute :

R300 - Institut de Recherche en Technologies de l'Information et Sciences de l'Informatique
R450 - Institut NUMEDIART pour les Technologies des Arts Numériques

Commentary :

See also: Proceedings of eNTERFACE 2006, 'Multimodal Speaker Conversion - his master's voice... and face -'

Available on ORBi UMONS :

since 10 December 2010
