HMM-based Speech Segmentation: Improvements of Fully Automatic Approaches

Brognaux, Sandrine; Drugman, Thomas

Article (Scientific journals)

2016 • In IEEE Transactions on Audio, Speech and Language Processing, 24 (1), p. 5-15

Peer Reviewed verified by ORBi

Permalink
https://hdl.handle.net/20.500.12907/41909

Files

Full Text

BrognauxDrugman_TASLP15.pdf

Publisher postprint (648.04 kB)

Details

Disciplines :

Languages & linguistics

Author, co-author :

Brognaux, Sandrine ; Université de Mons > Administration > Service de l'Administration et Valorisation de la Recherche

Drugman, Thomas

Language :

English

Title :

HMM-based Speech Segmentation: Improvements of Fully Automatic Approaches

Publication date :

01 January 2016

Journal title :

IEEE Transactions on Audio, Speech and Language Processing

ISSN :

1063-6676

Publisher :

Institute of Electrical and Electronics Engineers, United States

Volume :

24

Issue :

1

Pages :

5-15

Peer reviewed :

Peer Reviewed verified by ORBi

Research unit :

F105 - Information, Signal et Intelligence artificielle

Research institute :

R450 - Institut NUMEDIART pour les Technologies des Arts Numériques

Available on ORBi UMONS :

since 20 December 2016
