No full text
Unpublished conference/Abstract (Scientific congresses and symposiums)
On the Use of Glottal Source for Expressive Speech Analysis
Drugman, Thomas; Dubuisson, Thomas; Dutoit, Thierry
2011, 9th Pan European Voice Conference (PEVOC 9)
 

Files


Full Text
No document available.
Annexes
pevoc9.pdf
Publisher postprint (12.2 kB)
PEVOC11DrugmanDubuissonDutoit.pdf
Publisher postprint (1.78 MB)





Details



Abstract :
This contribution summarizes our recent investigations into the use of the glottal source for characterizing expressive voice. It is organized in three main parts. First, we study which methods are best suited for estimating the glottal flow directly from the speech signal. This is a particularly difficult task and a typical case of blind separation, since neither the vocal tract nor the glottal component is observable. Second, we focus on the parameterization of the resulting glottal flow estimates, highlighting which features are the most appropriate to characterize the glottal flow. Finally, we report our results of glottal analysis of expressive speech, revealing interesting modifications in glottal behavior when producing Lombard speech, various voice qualities, or hypo/hyperarticulated speech.

I. Glottal Source Estimation

As mentioned above, reliably and accurately estimating the glottal source from speech recordings is a complex problem. It usually requires processing speech frames that are synchronized on glottal closure instants (GCIs) and whose length is proportional to the pitch period. Three of the most efficient approaches for this task are the following [1]. The Closed Phase Inverse Filtering (CPIF, [2]) method estimates the vocal tract response during the glottal closed phase, when the effects of the subglottal cavities are minimized. The Iterative Adaptive Inverse Filtering (IAIF, [3]) technique iteratively refines both the vocal tract and the glottal components in order to improve the quality of the estimates. Finally, the Mixed-Phase Separation (MPS, [4]) approach is a non-parametric technique that relies on the causal/anticausal properties of speech: it isolates the anticausal component of speech, which corresponds to the glottal open phase. In [1], we have shown that CPIF and MPS lead to the most efficient estimation of the glottal flow for clean recordings.

II. Glottal Source Parameterization

Several features can be extracted from the resulting glottal flow estimates [1]. In the time domain, we found that the Normalized Amplitude Quotient (NAQ) and the Quasi Open Quotient (QOQ) are among the most suitable glottal characteristics. In the spectral domain, the Harmonic Richness Factor (HRF) and the ratio between the amplitudes of the first two harmonics (H1-H2) provide an efficient description of the glottal source [1].

III. Glottal Analysis of Expressive Speech

Based on the conclusions drawn above, several types of expressive speech were analyzed.

Lombard Speech: The Lombard effect refers to the speech changes caused by immersing the speaker in a noisy environment. In such a context, speakers tend to modify the way they speak so as to maximize the intelligibility of their message. We have shown in [5] that this is also reflected by a significant modification of the glottal behavior. Large variations of the glottal parameters were observed, depending on the level and type of the noise. For example, in a factory noise of 84 dB, NAQ decreases by 26.4%, QOQ by 12.6% and H1-H2 by 2.9 dB, while HRF increases by 4.1 dB.

Voice Quality: Here, our study was carried out on a database in which the same speaker produces modal, soft and loud voice. It was shown in [1] that as the vocal effort increases, NAQ and H1-H2 decrease significantly while HRF increases consistently.

Hypo/Hyperarticulated Speech: Hyperarticulated speech tends to maximize clarity, while hypoarticulation refers to speech produced with minimal effort. We have shown in [6] that the stronger the degree of articulation, the higher the glottal formant frequency, the maximum voiced frequency and the fundamental frequency.
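The frame selection described in Section I (frames synchronized on glottal closure instants, with a length proportional to the pitch period) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the GCI positions are taken as given (the abstract does not say how they are detected), the two-period frame length and the Blackman window are assumed choices, and the function name `gci_synchronous_frames` is hypothetical.

```python
import numpy as np


def gci_synchronous_frames(speech, gcis):
    """Cut a two-period, Blackman-windowed frame centred on each interior GCI.

    gcis: glottal closure instants as increasing sample indices (assumed given).
    The local pitch period at a GCI is estimated as half the distance between
    its two neighbouring GCIs, so the frame length tracks the pitch period.
    """
    speech = np.asarray(speech, dtype=float)
    frames = []
    for prev_gci, gci, next_gci in zip(gcis[:-2], gcis[1:-1], gcis[2:]):
        t0 = (next_gci - prev_gci) // 2  # local pitch period, in samples
        start, stop = gci - t0, gci + t0
        if start < 0 or stop > len(speech):
            continue  # skip frames that would run off the signal
        frames.append(speech[start:stop] * np.blackman(stop - start))
    return frames


# Toy usage: a stand-in signal with hypothetical GCIs every 100 samples.
speech = np.random.default_rng(0).standard_normal(500)
frames = gci_synchronous_frames(speech, [100, 200, 300, 400])
```

Only interior GCIs get a frame, since the first and last ones lack a neighbour on one side for the local period estimate.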
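The closed-phase idea behind CPIF can be illustrated with covariance-method linear prediction fitted only on samples where the glottis is assumed closed, followed by inverse filtering. This is a minimal sketch, assuming the closed-phase boundaries are already known; `closed_phase_lpc` and `inverse_filter` are hypothetical helpers, and the test signal is a synthetic two-pole resonance driven by noise during a fake open phase rather than real speech. Since lip radiation is usually folded into the model, inverse filtering speech yields an estimate of the glottal flow derivative, which is then integrated.

```python
import numpy as np


def closed_phase_lpc(speech, order, closed_start, closed_stop):
    """Covariance-method linear prediction fitted on closed-phase samples only.

    Solves s[n] ~ sum_k a[k] * s[n - k] (k = 1..order) in the least-squares
    sense for n in [closed_start, closed_stop), where the glottal excitation
    is assumed to be negligible. The closed-phase bounds are assumed known.
    """
    s = np.asarray(speech, dtype=float)
    n = np.arange(max(closed_start, order), closed_stop)
    past = np.stack([s[n - k] for k in range(1, order + 1)], axis=1)
    a, *_ = np.linalg.lstsq(past, s[n], rcond=None)
    return a


def inverse_filter(speech, a):
    """Apply A(z) = 1 - sum_k a[k] z^-k, removing the estimated vocal tract."""
    s = np.asarray(speech, dtype=float)
    e = s.copy()
    for k, ak in enumerate(a, start=1):
        e[k:] -= ak * s[:-k]
    return e


# Synthetic check: one resonance driven by noise during a fake "open phase"
# (samples 0-99) and by nothing during the "closed phase" (samples 100-199).
rng = np.random.default_rng(0)
r, theta = 0.95, 0.3
a_true = np.array([2 * r * np.cos(theta), -r**2])
excitation = np.zeros(200)
excitation[:100] = rng.standard_normal(100)
speech = np.zeros(200)
for i in range(len(speech)):
    speech[i] = excitation[i]
    for k, ak in enumerate(a_true, start=1):
        if i >= k:
            speech[i] += ak * speech[i - k]

a_est = closed_phase_lpc(speech, order=2, closed_start=100, closed_stop=180)
dglottal = inverse_filter(speech, a_est)  # plays the role of the flow derivative
glottal = np.cumsum(dglottal)             # crude integration back to a flow
```

In this noise-free toy case the excitation is exactly zero over the regression window, so the predictor coefficients are recovered up to rounding error and the inverse-filtered signal matches the excitation. On real speech, the result depends strongly on how accurately the closed phase is located.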
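Section I describes MPS as isolating the anticausal (maximum-phase) component of speech. One way to perform such a causal/anticausal split is through the complex cepstrum, where anticausal contributions sit at negative quefrencies. The sketch below assumes this cepstral route; the abstract does not state that [4] uses it, and `complex_cepstrum` and `mixed_phase_split` are hypothetical names. In practice the input would be a GCI-centred, windowed two-period speech frame; here a toy frame built from one maximum-phase and one minimum-phase factor is used so the output can be checked by hand.

```python
import numpy as np


def complex_cepstrum(frame, n_fft=1024):
    """Complex cepstrum via the DFT, with phase unwrapping and delay removal."""
    spectrum = np.fft.fft(frame, n_fft)
    phase = np.unwrap(np.angle(spectrum))
    # A pure delay adds a linear phase term; remove it so the phase is periodic.
    delay = int(np.round(phase[n_fft // 2] / np.pi))
    phase = phase - np.pi * delay * np.arange(n_fft) / (n_fft // 2)
    return np.real(np.fft.ifft(np.log(np.abs(spectrum)) + 1j * phase))


def mixed_phase_split(frame, n_fft=1024):
    """Return (anticausal, causal) parts, each up to a gain and a delay.

    Negative quefrencies (upper half of the cepstrum array) carry the
    maximum-phase, anticausal contribution; the rest is kept as causal.
    """
    ceps = complex_cepstrum(frame, n_fft)
    half = n_fft // 2
    anti_ceps = np.zeros(n_fft)
    causal_ceps = np.zeros(n_fft)
    anti_ceps[half + 1:] = ceps[half + 1:]
    causal_ceps[:half + 1] = ceps[:half + 1]

    def to_time(c):
        # Undo the cepstrum: exponentiate its spectrum, then inverse DFT.
        return np.real(np.fft.ifft(np.exp(np.fft.fft(c))))

    return to_time(anti_ceps), to_time(causal_ceps)


# Toy frame: a maximum-phase factor (zero at z = -2) convolved with a
# minimum-phase factor (zero at z = -0.5).
frame = np.convolve([0.5, 1.0], [1.0, 0.5])
anti, causal = mixed_phase_split(frame)
```

For this toy frame the anticausal output is the sequence 1, 0.5 at times 0 and -1 (time -1 wraps to the last array index), and the causal output is 1, 0.5 at times 0 and 1, so each factor is recovered up to gain and delay. Phase unwrapping is the fragile step on real frames.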
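The two time-domain features named in Section II can be computed from one period of an estimated glottal flow. This sketch assumes the usual definitions: NAQ = f_ac / (d_peak · T0), with f_ac the peak-to-peak flow amplitude, d_peak the magnitude of the negative peak of the flow derivative and T0 the pitch period; and QOQ as the fraction of the period during which the flow exceeds 50% of its peak-to-peak amplitude. The function names and the Rosenberg-style test pulse `rosenberg_pulse` are hypothetical and stand in for a real glottal flow estimate.

```python
import numpy as np


def naq(flow_period):
    """Normalized Amplitude Quotient, NAQ = f_ac / (d_peak * T0).

    With the derivative taken per sample, T0 is just the period length in
    samples, so the sampling rate cancels out.
    """
    g = np.asarray(flow_period, dtype=float)
    f_ac = g.max() - g.min()       # peak-to-peak flow amplitude
    d_peak = -np.diff(g).min()     # magnitude of the negative derivative peak
    return f_ac / (d_peak * len(g))


def qoq(flow_period, level=0.5):
    """Quasi Open Quotient: share of the period where the flow exceeds
    min + level * (max - min); level = 0.5 is the assumed threshold."""
    g = np.asarray(flow_period, dtype=float)
    threshold = g.min() + level * (g.max() - g.min())
    return np.count_nonzero(g > threshold) / len(g)


def rosenberg_pulse(n_period, open_frac=0.4, close_frac=0.16):
    """Hypothetical test pulse: raised-cosine opening, quarter-cosine closing."""
    n = np.arange(n_period)
    tp = int(round(open_frac * n_period))    # opening duration, samples
    tn = int(round(close_frac * n_period))   # closing duration, samples
    g = np.zeros(n_period)
    rising = n < tp
    falling = (n >= tp) & (n < tp + tn)
    g[rising] = 0.5 * (1 - np.cos(np.pi * n[rising] / tp))
    g[falling] = np.cos(np.pi * (n[falling] - tp) / (2 * tn))
    return g


pulse = rosenberg_pulse(200)
print(naq(pulse), qoq(pulse))
```

For this pulse, the continuous-time NAQ is 2·Tn / (π·T0) = 0.32/π ≈ 0.102, where Tn = 0.16·T0 is the closing duration, and the 50% crossings give a QOQ of 0.2 + 0.32/3 ≈ 0.307; the sampled values land close to these.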
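The spectral features of Section II, H1-H2 and HRF, can be read off the harmonic peaks of the glottal flow spectrum. This is a minimal sketch, assuming F0 is known, the frame spans an integer number of periods (so every harmonic falls exactly on a DFT bin), and HRF is taken as the sum of the amplitudes of harmonics 2 and above relative to H1, in dB; the exact definition and number of harmonics used in [1] may differ. All function names are hypothetical.

```python
import numpy as np


def harmonic_amplitudes(signal, fs, f0, n_harmonics):
    """Magnitudes of harmonics 1..n_harmonics of f0, read from the DFT.

    Assumes the frame spans an integer number of periods, so each harmonic
    h * f0 falls exactly on a DFT bin and no window is needed.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(x))
    bins = np.rint(np.arange(1, n_harmonics + 1) * f0 * len(x) / fs).astype(int)
    return spectrum[bins]


def h1_h2(amps):
    """Level difference between the first two harmonics, in dB."""
    return 20 * np.log10(amps[0] / amps[1])


def hrf(amps):
    """Harmonic Richness Factor (assumed form): harmonics >= 2 relative to H1, in dB."""
    return 20 * np.log10(np.sum(amps[1:]) / amps[0])


# Synthetic "glottal flow" with three harmonics of known amplitude.
fs, f0 = 16000, 100.0
n = np.arange(1600)  # exactly 10 periods of F0
flow = sum(a * np.cos(2 * np.pi * h * f0 * n / fs)
           for h, a in [(1, 1.0), (2, 0.5), (3, 0.25)])
amps = harmonic_amplitudes(flow, fs, f0, n_harmonics=3)
print(h1_h2(amps), hrf(amps))
```

With harmonic amplitudes 1, 0.5 and 0.25, H1-H2 = 20·log10(2) ≈ 6.02 dB and HRF = 20·log10(0.75) ≈ -2.50 dB. On real estimates, F0 would come from a pitch tracker and a window would be needed when the frame does not hold a whole number of periods.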
Disciplines :
Electrical & electronics engineering
Library & information sciences
Author, co-author :
Drugman, Thomas ;  Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle
Dubuisson, Thomas ;  Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle
Dutoit, Thierry ;  Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle
Language :
English
Title :
On the Use of Glottal Source for Expressive Speech Analysis
Publication date :
31 August 2011
Number of pages :
1
Event name :
9th Pan European Voice Conference (PEVOC 9)
Event place :
Marseille, France
Event date :
2011
Research unit :
F105 - Information, Signal et Intelligence artificielle
Available on ORBi UMONS :
since 01 February 2012
