On the Use of Glottal Source for Expressive Speech Analysis

Drugman, Thomas; Dubuisson, Thomas; Dutoit, Thierry

No full text

Unpublished conference/Abstract (Scientific congresses and symposiums)

2011 • 9th Pan European Conference (PEVOC 9)

Permalink
https://hdl.handle.net/20.500.12907/21511

Files

Full Text

No document available.

Annexes

pevoc9.pdf

Publisher postprint (12.2 kB)

PEVOC11DrugmanDubuissonDutoit.pdf

Publisher postprint (1.78 MB)

Details

Abstract :

[en] This contribution summarizes our recent investigations into the use of the glottal source for characterizing expressive voice. It is organized in three main parts. First, we study which methods are best suited for estimating the glottal flow directly from the speech signal. This is a particularly difficult task and a typical case of blind separation, since neither the vocal tract nor the glottal component is directly observable. Second, we focus on the parameterization of the resulting glottal flow estimates, highlighting which features are most appropriate to characterize it. Finally, we report the results of our glottal analysis of expressive speech, revealing interesting modifications in glottal behavior when producing Lombard speech, various voice qualities, or hypo/hyperarticulated speech.

I. Glottal Source Estimation

As mentioned above, reliably and accurately estimating the glottal source from speech recordings is a complex issue. It usually requires processing speech frames that are synchronized on glottal closure instants and whose length is proportional to the pitch period. Three of the most efficient approaches for this are the following [1]. The Closed Phase Inverse Filtering (CPIF, [2]) method estimates the vocal tract response during the glottal closed phase, when the effects of the subglottal cavities are minimized. The Iterative Adaptive Inverse Filtering (IAIF, [3]) technique iteratively refines both the vocal tract and the glottal components in order to improve the quality of the estimates. Finally, the Mixed-Phase Separation (MPS, [4]) approach is a non-parametric technique that relies on the causal/anticausal properties of speech: it isolates the anticausal component of speech, which corresponds to the glottal open phase. In [1], we showed that CPIF and MPS lead to the most efficient estimation of the glottal flow for clean recordings.

II. Glottal Source Parameterization

Several features can be extracted from the resulting glottal flow estimates [1]. In the time domain, we found that the Normalized Amplitude Quotient (NAQ) and the Quasi Open Quotient (QOQ) are among the most suitable glottal characteristics. In the spectral domain, the Harmonic Richness Factor (HRF) and the ratio, in dB, between the amplitudes of the first two harmonics (H1-H2) provided an efficient description of the glottal source [1].

III. Glottal Analysis of Expressive Speech

Based on the conclusions drawn above, several types of expressive speech were analyzed.

Lombard Speech: The Lombard effect refers to the speech changes that occur when the speaker is immersed in a noisy environment. In such a context, the speaker tends to modify their way of speaking so as to maximize the intelligibility of the message. We have shown in [5] that this is also reflected by a significant modification of glottal behavior. Important variations of the glottal parameters were observed, depending on the level and type of the noise. For example, in a factory noise of 84 dB, NAQ decreased by 26.4%, QOQ by 12.6% and H1-H2 by 2.9 dB, while HRF increased by 4.1 dB.

Voice Quality: Here our study was conducted on a database in which the same speaker produces modal, soft and loud voice. It was shown in [1] that as the vocal effort becomes stronger, NAQ and H1-H2 decrease significantly while HRF consistently increases.

Hypo/Hyperarticulated Speech: In hyperarticulated speech, clarity tends to be maximized, while hypoarticulation refers to speech produced with minimal effort. We have shown in [6] that the stronger the degree of articulation, the higher the glottal formant frequency, the maximum voiced frequency and the fundamental frequency.
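
Returning to Section I: one way to realize the causal/anticausal split on which MPS relies is the zeros of the z-transform (ZZT) decomposition. The Python sketch below is an illustration under stated assumptions, not the authors' implementation, and this record does not say whether [4] proceeds exactly this way. It assumes the frame has already been centered on a glottal closure instant and windowed, and that its first sample is nonzero; the function name zzt_decompose and its parameter n_freqs are hypothetical. Zeros outside the unit circle are assigned to the anticausal (maximum-phase) component associated with the glottal open phase, and zeros inside to the causal (minimum-phase) component.

```python
import numpy as np


def zzt_decompose(frame, n_freqs=512):
    """Split a windowed frame into causal and anticausal magnitude spectra via ZZT.

    Hypothetical helper for illustration. Assumes `frame` is already
    GCI-centred and windowed, and that frame[0] != 0.
    Returns (w, causal_mag, anticausal_mag) evaluated on w in [0, pi].
    """
    x = np.asarray(frame, dtype=float)
    # np.roots reads x as the coefficients of x[0] z^(N-1) + ... + x[N-1],
    # which equals z^(N-1) X(z); its roots are therefore the zeros of X(z).
    zeros = np.roots(x)
    outside = zeros[np.abs(zeros) > 1.0]   # maximum-phase zeros: anticausal part
    inside = zeros[np.abs(zeros) <= 1.0]   # minimum-phase zeros: causal part
                                           # (zeros on the circle are lumped here)
    w = np.linspace(0.0, np.pi, n_freqs)
    z = np.exp(1j * w)[:, None]
    # On the unit circle |X(e^jw)| = |x[0]| * prod_m |e^jw - Z_m|,
    # so the zero set factors the magnitude spectrum into two parts.
    anticausal = np.prod(np.abs(z - outside[None, :]), axis=1)
    causal = np.abs(x[0]) * np.prod(np.abs(z - inside[None, :]), axis=1)
    return w, causal, anticausal
```

For example, the frame np.convolve([1.0, -0.5], [1.0, -2.0]) has one zero inside (0.5) and one outside (2.0) the unit circle, and the product of the two returned magnitude spectra reproduces |X(e^jw)| at every frequency. Rooting the polynomial of a long frame with np.roots is costly and can be numerically sensitive, so this sketch is only meant for short frames.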

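For Section II, the sketch below computes NAQ, H1-H2 and HRF from an estimated glottal flow. The formulas are common definitions from the glottal-analysis literature and are assumptions here, since this record does not spell out the exact variants used in [1]: NAQ = f_ac / (d_peak · T0), with f_ac the peak-to-peak flow amplitude, d_peak the magnitude of the negative peak of the flow derivative and T0 = 1/F0; H1-H2 as the dB difference between the spectral magnitudes at F0 and 2F0; and HRF as 20·log10 of the sum of the harmonic magnitudes above the fundamental divided by the magnitude at F0. QOQ is left out because it needs a quasi-open-phase detection step. The function names and parameters (naq, harmonic_features, n_harmonics, n_fft) are hypothetical.

```python
import numpy as np


def naq(flow, fs, f0):
    """Normalized Amplitude Quotient of one glottal cycle (hypothetical helper).

    NAQ = f_ac / (d_peak * T0), with T0 = 1 / f0 in seconds.
    """
    flow = np.asarray(flow, dtype=float)
    f_ac = flow.max() - flow.min()        # peak-to-peak flow amplitude
    dflow = np.diff(flow) * fs            # first-difference flow derivative, per second
    d_peak = abs(dflow.min())             # magnitude of the negative derivative peak
    return f_ac / (d_peak * (1.0 / f0))


def harmonic_features(flow, fs, f0, n_harmonics=10, n_fft=8192):
    """Return (H1-H2, HRF) in dB from magnitudes at the FFT bins nearest k * f0.

    Assumes `flow` spans several periods and n_harmonics * f0 < fs / 2.
    """
    flow = np.asarray(flow, dtype=float)
    spec = np.abs(np.fft.rfft(flow * np.hanning(len(flow)), n_fft))
    bins = [int(round(k * f0 * n_fft / fs)) for k in range(1, n_harmonics + 1)]
    h = spec[bins]                        # h[0] = |S(F0)|, h[1] = |S(2 F0)|, ...
    h1_h2 = 20.0 * np.log10(h[0] / h[1])
    hrf = 20.0 * np.log10(h[1:].sum() / h[0])
    return h1_h2, hrf
```

On a pure sinusoid the NAQ formula gives 1/π, and on a two-harmonic signal with amplitudes 1 and 0.5 whose frequencies fall exactly on FFT bins, H1-H2 comes out close to 6.02 dB and HRF close to -6.02 dB. Reading single bins is the simplest harmonic-picking choice; a peak search around each k·F0 would be more robust when F0 does not fall on a bin.
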
Disciplines :

Electrical & electronics engineering
Library & information sciences

Author, co-author :

Drugman, Thomas ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Dubuisson, Thomas ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Dutoit, Thierry ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Language :

English

Title :

On the Use of Glottal Source for Expressive Speech Analysis

Publication date :

31 August 2011

Event name :

9th Pan European Conference (PEVOC 9)

Event place :

Marseille, France

Event date :

2011

Research unit :

F105 - Information, Signal et Intelligence artificielle

Available on ORBi UMONS :

since 01 February 2012
