Abstract:
[en] Several automatic phonetic alignment tools have been proposed in the
literature. They generally use speaker-independent acoustic models of the language
to align new corpora. However, the range of available models is limited and
does not cover all languages and speaking styles (spontaneous, expressive,
etc.). This study investigates the possibility of directly training the statistical
model on the corpus to be aligned. The main advantage is that this approach is
applicable to any language and speaking style. Moreover, comparisons indicate that
it gives results as good as or better than those obtained with speaker-independent
models of the language, with a gain of about 2% at a 20 ms threshold.
Experiments were carried out on neutral and expressive corpora in French and
English. The study also shows that even a small neutral corpus of a few minutes
can be used to train a model that provides high-quality alignment.