Abstract:
Noise that degrades the speech signal remains a major problem for
automatic speech recognition. One promising way to tackle this problem
is multi-task learning, for which clean-speech generation is an
effective auxiliary task. This auxiliary task is trained alongside the
main speech recognition task, and its goal is to help improve the
results of the main task. In this paper, we investigate this idea
further by generating features extracted directly from the audio file
containing only the noise, instead of the clean speech. After
demonstrating that this noise-estimation auxiliary task yields an
improvement, we also show that using both noise and clean-speech
estimation auxiliary tasks leads to a 4% relative word error rate
improvement over classic single-task learning on the CHiME4 dataset.
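
As a rough illustration of the setup described above, the sketch below combines a main recognition loss with two auxiliary regression losses (clean-speech and noise feature estimation) and computes a relative word error rate improvement. The function names, the use of mean squared error for the auxiliary tasks, and the default task weights are all assumptions made for illustration; the abstract does not specify them, and the numbers in the usage lines are hypothetical.

```python
def mean_squared_error(predicted, target):
    """Mean squared error between two equal-length feature vectors."""
    if len(predicted) != len(target):
        raise ValueError("feature vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def multi_task_loss(main_loss, clean_pred, clean_target,
                    noise_pred, noise_target,
                    clean_weight=0.1, noise_weight=0.1):
    """Main-task loss plus weighted auxiliary losses.

    The weights and the MSE criterion are assumptions, not values
    taken from the paper.
    """
    clean_loss = mean_squared_error(clean_pred, clean_target)
    noise_loss = mean_squared_error(noise_pred, noise_target)
    return main_loss + clean_weight * clean_loss + noise_weight * noise_loss


def relative_wer_improvement(baseline_wer, new_wer):
    """Relative word error rate improvement as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values, for illustration only.
loss = multi_task_loss(1.0, [1.0, 2.0], [1.0, 4.0], [0.0], [1.0],
                       clean_weight=0.5, noise_weight=0.5)
improvement = relative_wer_improvement(25.0, 24.0)
print(loss, improvement)
```

The auxiliary terms only shape training: at test time, only the main recognition output would be used, so the auxiliary heads add no cost to decoding under this assumed design.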