INVESTIGATING SPARSE DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION

Pironkov, Gueorgui; Dupont, Stéphane; Dutoit, Thierry

Paper published in a journal (Scientific congresses and symposiums)

2015

Permalink
https://hdl.handle.net/20.500.12907/41765

Files

Full Text

pironkov.pdf

Author preprint (318.23 kB)

Details

Keywords :

Automatic speech recognition; sparse; deep neural network; TIMIT

Abstract :

We propose an organized sparse deep neural network architecture for automatic speech recognition. The proposed method is inspired by the tonotopic organization of the auditory nerve and cortex. The approach consists of limiting the connections between neurons of successive hidden layers in a manner that preserves frequency proximity, resulting in a diffuse integration of the spectral information inside the neural network. This method is put in perspective with related work on sparser neural network architectures for speech recognition (tonotopy, convolutional nets, dropout). The model is trained and tested on the TIMIT database and shows encouraging results compared to the traditional fully connected architecture.
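
The idea of restricting connections so that frequency proximity is preserved can be illustrated with a small sketch. It is not the authors' implementation: the function names `banded_mask` and `masked_forward`, the `half_width` parameter, and the assumption that unit indices are ordered along a frequency axis are all hypothetical choices made for illustration. The sketch builds a 0/1 connectivity mask in which each unit of a layer connects only to units of the previous layer that lie close to it on that axis, then applies the mask in a forward pass.

```python
def banded_mask(n_in, n_out, half_width):
    """Build an n_out x n_in 0/1 mask (illustrative assumption, not the paper's code).

    Unit indices are assumed to be ordered along a frequency axis, so a
    connection is kept only when the two units are close on that axis.
    """
    mask = []
    for j in range(n_out):
        # Map output unit j to its corresponding position on the input axis.
        center = j * (n_in - 1) / (n_out - 1) if n_out > 1 else (n_in - 1) / 2
        # Keep input unit i only if it lies within half_width of that position.
        mask.append([1 if abs(i - center) <= half_width else 0 for i in range(n_in)])
    return mask


def masked_forward(weights, mask, x):
    """Linear pre-activation of one layer, with masked-out weights ignored."""
    return [
        sum(w * m * xi for w, m, xi in zip(w_row, m_row, x))
        for w_row, m_row in zip(weights, mask)
    ]


# Example: 8 units per layer, each connected to itself and its direct neighbours.
mask = banded_mask(8, 8, 1)
density = sum(map(sum, mask)) / (8 * 8)  # fraction of connections kept
```

Stacking several such layers lets each unit gradually receive information from a wider frequency range, which is one way to read the "diffuse integration" mentioned in the abstract. In a real training setup, the same mask would also have to be applied to the weight updates so that removed connections stay at zero.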

Disciplines :

Electrical & electronics engineering
Library & information sciences

Author, co-author :

Pironkov, Gueorgui ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Dupont, Stéphane ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle

Dutoit, Thierry ; Université de Mons > Faculté Polytechnique > Service Information, Signal et Intelligence artificielle

Language :

English

Title :

INVESTIGATING SPARSE DEEP NEURAL NETWORKS FOR SPEECH RECOGNITION

Publication date :

13 December 2015

Event name :

Automatic Speech Recognition & Understanding

Event place :

Event date :

2015

Research unit :

F105 - Information, Signal et Intelligence artificielle

Research institute :

R300 - Institut de Recherche en Technologies de l'Information et Sciences de l'Informatique
R450 - Institut NUMEDIART pour les Technologies des Arts Numériques

Available on ORBi UMONS :

since 04 January 2016
