CRTI - Centre de Recherche en Technologie de l'Information
Disciplines :
Library & information sciences
Author, co-author :
Wood, S. U. N.
Rouat, Jean
Dupont, Stéphane ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle
Pironkov, Gueorgui ; Université de Mons > Faculté Polytechnique > Information, Signal et Intelligence artificielle
Language :
English
Title :
Blind Speech Separation and Enhancement With GCC-NMF
Publication date :
01 April 2017
Journal title :
IEEE/ACM Transactions on Audio, Speech and Language Processing
ISSN :
2329-9290
Publisher :
Institute of Electrical and Electronics Engineers (IEEE), New York, United States - New York
Peer reviewed :
Peer Reviewed verified by ORBi
Research unit :
F105 - Information, Signal et Intelligence artificielle
Research institute :
R300 - Institut de Recherche en Technologies de l'Information et Sciences de l'Informatique
R450 - Institut NUMEDIART pour les Technologies des Arts Numériques
S. U. N. Wood and J. Rouat, "Blind speech separation with GCC-NMF," in Proc. Interspeech Conf., Sep. 2016, pp. 3329-3333.
E. C. Cherry, "Some experiments on the recognition of speech, with one and with two ears," J. Acoust. Soc. Amer., vol. 25, no. 5, pp. 975-979, 1953.
S. Haykin and Z. Chen, "The cocktail party problem," Neural Comput., vol. 17, no. 9, pp. 1875-1902, 2005.
D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.
D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in Proc. Adv. Neural Inf. Process. Syst., 2001, pp. 556-562.
N. Ono, Z. Koldovsky, S. Miyabe, and N. Ito, "The 2013 signal separation evaluation campaign," in Proc. Int. Workshop Mach. Learn. Signal Process., 2013, pp. 1-6.
N. Ono, Z. Rafii, D. Kitamura, N. Ito, and A. Liutkus, "The 2015 signal separation evaluation campaign," in Latent Variable Analysis and Signal Separation. Berlin, Germany: Springer, 2015, pp. 387-395.
T. Virtanen, J. F. Gemmeke, B. Raj, and P. Smaragdis, "Compositional models for audio processing: Uncovering the structure of sound mixtures," IEEE Signal Process. Mag., vol. 32, no. 2, pp. 125-144, Mar. 2015.
E. Vincent, N. Bertin, R. Gribonval, and F. Bimbot, "From blind to guided audio source separation: How models and side information can improve the separation of sound," IEEE Signal Process. Mag., vol. 31, no. 3, pp. 107-115, May 2014.
B. Wang and M. D. Plumbley, "Musical audio stream separation by non-negative matrix factorization," in Proc. DMRN Summer Conf., 2005, pp. 23-24.
M. N. Schmidt and R. K. Olsson, "Single-channel speech separation using sparse non-negative matrix factorization," in Proc. Int. Conf. Spoken Language Process., 2006, pp. 2614-2617.
A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 4, pp. 1118-1133, May 2012.
S. Arberet et al., "Nonnegative matrix factorization and spatial covariance model for under-determined reverberant audio source separation," in Proc. IEEE Int. Conf. Inf. Sci. Signal Process. Appl., May 2010, pp. 1-4.
K. Adiloglu and E. Vincent, "Variational Bayesian inference for source separation and robust feature extraction," Ph.D. dissertation, INRIA, France, 2012.
J. Traa, P. Smaragdis, N. D. Stein, and D. Wingate, "Directional NMF for joint source localization and separation," in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. 2015, pp. 1-5.
T. T. Vu, B. Bigot, and E. S. Chng, "Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge," in Proc. IEEE Workshop Autom. Speech Recognit. Understanding, Dec. 2015, pp. 423-429.
N. D. Stein, "Nonnegative tensor factorization for directional blind audio source separation," 2014. [Online] Available: http://arxiv.org/abs/1411.5010
C. Févotte and J. Idier, "Algorithms for nonnegative matrix factorization with the β-divergence," Neural Comput., vol. 23, no. 9, pp. 2421-2456, 2011.
J. Le Roux, F. Weninger, and J. Hershey, "Sparse NMF - half-baked or well done?" Mitsubishi Elect. Res. Lab., Cambridge, MA, USA, Tech. Rep. TR-2015-023, 2015.
C. H. Knapp and G. C. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-24, no. 4, pp. 320-327, Aug. 1976.
X. Anguera, "Robust speaker diarization for meetings," Ph.D. dissertation, Universitat Politècnica de Catalunya, Barcelona, Spain, 2006.
C. Blandin, A. Ozerov, and E. Vincent, "Multi-source TDOA estimation in reverberant audio using angular spectra and clustering," Signal Process., vol. 92, no. 8, pp. 1950-1960, 2012.
N. Q. Duong, E. Vincent, and R. Gribonval, "Under-determined reverberant audio source separation using a full-rank spatial covariance model," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1830-1840, Sep. 2010.
J. Le Roux, J. R. Hershey, and F. Weninger, "Deep NMF for speech separation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2015, pp. 66-70.
V. Emiya, E. Vincent, N. Harlander, and V. Hohmann, "Subjective and objective quality assessment of audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2046-2057, May 2011.
E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1462-1469, Jul. 2006.
[Online] Available: https://sisec.inria.fr/
Y. Salaün et al., "The flexible audio source separation toolbox version 2.0," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 2014. [Online] Available: https://hal.inria.fr/hal-00957412
Z. Rafii and B. Pardo, "Online REPET-SIM for real-time speech enhancement," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2013, pp. 848-852.
L. Le Magoarou, A. Ozerov, and N. Q. Duong, "Text-informed audio source separation using nonnegative matrix partial co-factorization," in Proc. 2013 IEEE Int. Workshop Mach. Learn. Signal Process., Sep. 2013, pp. 1-6.
L. Wang, H. Ding, and F. Yin, "A region-growing permutation alignment approach in frequency-domain blind source separation of speech mixtures," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 3, pp. 549-557, Mar. 2011.
S. Araki et al., "The 2011 signal separation evaluation campaign (SiSEC2011): Audio source separation," in Latent Variable Analysis and Signal Separation. Berlin, Germany: Springer, 2012, pp. 414-422.
L. Wang, T. Gerkmann, and S. Doclo, "Noise power spectral density estimation using MaxNSR blocking matrix," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 9, pp. 1493-1508, Sep. 2015.
V. P. Pauca, J. Piper, and R. J. Plemmons, "Nonnegative matrix factorization for spectral data analysis," Linear Algebra Appl., vol. 416, no. 1, pp. 29-47, 2006.
A. Mehmood, T. Damarla, and J. Sabatier, "Separation of human and animal seismic signatures using non-negative matrix factorization," Pattern Recognit. Lett., vol. 33, no. 16, pp. 2085-2093, 2012.
H. Lee and S. Choi, "Group nonnegative matrix factorization for EEG classification," in Proc. Int. Conf. Artif. Intell. Statist., 2009, pp. 320-327.
N. Kaboodvand, F. Towhidkhah, and S. Gharibzadeh, "Extracting and study of synchronous muscle synergies during fast arm reaching movements," in Proc. Iranian Conf. Biomed. Eng., Dec. 2013, pp. 155-160.