Noise adaptive optimization of matrix initialization for frequency-domain independent component analysis

https://doi.org/10.1016/j.dsp.2012.08.010

Abstract

Initializing the unmixing matrix is an important problem in source separation because the objective function to be optimized is typically non-convex. In this paper, we consider two-source signal separation with a two-microphone array mounted on a mobile device, where a point source such as a speech signal is located in front of the array and no information is available about the interference signal. We propose a simple and computationally efficient method for estimating the geometry and source type (point or diffuse) of the interference signal, which allows us to adaptively choose a suitable unmixing matrix initialization scheme. The proposed method, noise adaptive optimization of matrix initialization (NAOMI), is shown to be effective through source separation simulations.
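The idea of choosing an initialization from the interference type can be illustrated with a small sketch. This is not the NAOMI algorithm, whose details are not given in the abstract. It assumes a noise-only segment is available from both microphones, uses the mean magnitude-squared coherence as a stand-in test for point versus diffuse interference, and assumes the inter-microphone delay of a point interferer is already known. The function names, the 0.5 threshold, and the far-field two-source mixing model are all hypothetical choices made for illustration.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed short-time Fourier transform, shape (frames, n_fft // 2 + 1)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def mean_coherence(x1, x2, n_fft=256, hop=128):
    """Magnitude-squared coherence of two channels, averaged over frequency bins."""
    X1, X2 = stft(x1, n_fft, hop), stft(x2, n_fft, hop)
    s12 = np.mean(X1 * np.conj(X2), axis=0)   # cross-spectrum estimate
    s11 = np.mean(np.abs(X1) ** 2, axis=0)    # auto-spectrum, microphone 1
    s22 = np.mean(np.abs(X2) ** 2, axis=0)    # auto-spectrum, microphone 2
    return float(np.mean(np.abs(s12) ** 2 / (s11 * s22 + 1e-12)))

def initial_unmixing(noise1, noise2, delay, n_fft=256, threshold=0.5):
    """Return (label, W), where W has shape (n_fft // 2 + 1, 2, 2).

    `delay` is the assumed inter-microphone delay (in samples) of a point
    interferer; `threshold` is an illustrative value, not a tuned one.
    """
    n_bins = n_fft // 2 + 1
    if mean_coherence(noise1, noise2, n_fft) < threshold:
        # Low coherence: treat the noise as diffuse and start from identity.
        return "diffuse", np.tile(np.eye(2, dtype=complex), (n_bins, 1, 1))
    # High coherence: treat the noise as a point source. Build a far-field
    # mixing model with the target in front (zero delay) and the interferer
    # at `delay`, then use its pseudo-inverse as the initial unmixing matrix.
    omega = 2 * np.pi * np.arange(n_bins) / n_fft
    A = np.ones((n_bins, 2, 2), dtype=complex)
    A[:, 1, 1] = np.exp(-1j * omega * delay)
    return "point", np.linalg.pinv(A)
```

The pseudo-inverse is used instead of a plain inverse because the assumed mixing matrix is singular at the zero-frequency bin, where both sources look identical to the two microphones.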

Makoto Yamada received the M.S. degree in electrical engineering from Colorado State University, Fort Collins, in 2005 and the Ph.D. degree in statistical science from The Graduate University for Advanced Studies, Tokyo, in 2010. He has held positions as a systems engineer for Hitachi Corporation from 2005 to 2007, as a researcher with Yamaha Corporation from 2007 to 2010, and as a postdoctoral fellow with Tokyo Institute of Technology. He has held an internship appointment in computer vision and machine learning at Carnegie Mellon University and Disney Research Pittsburgh. Currently, he is a research associate at NTT Communication Science Laboratories. His research interests include machine learning and its application to signal processing and computer vision.

Gordon Wichern received the B.S. and M.S. degrees in electrical engineering from Colorado State University, Fort Collins, in 2004 and 2006, respectively, and the Ph.D. degree in electrical engineering from Arizona State University, Tempe, in 2010, where he was supported by a National Science Foundation (NSF) Integrative Graduate Education and Research Traineeship (IGERT) in arts, media and engineering. He is currently a member of the technical staff in the Advanced RF Sensing and Exploitation group at MIT Lincoln Laboratory. His primary research interests include signal processing, machine learning, and information retrieval. He has held internship appointments in computational finance at SAP Labs and music information retrieval at the Yamaha Center for Advanced Sound Technologies.

Kazunobu Kondo was born in Aichi, Japan, on January 21, 1969. He received the B.E. and M.E. degrees from Nagoya University, Nagoya, Japan, in 1991 and 1993, respectively. He joined the Electronics Development Center, Yamaha Co., Ltd., Shizuoka, Japan, in 1993, where he conducted research and development on coding systems for musical sound sources. He is currently a Program Manager at the Corporate Research and Development Center, Yamaha Corporation, Shizuoka, Japan. His research interests include array signal processing, blind source separation, and noise reduction. Mr. Kondo is a member of the IEICE and the Acoustical Society of Japan.

Masashi Sugiyama was born in Osaka, Japan, in 1974. He received the degrees of Bachelor of Engineering, Master of Engineering, and Doctor of Engineering in Computer Science from Tokyo Institute of Technology, Japan, in 1997, 1999, and 2001, respectively. In 2001, he was appointed Assistant Professor at the same institute, and he has been an Associate Professor since 2003. He received an Alexander von Humboldt Foundation Research Fellowship and stayed at the Fraunhofer Institute, Berlin, Germany, from 2003 to 2004. In 2006, he received a European Commission Program Erasmus Mundus Scholarship and stayed at the University of Edinburgh, Edinburgh, UK. He was awarded a Faculty Award from IBM in 2007 for his contribution to machine learning under nonstationarity. His research interests include theories and algorithms of machine learning and data mining, and a wide range of applications such as signal processing, image processing, and robot control.

Hiroshi Sawada received the B.E., M.E., and Ph.D. degrees in information science from Kyoto University, Kyoto, Japan, in 1991, 1993, and 2001, respectively. He joined NTT Corporation in 1993. He is now the Group Leader of Learning and Intelligent Systems Research Group at the NTT Communication Science Laboratories, Kyoto, Japan. His research interests include statistical signal processing, audio source separation, array signal processing, machine learning, latent variable model, graph-based data structure, and computer architecture.

From 2006 to 2009, he served as an associate editor of the IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING. He is a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society. He received the Ninth TELECOM System Technology Award for Student from the Telecommunications Advancement Foundation in 1994, the Best Paper Award of the IEEE Circuits and System Society in 2000, and the MLSP Data Analysis Competition Award in 2007. Dr. Sawada is a member of the IEICE and the ASJ.
