Abstract
With the recent attention towards audio processing in the time-frequency domain, we increasingly encounter the problem of missing data within that representation. In this paper we present an approach for recovering missing values in the time-frequency domain of audio signals. The approach handles real-world polyphonic signals and operates seamlessly even in the presence of complex acoustic mixtures. We demonstrate that it outperforms generic missing data approaches, and we present a variety of situations that highlight its utility.







Cite this article
Smaragdis, P., Raj, B. & Shashanka, M. Missing Data Imputation for Time-Frequency Representations of Audio Signals. J Sign Process Syst 65, 361–370 (2011). https://doi.org/10.1007/s11265-010-0512-7
DOI: https://doi.org/10.1007/s11265-010-0512-7