ABSTRACT
In Audio Stream Retrieval (ASR) systems, clients periodically query an audio database with a segment taken from the input audio stream, either to track the position of the stream within the original content sources or to compare two differently edited streams. We recently developed a series of ASR applications, including broadcast monitoring, automatic caption fetching, and automatic media edit tracking systems. Based on this experience, we propose a probabilistic ranking model designed for ASR systems. To train and test the model, we create a new set of audio streams and make it publicly available. Our experiments with these new streams confirm that the proposed ranking model works effectively with the retrieved results and reduces errors when used in various ASR applications.