research-article

A Probabilistic Ranking Model for Audio Stream Retrieval

Authors:
YoungHoon Jung (Columbia University, New York, NY, USA)
Jaehwan Koo (I-Yuno Media Group, Burbank, CA, USA)
Karl Stratos (Columbia University, New York, NY, USA)
Luca P. Carloni (Columbia University, New York, NY, USA)

MARMI '16: Proceedings of the 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction, June 2016, pages 33–38. https://doi.org/10.1145/2927006.2927013

Published: 6 June 2016

ABSTRACT

In Audio Stream Retrieval (ASR) systems, clients periodically query an audio database with a segment taken from the input audio stream. They do this either to track where the stream currently is within the original content sources or to compare two differently edited versions of a stream. We recently developed a series of ASR applications, including broadcast monitoring systems, automatic caption fetching systems, and automatic media edit tracking systems. Based on this experience, we propose a probabilistic ranking model designed for ASR systems. To train and test the model, we create a new set of audio streams and make it publicly available. Our experiments with these streams confirm that the proposed model ranks the retrieved results effectively and reduces errors when used in various ASR applications.
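The abstract does not give the model's equations, but the author tags list HMM. The Python sketch below illustrates one possible hidden-Markov formulation under our own assumptions; it is not the authors' model. The hidden state is the stream's position (segment index) within the original content. Each query returns candidate positions with positive match scores, which are treated as emission likelihoods. The transition model assumes the stream usually advances by `step` segments and otherwise jumps anywhere (for example, because of an edit). The names `forward_rerank`, `step`, `p_advance`, and `EPS`, as well as the score format, are hypothetical.

```python
from typing import Dict, List

# Hypothetical floor likelihood for positions a query did not return.
EPS = 1e-6


def forward_rerank(
    observations: List[Dict[int, float]],
    num_positions: int,
    step: int = 1,
    p_advance: float = 0.9,
    top_k: int = 3,
) -> List[List[int]]:
    """Re-rank retrieval candidates with HMM forward filtering (illustrative sketch).

    Each observation maps candidate positions returned by one query to a
    positive match score, used here as an emission likelihood (an assumption).
    """
    n = num_positions
    alpha = [1.0 / n] * n  # uniform prior over positions
    rankings: List[List[int]] = []
    for obs in observations:
        # Predict: advance by `step` with probability p_advance, otherwise
        # jump uniformly. alpha is normalized, so the jump mass is (1 - p) / n.
        # Mass that would advance past the last position is dropped and
        # recovered by the normalization below.
        jump = (1.0 - p_advance) / n
        predicted = [
            p_advance * (alpha[s - step] if s >= step else 0.0) + jump
            for s in range(n)
        ]
        # Update: weight each predicted position by its match score.
        alpha = [predicted[s] * obs.get(s, EPS) for s in range(n)]
        z = sum(alpha)
        alpha = [a / z for a in alpha]  # normalize to a posterior
        ranked = sorted(range(n), key=lambda s: alpha[s], reverse=True)
        rankings.append(ranked[:top_k])
    return rankings


if __name__ == "__main__":
    queries = [
        {2: 0.9, 7: 0.85},  # ambiguous first query
        {3: 0.8, 7: 0.9},   # raw best is 7, but 3 continues from 2
        {4: 0.9, 1: 0.3},
    ]
    best = [r[0] for r in forward_rerank(queries, num_positions=10)]
    print(best)  # prints [2, 3, 4]
```

In this toy run, the raw scores of the second query favor position 7. The filtered ranking still prefers position 3 because it continues the trajectory that began at position 2. The values of `p_advance` and `EPS` are arbitrary and chosen only for illustration.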


Index Terms

1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking
      1. Learning to rank
    2. Specialized information retrieval
      1. Multimedia and multimodal retrieval
        1. Speech / audio search
2. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic reasoning algorithms
      1. Kalman filters and hidden Markov models

Recommendations

Statistical conversion of silent articulation into audible speech using full-covariance HMM

Conversion of silent articulation captured by ultrasound and video to modal speech. Comparison of GMM and full-covariance phonetic HMM without vocabulary limitation. HMM-based approach allows the use of linguistic information for regularization. Objective ...

Robust arabic multi-stream speech recognition system in noisy environment
ICISP'12: Proceedings of the 5th international conference on Image and Signal Processing

In this paper, the framework of multi-stream combination has been explored to improve the noise robustness of automatic speech recognition systems. The main important issues of multi-stream systems are which features representation to combine and what ...

Robust distant speaker recognition based on position-dependent CMN by combining speaker-specific GMM with speaker-adapted HMM

In this paper, we propose a robust speaker recognition method based on position-dependent Cepstral Mean Normalization (CMN) to compensate for the channel distortion depending on the speaker position. In the training stage, the system measures the ...

Published in
MARMI '16: Proceedings of the 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction
June 2016
46 pages
ISBN: 9781450343626
DOI: 10.1145/2927006
General Chairs:
Stefanos Vrochidis (CERTH-ITI, Greece)
Leo Wanner (ICREA-UPF, Spain)
Elisabeth André (University of Augsburg, Germany)
Stephanie Elzer Schwartz (Millersville University, USA)
Copyright © 2016 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
HMM
audio stream retrieval
probabilistic ranking
Acceptance Rates
MARMI '16 paper acceptance rate: 6 of 7 submissions (86%). Overall acceptance rate: 6 of 7 submissions (86%).