Structural and Semantic Modeling of Audio for Content-Based Querying and Browsing

Sert, Mustafa; Baykal, Buyurman; Yazıcı, Adnan

doi:10.1007/11766254_27

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4027)

Included in the following conference series:

International Conference on Flexible Query Answering Systems

Abstract

A typical content-based audio management system deals with three aspects, namely audio segmentation and classification, audio analysis, and content-based retrieval of audio. In this paper, we integrate these three aspects into a single framework and propose an efficient method for flexible querying and browsing of auditory data. More specifically, we use two robust feature sets, MPEG-7 Audio Spectrum Flatness (ASF) and Mel Frequency Cepstral Coefficients (MFCC), as the underlying features in order to improve content-based retrieval accuracy, since each feature has advantages for distinct types of audio (e.g., music and speech). The proposed system offers a wide range of ways to query and browse audio data by content, such as querying and browsing for chorus sections and sound effects, as well as query-by-example. In addition, clients can express their queries as point, range, and k-nearest neighbor (k-NN) queries, which are particularly significant in the multimedia domain.
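The abstract names point, range, and k-nearest neighbor queries but does not say which distance measure or index structure the system uses. The following is a minimal sketch, not the authors' implementation, assuming each indexed audio segment is summarized by one fixed-length feature vector (for example, averaged MFCC or ASF values) and that similarity is plain Euclidean distance computed by a linear scan. `Segment`, `point_query`, `range_query`, `knn_query`, and `EXAMPLE_DB` are hypothetical names, and the feature values are invented for illustration.

```python
import heapq
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Segment:
    """Hypothetical indexed audio segment: an identifier plus a fixed-length
    feature vector (assumed here to be, e.g., averaged MFCC or ASF values)."""
    segment_id: str
    features: Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors (an assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def point_query(db: Sequence[Segment], q: Sequence[float], tol: float = 1e-9) -> List[Segment]:
    """Point query: segments whose feature vector equals q, up to a small tolerance."""
    return [s for s in db if euclidean(s.features, q) <= tol]


def range_query(db: Sequence[Segment], q: Sequence[float], radius: float) -> List[Segment]:
    """Range query: every segment within `radius` of q, nearest first."""
    # Keep the original index as a tie-breaker so equal distances stay in db order.
    scored = [(euclidean(s.features, q), i, s) for i, s in enumerate(db)]
    scored.sort(key=lambda t: (t[0], t[1]))
    return [s for d, _, s in scored if d <= radius]


def knn_query(db: Sequence[Segment], q: Sequence[float], k: int) -> List[Segment]:
    """k-NN query: the k segments closest to q (fewer if db is smaller than k)."""
    if k <= 0:
        return []
    return heapq.nsmallest(k, db, key=lambda s: euclidean(s.features, q))


# Toy database with made-up 2-D feature vectors (illustrative values only).
EXAMPLE_DB = [
    Segment("intro", (0.0, 0.0)),
    Segment("verse", (1.0, 0.0)),
    Segment("chorus", (3.0, 4.0)),
    Segment("speech", (10.0, 10.0)),
]

if __name__ == "__main__":
    q = (0.0, 0.0)
    print([s.segment_id for s in point_query(EXAMPLE_DB, q)])       # ['intro']
    print([s.segment_id for s in range_query(EXAMPLE_DB, q, 5.0)])  # ['intro', 'verse', 'chorus']
    print([s.segment_id for s in knn_query(EXAMPLE_DB, q, 2)])      # ['intro', 'verse']
```

With the query vector (0.0, 0.0), the point query returns only `intro`, the range query with radius 5.0 returns `intro`, `verse`, and `chorus` (the chorus lies exactly 5.0 away), and the 2-NN query returns `intro` and `verse`. The linear scan keeps the sketch short; a larger collection would normally use a spatial or metric index instead.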

Author information

Authors and Affiliations

Department of Computer Engineering, Başkent University, 06530, Ankara, Turkey
Mustafa Sert
Faculty of Technical Education, Department of Electronics and Computer Education, Gazi University, 06500, Ankara, Turkey
Mustafa Sert
Department of Electrical and Electronics Engineering, Middle East Technical University, 06531, Ankara, Turkey
Buyurman Baykal
Department of Computer Engineering, Middle East Technical University, 06531, Ankara, Turkey
Adnan Yazıcı

Editor information

Editors and Affiliations

The European Center for Counterterrorism Research and Studies, Department of Computer Science and Engineering, Aalborg University, DK-6700, Esbjerg, Denmark
Henrik Legind Larsen
DISCO, University Milano-Bicocca, Milano, Italy
Gabriella Pasi
Computer Science Department, Aalborg University Esbjerg, Niels Bohrs Vej 8, 6700, Esbjerg, Denmark
Daniel Ortiz-Arroyo
Department of Computer Science, Roskilde University, P.O. Box 260, DK-4000, Roskilde, Denmark
Troels Andreasen
Research group PLIS: Programming, Logic and Intelligent Systems, Department of Communication, Business and Information Technologies, Roskilde University, P.O. Box 260, DK-4000, Roskilde, Denmark
Henning Christiansen

About this paper

Cite this paper

Sert, M., Baykal, B., Yazıcı, A. (2006). Structural and Semantic Modeling of Audio for Content-Based Querying and Browsing. In: Larsen, H.L., Pasi, G., Ortiz-Arroyo, D., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2006. Lecture Notes in Computer Science, vol 4027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766254_27

DOI: https://doi.org/10.1007/11766254_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34638-8
Online ISBN: 978-3-540-34639-5
eBook Packages: Computer Science, Computer Science (R0)