Structuring and Querying Documents in an Audio Database Management System

Lutfi, R.; Gelgon, M.; Martinez, J.

doi:10.1023/B:MTAP.0000036839.24141.9e

Published: November 2004

Volume 24, pages 105–123 (2004)
Multimedia Tools and Applications

Abstract

The growing digitization of multimedia content must be supported by a set of tools to manipulate it, and especially to query it. This is one of the major goals of an audio DBMS. Yet existing work related to audio documents, e.g., radio or television archives, often leaves the DBMS question open. In this paper, we lay the foundations for integrating audio into a general-purpose DBMS, in the form of an audio abstract data type, along with its properties and associated operators. This contribution is coupled with an unsupervised, statistically founded, speaker-based partitioning technique. For each of these two aspects, the paper underlines the practical interest and some technical difficulties. Also, some query examples introduce the problem of the complexity of the querying expressions as well as that of time complexity.

Author information

Authors and Affiliations

ATLAS Group, INRIA & IRIN, École polytechnique de l'université de Nantes, La Chantrerie, B.P. 50609, 44306, Nantes Cedex 3, France
R. Lutfi, M. Gelgon & J. Martinez

About this article

Cite this article

Lutfi, R., Gelgon, M. & Martinez, J. Structuring and Querying Documents in an Audio Database Management System. Multimedia Tools and Applications 24, 105–123 (2004). https://doi.org/10.1023/B:MTAP.0000036839.24141.9e

Issue Date: November 2004
DOI: https://doi.org/10.1023/B:MTAP.0000036839.24141.9e
