Structuring and Querying Documents in an Audio Database Management System

Multimedia Tools and Applications

Abstract

The growing digitization of multimedia content must be supported by tools to manipulate it and, especially, to query it. This is one of the major goals of an audio DBMS. Yet existing work on audio documents, e.g., radio or television archives, often leaves the DBMS question open. In this paper, we lay the foundations for integrating audio into a general-purpose DBMS, in the form of an audio abstract data type together with its properties and associated operators. This contribution is coupled with an unsupervised, statistically founded, speaker-based partitioning technique. For each of these two aspects, the paper underlines the practical interest and some of the technical difficulties. Finally, some query examples introduce the problem of the complexity of query expressions as well as that of time complexity.
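To make the idea of an audio abstract data type with operators more concrete, the following is a minimal illustrative sketch in Python. It is not the schema or query language defined in the paper: the names `Segment`, `AudioDocument`, `speaking_time`, and `turns_longer_than`, as well as the sample data, are hypothetical and chosen only for illustration. It assumes that a speaker-based partitioning step has already produced time intervals tagged with anonymous speaker labels.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    """A time interval of an audio document attributed to one speaker label (hypothetical type)."""
    start: float   # seconds from the beginning of the document
    end: float     # seconds, exclusive upper bound of the interval
    speaker: str   # anonymous label assumed to come from a partitioning step, e.g. "spk1"

    def duration(self) -> float:
        return self.end - self.start

    def overlaps(self, other: "Segment") -> bool:
        # True only when the two intervals share a non-empty stretch of time
        return self.start < other.end and other.start < self.end


@dataclass
class AudioDocument:
    """An audio document seen as a list of speaker segments (hypothetical type)."""
    title: str
    segments: List[Segment]

    def speakers(self) -> List[str]:
        # Distinct speaker labels, sorted for a deterministic result
        return sorted({s.speaker for s in self.segments})

    def speaking_time(self, speaker: str) -> float:
        # Total duration of all segments attributed to the given label
        return sum(s.duration() for s in self.segments if s.speaker == speaker)

    def turns_longer_than(self, seconds: float) -> List[Segment]:
        # A simple selection query over the partitioned document
        return [s for s in self.segments if s.duration() > seconds]


# Made-up sample data, for illustration only
doc = AudioDocument(
    "news",
    [
        Segment(0.0, 12.5, "spk1"),
        Segment(12.5, 40.0, "spk2"),
        Segment(40.0, 47.5, "spk1"),
    ],
)
```

With such a type, a query like "how long does each speaker talk?" reduces to calling `doc.speaking_time(label)` for each label in `doc.speakers()`. In a real DBMS the equivalent operators would be exposed through the query language rather than through application code.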




Cite this article

Lutfi, R., Gelgon, M. & Martinez, J. Structuring and Querying Documents in an Audio Database Management System. Multimedia Tools and Applications 24, 105–123 (2004). https://doi.org/10.1023/B:MTAP.0000036839.24141.9e


  • DOI: https://doi.org/10.1023/B:MTAP.0000036839.24141.9e
