Abstract
The growing digitization of multimedia content must be supported by tools for manipulating that content and, especially, for querying it. This is one of the major goals of an audio DBMS. Yet existing work on audio documents, e.g., radio or television archives, often leaves the DBMS question open. In this paper, we lay the foundations for integrating audio into a general-purpose DBMS in the form of an audio abstract data type, along with its properties and associated operators. This contribution is coupled with an unsupervised, statistically founded, speaker-based partitioning technique. For each of these two aspects, the paper underlines the practical interest and some of the technical difficulties. Finally, some query examples introduce the problem of the complexity of query expressions as well as that of time complexity.
Cite this article
Lutfi, R., Gelgon, M. & Martinez, J. Structuring and Querying Documents in an Audio Database Management System. Multimedia Tools and Applications 24, 105–123 (2004). https://doi.org/10.1023/B:MTAP.0000036839.24141.9e
DOI: https://doi.org/10.1023/B:MTAP.0000036839.24141.9e