Abstract
We explore the feasibility of using speech input to index a large volume of digital photographs. As a natural medium for image communication, speech can complement existing content-based techniques, thereby improving the reliability and usability of image retrieval systems. We introduce a methodology for image indexing using a speech annotation technique. Speech recognition tools, such as Dragon NaturallySpeaking, can be adapted to perform the main role of speech-to-text transcription. Using structured speech, as opposed to free-form speech, in a limited system can further boost transcription accuracy. We also introduce the idea of using N-best lists from the speech recognition output to improve recognition performance. The transcribed text is used to populate the metadata of the corresponding photograph. A photo query strategy is implemented to verify the performance of the proposed technique for photo indexing and retrieval.
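As a rough illustration of how N-best lists could feed photo metadata and retrieval, the following minimal sketch indexes every word that appears in any recognition hypothesis and ranks photos by summed hypothesis score. It is not the authors' implementation: the function names (`index_photo`, `query`), the (transcript, score) pair format, the scoring rule, and the sample file names and transcripts are all assumptions made for this example, and no real speech recognizer is called.

```python
from collections import defaultdict


def index_photo(index, photo_id, n_best):
    """Add the words of every N-best hypothesis to an inverted index.

    n_best is a list of (transcript, score) pairs; the score scale
    (here 0..1) is an assumption of this sketch.
    """
    for transcript, score in n_best:
        for word in transcript.lower().split():
            # Keep the best score seen for this word on this photo.
            previous = index[word].get(photo_id, 0.0)
            index[word][photo_id] = max(previous, score)


def query(index, text):
    """Return photo ids ranked by the summed scores of matching query words."""
    totals = defaultdict(float)
    for word in text.lower().split():
        for photo_id, score in index.get(word, {}).items():
            totals[photo_id] += score
    # Highest total first; ties broken by photo id for a stable order.
    return sorted(totals, key=lambda p: (-totals[p], p))


# Hypothetical annotations: the second hypothesis for the first photo
# carries a name ("joan") that the top hypothesis misses.
index = defaultdict(dict)
index_photo(index, "img_001.jpg",
            [("john at the beach", 0.8), ("joan at the beach", 0.6)])
index_photo(index, "img_002.jpg", [("mary in the park", 0.9)])

print(query(index, "joan beach"))
```

Because lower-ranked hypotheses are indexed too, a query for a word that only the second hypothesis contains can still retrieve the photo, which is the intuition behind using N-best lists rather than the single best transcription.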
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, J., Tan, T., Mulhem, P. (2001). A Method for Photograph Indexing Using Speech Annotation. In: Shum, HY., Liao, M., Chang, SF. (eds) Advances in Multimedia Information Processing — PCM 2001. PCM 2001. Lecture Notes in Computer Science, vol 2195. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45453-5_113
DOI: https://doi.org/10.1007/3-540-45453-5_113
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42680-6
Online ISBN: 978-3-540-45453-3
eBook Packages: Springer Book Archive