Abstract
This paper addresses the design and development of a spoken query retrieval (SQR) system for accessing large document databases by voice. The main challenge is to identify and resolve the adaptation and scalability issues that arise when automatic speech recognition (ASR) is integrated with information retrieval (IR). The paper presents a Context-Aware Language Model (CALM) framework that supports voice-based retrieval from large document databases, and it discusses findings from a research study that used the framework.
Cite this article
Zhong, Y., Gilbert, J.E. A Context-Aware Language Model for Spoken Query Retrieval. Int J Speech Technol 8, 203–219 (2005). https://doi.org/10.1007/s10772-005-2171-9
DOI: https://doi.org/10.1007/s10772-005-2171-9