Abstract
Streaming data poses a variety of new and interesting challenges for information retrieval and text analysis. Unlike static document collections, which are typically analyzed and indexed off-line to support ad-hoc queries, streaming data often must be analyzed on the fly and acted on as the data passes through the analysis system. Speech is one example of streaming data that is a challenge to exploit, yet has significant potential to provide value in a knowledge management system. We are specifically interested in techniques that analyze streaming data and automatically find collateral information, or information that clarifies, expands, and generally enhances the value of the streaming data. We present a system that analyzes a data stream and automatically finds documents related to the current topic of discussion in the data stream. Experimental results show that the system generates result lists with an average precision at 10 hits of better than 60%. We also present a hit-list re-ranking technique based on named entity analysis and automatic text categorization that can improve the search results by 6%–12%.
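The reported metric, precision at 10 hits averaged over searches, can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the document identifiers, and the choice to divide by k even when fewer than k hits are returned are all assumptions made for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 10) -> float:
    """Fraction of the top-k ranked results that are judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    # Assumption: divide by k rather than len(top_k), so short hit lists are penalized.
    return hits / k


def mean_precision_at_k(runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 10) -> float:
    """Average precision-at-k over several (hit list, relevant set) pairs."""
    scores = [precision_at_k(ranked, relevant, k) for ranked, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical hit list of ten documents, six of which are judged relevant.
hit_list = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant = {"d1", "d3", "d5", "d7", "d9", "d10"}
score = precision_at_k(hit_list, relevant)
```

Under this reading, a hit list in which six of the first ten documents are relevant scores 0.6, which corresponds to the 60% level mentioned above.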
Cite this article
Coden, A.R., Brown, E.W. Automatic search from streaming data. Inf Retrieval 9, 95–109 (2006). https://doi.org/10.1007/s10791-005-5723-3
DOI: https://doi.org/10.1007/s10791-005-5723-3