Abstract
Document retrieval systems conventionally use words as the basic unit of representation, a natural choice since words are primary carriers of semantic information. In this paper we propose the use of a different, phonetically defined unit of representation that we call “particles”. Particles are phonetic sequences that do not possess meaning. Both documents and queries are converted from their standard word-based form into sequences of particles. Indexing and retrieval are performed with particles. Experiments show that this scheme is capable of achieving retrieval performance that is comparable to that obtained with words when the text in the documents and queries is clean, and can result in significantly improved retrieval when the text is noisy.
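As a rough illustration of the idea, the sketch below converts documents and a query into phonetic units and matches on those units instead of on words. It is not the authors' method: the toy lexicon, the fixed-size chunking of phonemes into "particles", and the overlap score are all assumptions made for this example, and the names `TOY_LEXICON`, `word_to_particles`, `text_to_particles`, and `particle_score` are hypothetical.

```python
from collections import Counter

# Hypothetical toy lexicon mapping words to phoneme sequences (ARPAbet-like).
# A real system would need a pronunciation dictionary or a
# grapheme-to-phoneme model; these entries are illustrative only.
TOY_LEXICON = {
    "night": ["N", "AY", "T"],
    "nite": ["N", "AY", "T"],
    "flight": ["F", "L", "AY", "T"],
    "flite": ["F", "L", "AY", "T"],
    "to": ["T", "UW"],
    "boston": ["B", "AO", "S", "T", "AH", "N"],
    "train": ["T", "R", "EY", "N"],
}


def word_to_particles(word, size=2):
    """Split a word's phonemes into consecutive chunks of at most `size`.

    Fixed-size chunking is an assumption of this sketch, not a description
    of how the paper derives its particles.
    """
    phones = TOY_LEXICON.get(word.lower())
    if phones is None:
        # Unknown word: fall back to the lowercased word as a single unit.
        return [word.lower()]
    return ["_".join(phones[i:i + size]) for i in range(0, len(phones), size)]


def text_to_particles(text, size=2):
    """Convert whitespace-separated text into a flat list of particles."""
    particles = []
    for word in text.split():
        particles.extend(word_to_particles(word, size))
    return particles


def particle_score(query, document):
    """Count particles shared by query and document (multiset overlap)."""
    q = Counter(text_to_particles(query))
    d = Counter(text_to_particles(document))
    return sum(min(count, d[p]) for p, count in q.items())


def word_score(query, document):
    """Baseline: number of distinct words shared by query and document."""
    return len(set(query.lower().split()) & set(document.lower().split()))


docs = ["night flight to boston", "train to boston"]
noisy_query = "nite flite"  # misspelled, so it shares no words with docs
for doc in docs:
    print(doc, word_score(noisy_query, doc), particle_score(noisy_query, doc))
```

In this toy setting the misspelled query shares no words with either document, yet it shares phonetic units with the first one, which is the kind of robustness to noisy text that the abstract describes.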
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gouvêa, E.B., Raj, B. (2009). Word Particles Applied to Information Retrieval. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_38
DOI: https://doi.org/10.1007/978-3-642-00958-7_38
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00957-0
Online ISBN: 978-3-642-00958-7
eBook Packages: Computer Science, Computer Science (R0)