Word Particles Applied to Information Retrieval

  • Conference paper
Advances in Information Retrieval (ECIR 2009)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5478)

Abstract

Document retrieval systems conventionally use words as the basic unit of representation, a natural choice since words are the primary carriers of semantic information. In this paper we propose the use of a different, phonetically defined unit of representation that we call “particles”. Particles are phonetic sequences that do not possess meaning. Both documents and queries are converted from their standard word-based form into sequences of particles, and indexing and retrieval are performed with particles. Experiments show that this scheme achieves retrieval performance comparable to that of word-based retrieval when the text in the documents and queries is clean, and can significantly improve retrieval when the text is noisy.
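The pipeline described in the abstract (convert text to particles, index the particles, retrieve with particles) can be illustrated with a minimal sketch. The abstract does not say how particles are derived, so this example substitutes overlapping phoneme bigrams as a stand-in; the toy lexicon `LEXICON`, the functions `to_particles`, `build_index`, and `search`, and the summed term-frequency scoring are all hypothetical and are not the authors' method.

```python
from collections import Counter, defaultdict

# Hypothetical toy pronunciation lexicon (ARPAbet-like symbols).
# A real system would need a full lexicon or grapheme-to-phoneme conversion.
LEXICON = {
    "night": ["N", "AY", "T"],
    "rate": ["R", "EY", "T"],
    "right": ["R", "AY", "T"],
    "write": ["R", "AY", "T"],
    "letter": ["L", "EH", "T", "ER"],
}


def to_particles(text, n=2):
    """Convert text to phoneme n-grams (an assumed stand-in for particles)."""
    particles = []
    for word in text.lower().split():
        phones = LEXICON.get(word)
        if phones is None:
            continue  # words missing from the toy lexicon are skipped
        if len(phones) < n:
            particles.append("_".join(phones))
            continue
        for i in range(len(phones) - n + 1):
            particles.append("_".join(phones[i:i + n]))
    return particles


def build_index(docs, n=2):
    """Build an inverted index: particle -> Counter of document ids."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for particle in to_particles(text, n):
            index[particle][doc_id] += 1
    return index


def search(index, query, n=2):
    """Rank documents by summed frequency of the query's particles."""
    scores = Counter()
    for particle in to_particles(query, n):
        for doc_id, tf in index.get(particle, {}).items():
            scores[doc_id] += tf
    return [doc_id for doc_id, _ in scores.most_common()]


if __name__ == "__main__":
    docs = {"d1": "write letter", "d2": "night rate"}
    index = build_index(docs)
    # "right" stands in for a noisy version of "write"; both words map
    # to the same phoneme sequence in the toy lexicon.
    print(search(index, "right letter"))
```

In this sketch the query word "right" shares no word-level match with document `d1`, yet it matches at the particle level because "right" and "write" have the same pronunciation in the toy lexicon. This is the kind of robustness to noisy text that the abstract reports, shown here only on toy data.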

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gouvêa, E.B., Raj, B. (2009). Word Particles Applied to Information Retrieval. In: Boughanem, M., Berrut, C., Mothe, J., Soule-Dupuy, C. (eds) Advances in Information Retrieval. ECIR 2009. Lecture Notes in Computer Science, vol 5478. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00958-7_38

  • DOI: https://doi.org/10.1007/978-3-642-00958-7_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00957-0

  • Online ISBN: 978-3-642-00958-7

  • eBook Packages: Computer Science, Computer Science (R0)
