A New Measure for Query Disambiguation Using Term Co-occurrences

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2006 (IDEAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Abstract

This paper explores techniques that discover, from a selected subset of documents, terms to replace given query terms. The Internet allows access to large numbers of documents archived in digital format. However, no user can be an expert in every field, and users have trouble finding documents that suit their purposes when they cannot formulate queries that narrow the search to the context they have in mind. Accordingly, we propose a method for extracting terms from searched documents to replace user-provided query terms. Our results show that our method is successful in discovering terms that can be used to narrow the search.
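The abstract does not define the proposed measure, so the following is only an illustrative sketch of the general idea of ranking candidate replacement terms by co-occurrence with a query term in retrieved documents. It assumes documents are already tokenized and uses a document-level Jaccard coefficient as a stand-in score; the function name `rank_cooccurring_terms`, its parameters, and the toy documents are hypothetical and are not taken from the paper.

```python
from collections import Counter


def rank_cooccurring_terms(docs, query_term, top_k=3):
    """Rank terms by document-level Jaccard co-occurrence with query_term.

    NOTE: Jaccard is a stand-in score chosen for illustration;
    it is not the measure proposed in the paper.
    """
    # Treat each document as a set of terms (presence, not frequency).
    doc_sets = [set(doc) for doc in docs]

    # Document frequency of every term.
    df = Counter(term for terms in doc_sets for term in terms)

    # Number of documents in which a term appears together with the query term.
    co = Counter(
        term
        for terms in doc_sets
        if query_term in terms
        for term in terms
        if term != query_term
    )

    q_df = df[query_term]
    # Jaccard: |docs with both| / |docs with either|.
    scores = {term: c / (q_df + df[term] - c) for term, c in co.items()}

    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    # Toy retrieved documents for the ambiguous query term "jaguar".
    docs = [
        ["jaguar", "car", "engine"],
        ["jaguar", "cat", "jungle"],
        ["jaguar", "car", "dealer"],
        ["car", "engine"],
    ]
    for term, score in rank_cooccurring_terms(docs, "jaguar"):
        print(f"{term}\t{score:.3f}")
```

In this toy setting, a highly ranked term such as "car" could be offered to the user as a narrower query for one sense of "jaguar". A real system would need tokenization, stop-word handling, and a score suited to the collection.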

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wakaki, H., Masada, T., Takasu, A., Adachi, J. (2006). A New Measure for Query Disambiguation Using Term Co-occurrences. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_108

  • DOI: https://doi.org/10.1007/11875581_108

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45485-4

  • Online ISBN: 978-3-540-45487-8

  • eBook Packages: Computer Science, Computer Science (R0)
