
Automatically generating related queries in Japanese

Language Resources and Evaluation

Abstract

Web searchers reformulate their queries as they adapt to search engine behavior, learn more about a topic, or simply correct typing errors. Automatic query rewriting can help web searchers by augmenting a user’s query or by replacing it with one likely to retrieve better results. One example of query rewriting is spelling correction; another is replacing words with synonyms or other related terms. For Japanese, the opportunities for improving results are greater than for languages with a single character set, since documents may be written in multiple character sets, and a user may express the same meaning using different character sets. We describe the characteristics of Japanese search query logs and of the manual query reformulations carried out by Japanese web searchers. We use these characteristics to extend previous work on automatic query rewriting in English, taking into account the Japanese writing system. We introduce several new model features that arise from this difference and discuss their impact on automatic query rewriting. We also examine enhancements in the form of rules that block conversion between some character sets, to address Japanese homophones. The precision/recall curves show significant improvement with the new feature set and blocking rules, and are often better than those of the English counterpart.
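To make the point about multiple character sets concrete, the following is a minimal sketch, not the authors' method, of two building blocks such a system might plausibly use: tagging which scripts appear in a query, and converting hiragana to katakana. The function names (`script_of`, `scripts_in`, `hiragana_to_katakana`) and the coarse Unicode ranges are assumptions made for illustration only; the article's actual features and blocking rules are not reproduced here.

```python
def script_of(ch: str) -> str:
    """Classify one character by Unicode block (coarse ranges, assumed for this sketch)."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x3096:
        return "hiragana"
    if 0x30A1 <= cp <= 0x30FA or cp == 0x30FC:  # U+30FC is the long-vowel mark
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:  # CJK Unified Ideographs, basic block only
        return "kanji"
    if ch.isascii() and ch.isalpha():
        return "romaji"
    return "other"


def scripts_in(query: str) -> set:
    """Return the set of scripts used in a query, ignoring whitespace."""
    return {script_of(ch) for ch in query if not ch.isspace()}


def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana U+3041..U+3096 by 0x60 to the matching katakana; leave the rest as-is."""
    return "".join(
        chr(ord(ch) + 0x60) if 0x3041 <= ord(ch) <= 0x3096 else ch
        for ch in text
    )


if __name__ == "__main__":
    print(scripts_in("東京 ホテル"))
    print(hiragana_to_katakana("すし"))
```

The hiragana-to-katakana direction is purely mechanical, which is why it is shown here. Conversions involving kanji depend on readings and are ambiguous for homophones, which is the kind of case the blocking rules mentioned in the abstract are meant to address.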



Author information


Corresponding author

Correspondence to Rosie Jones.

Additional information

This work was done while Kevin Bartz and Pero Subasic were employees at Yahoo! Inc.


About this article

Cite this article

Jones, R., Bartz, K., Subasic, P. et al. Automatically generating related queries in Japanese. Lang Resources & Evaluation 40, 219–232 (2006). https://doi.org/10.1007/s10579-007-9021-0


  • DOI: https://doi.org/10.1007/s10579-007-9021-0
