
Web search engine: Characteristics of user behaviors and their implication

  • Scientific Papers
Science in China Series F: Information Sciences

Abstract

This paper first studies the distribution characteristics of user behaviors, based on log data from a massive web search engine. The analysis shows that the stochastic distribution of user queries follows a power-law function and exhibits strong similarity, and that user queries and clicked URLs show pronounced locality. This locality implies that a query cache and a ‘hot click’ cache can be employed to improve system performance. Three typical cache replacement policies are then compared: LRU, FIFO, and LFU with attenuation. The distribution characteristics of web information are also analyzed, demonstrating that the link popularity and replica popularity of a URL have a positive influence on its importance. Finally, the variance between link popularity and user popularity, and between replica popularity and user popularity, is analyzed; these results give important insight for improving the ranking algorithms of a search engine.
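The cache comparison summarized in the abstract can be explored with a small simulation. The sketch below is not the authors' method or data. It draws a synthetic query stream whose popularity follows a Zipf-like power law (an assumption standing in for the real search-engine log) and measures the hit ratio of LRU, FIFO, and an LFU whose counts decay over time. The attenuation scheme (multiplying every count by `decay` once per `period` requests) and all names and sizes (`zipf_stream`, `DecayingLFUCache`, a 200-entry cache) are hypothetical choices made for illustration.

```python
import random
from collections import OrderedDict, deque


def zipf_stream(n_queries, n_distinct, alpha=1.0, seed=0):
    """Synthetic query IDs with power-law popularity: P(rank r) ~ 1 / r**alpha (assumed model)."""
    rng = random.Random(seed)
    weights = [1.0 / (r ** alpha) for r in range(1, n_distinct + 1)]
    return rng.choices(range(n_distinct), weights=weights, k=n_queries)


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def access(self, key):
        """Return True on a hit; on a miss insert key, evicting the least recently used entry."""
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            return True
        if len(self.items) >= self.capacity:
            self.items.popitem(last=False)  # oldest use sits at the front
        self.items[key] = None
        return False


class FIFOCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.order = deque()
        self.members = set()

    def access(self, key):
        """Hits do not change eviction order; the earliest inserted entry is evicted first."""
        if key in self.members:
            return True
        if len(self.order) >= self.capacity:
            self.members.discard(self.order.popleft())
        self.order.append(key)
        self.members.add(key)
        return False


class DecayingLFUCache:
    """Hypothetical 'LFU with attenuation': every `period` accesses all counts are scaled by `decay`."""

    def __init__(self, capacity, decay=0.5, period=1000):
        self.capacity = capacity
        self.decay = decay
        self.period = period
        self.counts = {}
        self.ticks = 0

    def access(self, key):
        self.ticks += 1
        if self.ticks % self.period == 0:
            for k in self.counts:  # age old popularity so stale hot queries can be evicted
                self.counts[k] *= self.decay
        if key in self.counts:
            self.counts[key] += 1.0
            return True
        if len(self.counts) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # lowest (decayed) frequency
            del self.counts[victim]
        self.counts[key] = 1.0
        return False


def hit_ratio(cache, stream):
    """Fraction of requests in `stream` answered from `cache`."""
    hits = sum(cache.access(q) for q in stream)
    return hits / len(stream)


if __name__ == "__main__":
    stream = zipf_stream(n_queries=20_000, n_distinct=5_000, alpha=1.0, seed=42)
    for name, cache in [("LRU", LRUCache(200)), ("FIFO", FIFOCache(200)),
                        ("LFU+decay", DecayingLFUCache(200))]:
        print(f"{name:10s} hit ratio: {hit_ratio(cache, stream):.3f}")
```

Because a power-law stream concentrates requests on a few popular queries, each policy can reach a useful hit ratio with a cache much smaller than the query vocabulary. The relative ranking of the three policies depends on the assumed exponent, cache size, and decay parameters, so the printed numbers should not be read as reproducing the paper's results.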




About this article

Cite this article

Wang, J., Shan, S., Lei, M. et al. Web search engine: Characteristics of user behaviors and their implication. Sci China Ser F 44, 351–365 (2001). https://doi.org/10.1007/BF02714738


  • DOI: https://doi.org/10.1007/BF02714738
