Abstract
This paper first studies the distribution characteristics of user behavior, based on log data from a large-scale web search engine. The analysis shows that the distribution of user queries follows a power law and exhibits strong similarity, and that both user queries and clicked URLs show pronounced locality, which implies that a query cache and a ‘hot click’ cache can be employed to improve system performance. Three typical cache replacement policies are then compared: LRU, FIFO, and LFU with attenuation. The distribution characteristics of web information are also analyzed, showing that the link popularity and replica popularity of a URL have a positive influence on its importance. Finally, the variance between link popularity and user popularity, and the variance between replica popularity and user popularity, are analyzed, which gives some important insight for improving the ranking algorithms of a search engine.
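The kind of cache comparison described above can be illustrated with a minimal sketch. It is not the authors' experimental setup: the synthetic Zipf-like query stream, the function names, and in particular the attenuation scheme for LFU (multiplying all counts by a hypothetical `decay` factor every `period` requests) are assumptions made for illustration only.

```python
import random
from collections import OrderedDict, deque


def zipf_stream(n_queries, vocab_size, exponent=1.0, seed=0):
    """Synthetic query stream with power-law rank weights (an assumption,
    standing in for a real search engine log)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    return rng.choices(range(vocab_size), weights=weights, k=n_queries)


def hit_ratio_fifo(stream, capacity):
    """FIFO: evict the entry that was inserted earliest."""
    cache, order, hits = set(), deque(), 0
    for q in stream:
        if q in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            cache.discard(order.popleft())
        cache.add(q)
        order.append(q)
    return hits / len(stream)


def hit_ratio_lru(stream, capacity):
    """LRU: evict the entry that was used least recently."""
    cache, hits = OrderedDict(), 0
    for q in stream:
        if q in cache:
            hits += 1
            cache.move_to_end(q)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)  # drop the least recently used entry
        cache[q] = True
    return hits / len(stream)


def hit_ratio_lfu_decay(stream, capacity, decay=0.5, period=1000):
    """LFU with attenuation (hypothetical scheme): evict the entry with the
    smallest count, and scale all counts by `decay` every `period` requests
    so that stale popularity fades."""
    counts, hits = {}, 0
    for i, q in enumerate(stream, start=1):
        if q in counts:
            hits += 1
            counts[q] += 1.0
        else:
            if len(counts) >= capacity:
                victim = min(counts, key=counts.get)  # linear scan, fine for a sketch
                del counts[victim]
            counts[q] = 1.0
        if i % period == 0:
            for key in counts:
                counts[key] *= decay
    return hits / len(stream)


if __name__ == "__main__":
    stream = zipf_stream(n_queries=50_000, vocab_size=10_000)
    for name, policy in [("FIFO", hit_ratio_fifo),
                         ("LRU", hit_ratio_lru),
                         ("LFU+decay", hit_ratio_lfu_decay)]:
        print(name, round(policy(stream, capacity=500), 3))
```

Replaying the same stream through each policy at the same capacity keeps the comparison fair. The hit ratios obtained on synthetic data say nothing about the results reported in the paper, which are based on real log data.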
Cite this article
Wang, J., Shan, S., Lei, M. et al. Web search engine: Characteristics of user behaviors and their implication. Sci China Ser F 44, 351–365 (2001). https://doi.org/10.1007/BF02714738
DOI: https://doi.org/10.1007/BF02714738