Abstract
Information retrieval is a process of managing the user’s needed information. IR system captures dynamically crawling items that are to be stored and indexed into repositories; this dynamic process facilitates retrieval of needed information by search process and customized presentation to the visualization space. Search engines plays major role in finding the relevant items from the huge repositories, where different methods are used to find the items to be retrieved. The survey on search engines explores that the Naive users are not satisfying with the current searching results; one of the reason to this problem is “lack of capturing the intention of the user by the machine”. Artificial intelligence is an emerging area that addresses these problems and trains the search engine to understand the user’s interest by inputting training data set. In this paper we attempt this problem with a novel approach using new similarity measures. The learning function which we used maximizes the user’s preferable information in searching process. The proposed function utilizes the query log by considering similarity between ranked item set and the user’s preferable ranking. The similarity measure facilitates the risk minimization and also feasible for large set of queries. Here we have demonstrated the framework based on the comparison of performance of algorithm particularly on the identification of clusters using replicated clustering approach. In addition, we provided an investigation analysis on clustering performance which is affected by different sequence representations, different distance measures, number of actual web user clusters, number of web pages, similarity between clusters, minimum session length, number of user sessions, and number of clusters to form.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Baeza-Yates, R.A., Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Amsterdam (1999)
Beitzel, D.M., Jensen, E.C., Chowdhury, A., Grossman, D., Frieder, O.: Hourly analysis of a very large topically categorized Web query log. In: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 321–328 (2004)
Shen, X., Dumais, S., Horvitz, E.: Analysis of topic dynamics in Web search. In: Proceedings of the International Conference on World Wide Web, pp. 1102–1103 (2005)
Kumar, P., Bapi, R., Krishna, P.: SeqPAM: A Sequence Clustering Algorithm for Web Personalization. Institute for Development and Research in Banking Technology, India
Cohen, W., Shapire, R., Singer, Y.: Learning to order things. Journal of Artificial Intelligence Research
Shen, H.-z., Zhao, J.-d., Yang, Z.-z.: A Web Mining Model for Real-time Webpage Personalization. ACM, New York (2006)
Kolikipogu, R., Padmaja Rani, B., Kakulapati, V.: Information Retrieval in Indian Languages: Query Expansion model for telugu language as a case study. In: IITAIEEE, China, vol. 4(1) (November 2010)
Kolikipogu, R.: WordNet Based Term Selection for PRF Query Expansion Model. In: ICCMS 2011, vol. 1 (January 2011)
Vojnovi, M., Cruise, J., Gunawardena, D., Marbach, P.: Ranking and Suggesting Popular Item. IEEE Journal 21 (2009)
Eirinaki, M., Vazirgiannis, M.: Web Mining for Web Personalization. ACM Transactions on Internet Technology 3(1), 1–27 (2003)
Asasa Robertson, S.E., Spark Jones, K.: Relevance Weighting of Search Terms. J. American Society for Information Science 27(3) (1976)
Salton, G.E., Fox, E.A., Wu, H.: Extended Boolean Information Retrieval. Communications of the ACM 26(12), 1022–1036 (1983)
Kelly, D., Teevan, J.: Implicit feedback for inferring user preference: A bibliography. ACM SIGIR Forum 37(2), 18–28 (2003)
Fox, S., Karnawat, K., Mydland, M., Dumais, S., White, T.: Evaluating implicit measures to improve web search. ACM Transactions on Information Science (TOIS) 23(2), 147–168 (2005)
Radlinski, F., Kurupu, M.: How Does Clickthrough Data Reflect Retrieval Quality? In: CIKM 2008, Napa Valley, California, USA, October 26-30 (2008)
Zhao, Q., Hoi, S.C.H., Liu, T.-Y.: Time-dependent semantic similarity measure of queries using historical click-through data. In: 5th International Conference on WWW. ACM, New York (2006)
Xu, X.F.: Improving quality of training data for learning to rank using click-through data. In: ACM Proceedings of WSDM 2010 (2010)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kakulapati, V., Kolikipogu, R., Revathy, P., Karunanithi, D. (2011). Improved Web Search Engine by New Similarity Measures. In: Abraham, A., Mauri, J.L., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22726-4_30
Download citation
DOI: https://doi.org/10.1007/978-3-642-22726-4_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22725-7
Online ISBN: 978-3-642-22726-4
eBook Packages: Computer ScienceComputer Science (R0)