Skip to main content

Improved Web Search Engine by New Similarity Measures

  • Conference paper

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 193))

Abstract

Information retrieval is a process of managing the user’s needed information. IR system captures dynamically crawling items that are to be stored and indexed into repositories; this dynamic process facilitates retrieval of needed information by search process and customized presentation to the visualization space. Search engines plays major role in finding the relevant items from the huge repositories, where different methods are used to find the items to be retrieved. The survey on search engines explores that the Naive users are not satisfying with the current searching results; one of the reason to this problem is “lack of capturing the intention of the user by the machine”. Artificial intelligence is an emerging area that addresses these problems and trains the search engine to understand the user’s interest by inputting training data set. In this paper we attempt this problem with a novel approach using new similarity measures. The learning function which we used maximizes the user’s preferable information in searching process. The proposed function utilizes the query log by considering similarity between ranked item set and the user’s preferable ranking. The similarity measure facilitates the risk minimization and also feasible for large set of queries. Here we have demonstrated the framework based on the comparison of performance of algorithm particularly on the identification of clusters using replicated clustering approach. In addition, we provided an investigation analysis on clustering performance which is affected by different sequence representations, different distance measures, number of actual web user clusters, number of web pages, similarity between clusters, minimum session length, number of user sessions, and number of clusters to form.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Baeza-Yates, R.A., Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. Addison-Wesley Longman Publishing Co., Inc., Amsterdam (1999)

    MATH  Google Scholar 

  2. Beitzel, D.M., Jensen, E.C., Chowdhury, A., Grossman, D., Frieder, O.: Hourly analysis of a very large topically categorized Web query log. In: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 321–328 (2004)

    Google Scholar 

  3. Shen, X., Dumais, S., Horvitz, E.: Analysis of topic dynamics in Web search. In: Proceedings of the International Conference on World Wide Web, pp. 1102–1103 (2005)

    Google Scholar 

  4. Kumar, P., Bapi, R., Krishna, P.: SeqPAM: A Sequence Clustering Algorithm for Web Personalization. Institute for Development and Research in Banking Technology, India

    Google Scholar 

  5. Cohen, W., Shapire, R., Singer, Y.: Learning to order things. Journal of Artificial Intelligence Research

    Google Scholar 

  6. Shen, H.-z., Zhao, J.-d., Yang, Z.-z.: A Web Mining Model for Real-time Webpage Personalization. ACM, New York (2006)

    Google Scholar 

  7. Kolikipogu, R., Padmaja Rani, B., Kakulapati, V.: Information Retrieval in Indian Languages: Query Expansion model for telugu language as a case study. In: IITAIEEE, China, vol. 4(1) (November 2010)

    Google Scholar 

  8. Kolikipogu, R.: WordNet Based Term Selection for PRF Query Expansion Model. In: ICCMS 2011, vol. 1 (January 2011)

    Google Scholar 

  9. Vojnovi, M., Cruise, J., Gunawardena, D., Marbach, P.: Ranking and Suggesting Popular Item. IEEE Journal 21 (2009)

    Google Scholar 

  10. Eirinaki, M., Vazirgiannis, M.: Web Mining for Web Personalization. ACM Transactions on Internet Technology 3(1), 1–27 (2003)

    Article  Google Scholar 

  11. Asasa Robertson, S.E., Spark Jones, K.: Relevance Weighting of Search Terms. J. American Society for Information Science 27(3) (1976)

    Google Scholar 

  12. Salton, G.E., Fox, E.A., Wu, H.: Extended Boolean Information Retrieval. Communications of the ACM 26(12), 1022–1036 (1983)

    Article  MathSciNet  MATH  Google Scholar 

  13. Kelly, D., Teevan, J.: Implicit feedback for inferring user preference: A bibliography. ACM SIGIR Forum 37(2), 18–28 (2003)

    Article  Google Scholar 

  14. Fox, S., Karnawat, K., Mydland, M., Dumais, S., White, T.: Evaluating implicit measures to improve web search. ACM Transactions on Information Science (TOIS) 23(2), 147–168 (2005)

    Article  Google Scholar 

  15. Radlinski, F., Kurupu, M.: How Does Clickthrough Data Reflect Retrieval Quality? In: CIKM 2008, Napa Valley, California, USA, October 26-30 (2008)

    Google Scholar 

  16. Zhao, Q., Hoi, S.C.H., Liu, T.-Y.: Time-dependent semantic similarity measure of queries using historical click-through data. In: 5th International Conference on WWW. ACM, New York (2006)

    Google Scholar 

  17. Xu, X.F.: Improving quality of training data for learning to rank using click-through data. In: ACM Proceedings of WSDM 2010 (2010)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kakulapati, V., Kolikipogu, R., Revathy, P., Karunanithi, D. (2011). Improved Web Search Engine by New Similarity Measures. In: Abraham, A., Mauri, J.L., Buford, J.F., Suzuki, J., Thampi, S.M. (eds) Advances in Computing and Communications. ACC 2011. Communications in Computer and Information Science, vol 193. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22726-4_30

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-22726-4_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22725-7

  • Online ISBN: 978-3-642-22726-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics