A Clickstream Based Web Page Importance Metric for Customized Search Engines

Part of the book series: Lecture Notes in Computer Science (TCCI, volume 8240)

Abstract

The rapidly growing size of the World Wide Web poses considerable obstacles for search engine applications. The performance of any Web search engine, in terms of the accuracy of its result set and the share of authoritative Web pages in that result set, depends heavily on the Web page importance metric it applies. Enhancing existing metrics or designing novel algorithms could therefore lead to better search engine outcomes. Existing link-dependent Web page importance metrics are not a complete solution, because their accuracy depends on the downloaded portion of the Web and because they cannot cover authoritative dark Web pages; link-independent approaches could address these barriers. This paper reviews LogRank, our clickstream-based Web page importance metric, which is independent of the link structure of the Web graph and is used both to extract the best result set within a specific Web domain boundary and to estimate the importance of a whole Web domain. We also review our Web page classification approach, which is used in Web site importance calculation.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Ahmadi-Abkenari, F., Selamat, A. (2013). A Clickstream Based Web Page Importance Metric for Customized Search Engines. In: Nguyen, N.T. (eds) Transactions on Computational Collective Intelligence XII. Lecture Notes in Computer Science, vol 8240. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53878-0_2

  • DOI: https://doi.org/10.1007/978-3-642-53878-0_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53877-3

  • Online ISBN: 978-3-642-53878-0

  • eBook Packages: Computer Science, Computer Science (R0)
