Experimental Verification of the Dependence Between the Expected and Observed Visit Rate of Web Pages

  • Conference paper
Advanced Intelligent Computing Theories and Applications (ICIC 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9227)

Abstract

This paper applies web usage mining and web structure mining methods to the question of whether the expected visit rate of individual web pages correlates with their observed visit rate. We used web server log files as the data source and applied several log file pre-processing methods to identify user sessions at different levels of granularity. We found that the quality of the knowledge acquired about users’ behaviour depends on the session identification method. We experimentally demonstrated a higher dependence between the observed and expected visit rates of the examined web pages in well-prepared files with identified user sessions. We also found statistically significant differences between PageRank and the real visit rate in the files prepared with more advanced session identification methods.
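The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a simple time-threshold rule for session identification, a textbook power-iteration PageRank as the expected visit rate, and Spearman rank correlation as the dependence measure. The function names, the 600-second timeout, and the 0.85 damping factor are all hypothetical choices made for illustration only.

```python
from collections import Counter, defaultdict


def identify_sessions(records, timeout=600):
    """Split each visitor's requests into sessions at gaps longer than `timeout` seconds.

    `records` is an iterable of (visitor, timestamp_seconds, page) tuples.
    The time-threshold rule and its default value are assumptions.
    """
    by_visitor = defaultdict(list)
    for visitor, timestamp, page in records:
        by_visitor[visitor].append((timestamp, page))
    sessions = []
    for hits in by_visitor.values():
        hits.sort()
        current = [hits[0][1]]
        for (prev_t, _), (t, page) in zip(hits, hits[1:]):
            if t - prev_t > timeout:
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def visit_counts(sessions):
    """Observed visit rate: number of requests per page across all sessions."""
    return Counter(page for session in sessions for page in session)


def pagerank(links, damping=0.85, iterations=50):
    """Expected visit rate via power iteration; `links` maps page -> list of targets."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new[q] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for q in pages:
                    new[q] += damping * rank[p] / n
        rank = new
    return rank


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)
```

To compare the two rates, one would compute `pagerank` on the site's link graph and `visit_counts` on the identified sessions, then pass the two value lists, ordered by the same pages, to `spearman`. Repeating this for log files prepared with different session identification rules shows how the measured dependence varies with the pre-processing.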

Acknowledgments

This paper is supported by the project VEGA 1/0392/13, "Modelling of Stakeholders’ Behaviour in Commercial Bank during the Recent Financial Crisis and Expectations of Basel Regulations under Pillar 3 - Market Discipline", and the project KEGA 015UKF-4/2013, "Modern computer science – New methods and forms for effective education".

Author information

Corresponding author

Correspondence to Jozef Kapusta.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Kapusta, J., Munk, M., Drlik, M. (2015). Experimental Verification of the Dependence Between the Expected and Observed Visit Rate of Web Pages. In: Huang, DS., Han, K. (eds) Advanced Intelligent Computing Theories and Applications. ICIC 2015. Lecture Notes in Computer Science, vol 9227. Springer, Cham. https://doi.org/10.1007/978-3-319-22053-6_66

  • DOI: https://doi.org/10.1007/978-3-319-22053-6_66

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-22052-9

  • Online ISBN: 978-3-319-22053-6

  • eBook Packages: Computer Science, Computer Science (R0)
