Abstract
This paper focuses on the use of web usage mining and web structure mining methods. We examine whether the expected visit rate of individual web pages correlates with the observed visit rate of the same pages. We used web server log files as the data source and applied several log file pre-processing methods to identify user sessions at different levels of granularity. We found that the quality of the knowledge acquired about users' behaviour depends on the session identification method. We experimentally confirmed a stronger dependence between the observed and expected visit rates of the examined web pages in well-prepared files with identified user sessions. We also found statistically significant differences between PageRank and the real visit rate in the files prepared with more advanced session identification methods.
Acknowledgments
This paper is supported by the project VEGA 1/0392/13 Modelling of Stakeholders’ Behaviour in Commercial Bank during the Recent Financial Crisis and Expectations of Basel Regulations under Pillar 3 - Market Discipline and by the project KEGA 015UKF-4/2013 Modern computer science – New methods and forms for effective education.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Kapusta, J., Munk, M., Drlik, M. (2015). Experimental Verification of the Dependence Between the Expected and Observed Visit Rate of Web Pages. In: Huang, DS., Han, K. (eds) Advanced Intelligent Computing Theories and Applications. ICIC 2015. Lecture Notes in Computer Science, vol 9227. Springer, Cham. https://doi.org/10.1007/978-3-319-22053-6_66
DOI: https://doi.org/10.1007/978-3-319-22053-6_66
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22052-9
Online ISBN: 978-3-319-22053-6
eBook Packages: Computer Science (R0)