Abstract
As the web usage patterns from clients are getting more complex, simple sessionizations based on time and navigation-oriented heuristics have been restricted to exploit various kinds of rule discovering methods. In this paper, we present semantic analysis approach based on semantic session reconstruction as finding out semantic outliers from web log data. Web directory service is applied to enrich semantics to web logs, categorizing them to all possible hierarchical paths. In order to detect the candidate set of session identifiers, semantic factors like semantic mean, deviation, and distance matrix are established. Eventually, each semantic session is obtained based on nested repetition of top-down partitioning and evaluation process. For experiment, we applied this ontology-oriented heuristics to sessionize the access log files for one week from IRCache. Compared with time-oriented heuristics, more than 48% of sessions were additionally detected by semantic outlier analysis. It means that we can conceptually track the behavior of users tending to easily change their intentions and interests, or simultaneously try to search various kinds of information on the web.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Cooley, R., Srivastava, J., Mobasher, B.: Web Mining: Information and Pattern Discovery on the World Wide Web. In: Proc. of the 9th IEEE Int. Conf. on Tools with Artificial Intelligence (1997)
Batista, P., Silva, M.J.: Web Access Mining from an On-line Newspaper Logs. In: Proc. 12th Int. Meeting of the Euro Working Group on Decision Support Systems (2001)
Bonchi, F., Giannotti, F., Gozzi, C., Manco, G., Nanni, M., Pedreschi, D., Renso, C., Ruggieri, S.: Web log data warehousing and mining for intelligent web caching. Data and Knowledge Engineering 39(2), 165–189 (2001)
Cooley, R., Mobasher, B., Srivastava, J.: Data Preparation for Mining World Wide Web Browsing Patterns. Knowledge and Information Systems 1(1), 5–32 (1999)
Berendt, B., Spiliopoulou, M.: Analysing navigation behaviour in web sites integrating multiple information systems. The VLDB Journal 9(1), 56–75 (2000)
Mobasher, B., Cooley, R., Srivastava, J.: Automatic personalization based on Web usage mining. Communications of the ACM 43(8) (2000)
Berendt, B., Mobasher, B., Nakagawa, M., Spiliopoulou, M.: The Impact of Site Structure and User Environment on Session Reconstruction in Web Usage Analysis. In: Proc. of the 4th WebKDD Workshop at the ACM-SIGKDD Conf. on Knowledge Discovery in Databases (2002)
Chen, Z., Tao, L., Wang, J., Wenyin, L., Ma, W.-Y.: A Unified Framework for Web Link Analysis. In: Proc. of the 3rd Int. Conf. on Web Information Systems Engineering, pp. 63–72 (2002)
Han, J., Kamber, M.: Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, San Francisco (2001)
Ramaswamy, S., Rastogi, R., Shim, K.: Efficient Algorithms for Mining Outliers from Large Data Sets. In: Proc. of the ACM SIGMOD Conf. on Management of Data, pp. 427–438 (2000)
IRCache Users Guide, http://www.ircache.net/
Arning, A., Agrawal, R., Raghavan, P.: A Linear Model for Deviation Detection in Large Databases. In: Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, pp. 164–169 (1996)
Menasalvas, E., Millan, S., Pena, J.M., Hadjimichael, M., Marban, O.: Subsessions: a granular approach to click path analysis. In: Proc. of the IEEE Int. Conf. on Fuzzy Systems, pp. 878–883 (2002)
Pei, J., Han, J., Mortazavi-Asl, B., Zhu, H.: Mining Access Patterns Efficiently from Web Logs. In: Terano, T., Chen, A.L.P. (eds.) PAKDD 2000. LNCS, vol. 1805, Springer, Heidelberg (2000)
Gruber, T.: What is an Ontology?, http://www-ksl.stanford.edu/kst/what-is-an-ontology.html
Labrou, Y., Finin, T.: Yahoo! as an Ontology: Using Yahoo! Categories to Describe Documents. In: Proc. of the 8th Int. Conf. on Information Knowledge Management, pp. 180–187 (1999)
McCallum, A., Nigam, K., Rennie, J., Seymore, K.: Building Domain-Specific Search Engines with Machine Learning Techniques. In: AAAI Spring Symposium (1999)
Jung, J.J., Yoon, J.-S., Jo, G.-S.: Collaborative Information Filtering by Using Categorized Bookmarks on the Web. In: Proc. of the 14th Int. Conf. on Applications of Prolog, pp. 343–357 (2001)
Levenshtein, I.V.: Binary Codes capable of correcting deletions, insertions, and reversals. Cybernetics and Control Theory 10(8), 707–710 (1966)
Aggarwal, C., Wolf, J.L., Yu, P.S.: Caching on the World Wide Web. IEEE Tran. on Knowldge and Data Engineering 11(1), 94–107 (1999)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jung, J.J., Jo, GS. (2004). Semantic Analysis for Data Preparation of Web Usage Mining. In: Orchard, B., Yang, C., Ali, M. (eds) Innovations in Applied Artificial Intelligence. IEA/AIE 2004. Lecture Notes in Computer Science(), vol 3029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24677-0_128
Download citation
DOI: https://doi.org/10.1007/978-3-540-24677-0_128
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22007-7
Online ISBN: 978-3-540-24677-0
eBook Packages: Springer Book Archive