Semantic Analysis for Data Preparation of Web Usage Mining

Jung, Jason J.; Jo, Geun-Sik

doi:10.1007/978-3-540-24677-0_128

Jason J. Jung & Geun-Sik Jo

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3029)

Abstract

As web usage patterns from clients grow more complex, simple sessionization based on time-oriented and navigation-oriented heuristics limits the range of rule discovery methods that can be applied. In this paper, we present a semantic analysis approach based on semantic session reconstruction, which finds semantic outliers in web log data. A web directory service is used to enrich the web logs with semantics, categorizing them into all possible hierarchical paths. To detect the candidate set of session identifiers, semantic factors such as the semantic mean, the semantic deviation, and a distance matrix are established. Each semantic session is then obtained by nested repetition of a top-down partitioning and evaluation process. For the experiment, we applied these ontology-oriented heuristics to sessionize one week of access log files from IRCache. Compared with time-oriented heuristics, semantic outlier analysis detected more than 48% additional sessions. This means that we can conceptually track the behavior of users who tend to change their intentions and interests easily, or who search for various kinds of information on the web simultaneously.
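
The contrast the abstract draws between time-oriented sessionization and semantic outlier detection can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not define the semantic mean, deviation, or distance matrix, so the sketch assumes a simple tree distance between directory category paths and a single-pass "mean plus k standard deviations" outlier test instead of the nested top-down partitioning the paper describes. The function names, the 1800-second timeout, the parameter `k`, and the category strings are all hypothetical.

```python
from statistics import mean, pstdev


def path_distance(a: str, b: str) -> int:
    """Assumed distance: steps up from `a` to the deepest common
    ancestor plus steps down to `b` in a category hierarchy."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def time_sessions(requests, timeout=1800):
    """Time-oriented heuristic: start a new session whenever the gap
    between consecutive requests exceeds `timeout` seconds."""
    sessions, current, last = [], [], None
    for ts, category in requests:
        if last is not None and ts - last > timeout:
            sessions.append(current)
            current = []
        current.append((ts, category))
        last = ts
    if current:
        sessions.append(current)
    return sessions


def semantic_split(session, k=1.0):
    """Split one session wherever the category jump between consecutive
    requests is an outlier (greater than mean + k * standard deviation).
    Single pass only; the paper's nested repetition is not reproduced."""
    if len(session) < 3:
        return [session]
    dists = [path_distance(session[i][1], session[i + 1][1])
             for i in range(len(session) - 1)]
    threshold = mean(dists) + k * pstdev(dists)
    parts, current = [], [session[0]]
    for dist, request in zip(dists, session[1:]):
        if dist > threshold:
            parts.append(current)
            current = []
        current.append(request)
    parts.append(current)
    return parts


if __name__ == "__main__":
    # Hypothetical log: (timestamp in seconds, directory category path).
    log = [
        (0, "Computers/Programming/Python"),
        (60, "Computers/Programming/Java"),
        (120, "Computers/Programming/Python"),
        (180, "Recreation/Travel/Asia"),
        (240, "Recreation/Travel/Europe"),
        (5000, "News/Weather"),
    ]
    by_time = time_sessions(log)
    by_semantics = [p for s in by_time for p in semantic_split(s)]
    print(len(by_time), len(by_semantics))
```

In this toy log the timeout alone separates only the final request, while the semantic test also separates the programming requests from the travel requests even though they arrive one minute apart. That is the kind of additional session the abstract attributes to semantic outlier analysis.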

Author information

Authors and Affiliations

Intelligent E-Commerce Systems Laboratory, School of Computer Engineering, Inha University, 253 Yonghyun-dong, Incheon, Korea, 402-751
Jason J. Jung & Geun-Sik Jo

Editor information

Editors and Affiliations

Institute for Information Technology, National Research Council of Canada, 1200 Montreal Road, M-50, K1A 0R6, Ottawa, Ontario, Canada
Bob Orchard
Institute for Information Technology, National Research Council, Canada
Chunsheng Yang
Department of Computer Science, Texas State University-San Marcos, Nueces 247, 601 University Drive, TX 78666-4616, San Marcos, USA
Moonis Ali

About this paper

Cite this paper

Jung, J.J., Jo, G.-S. (2004). Semantic Analysis for Data Preparation of Web Usage Mining. In: Orchard, B., Yang, C., Ali, M. (eds) Innovations in Applied Artificial Intelligence. IEA/AIE 2004. Lecture Notes in Computer Science, vol 3029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24677-0_128

DOI: https://doi.org/10.1007/978-3-540-24677-0_128
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22007-7
Online ISBN: 978-3-540-24677-0
eBook Packages: Springer Book Archive
