Semantic Analysis for Data Preparation of Web Usage Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3029)

Abstract

As web usage patterns from clients grow more complex, simple sessionization based on time-oriented and navigation-oriented heuristics limits how well various rule discovery methods can be applied. In this paper, we present a semantic analysis approach that reconstructs semantic sessions by finding semantic outliers in web log data. A web directory service is used to enrich web logs with semantics by categorizing each logged request under all possible hierarchical paths. To detect the candidate set of session identifiers, we establish semantic factors such as the semantic mean, the semantic deviation, and a distance matrix. Each semantic session is then obtained through nested repetition of a top-down partitioning and evaluation process. In our experiment, we applied this ontology-oriented heuristic to sessionize one week of access log files from IRCache. Compared with time-oriented heuristics, semantic outlier analysis detected more than 48% additional sessions. This suggests that we can conceptually track the behavior of users who readily change their intentions and interests, or who search for various kinds of information on the web simultaneously.
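
The abstract does not give the formulas behind the semantic factors, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each logged request has already been mapped to a single directory category path such as `Computers/Internet/Searching`, it uses a simple tree distance between paths, and it flags a request as a semantic outlier when its average distance to the other requests exceeds the mean by `k` standard deviations. The function names, the threshold rule, the parameter `k`, and the sample paths are all hypothetical. The sketch makes a single pass, whereas the paper describes a nested top-down partitioning and evaluation process.

```python
from statistics import mean, pstdev


def path_distance(a: str, b: str) -> int:
    """Tree distance between two category paths (an assumed measure):
    steps up from `a` to the deepest common ancestor, plus steps down to `b`."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def distance_matrix(paths):
    """Pairwise distances between all category paths in one request stream."""
    return [[path_distance(p, q) for q in paths] for p in paths]


def semantic_outliers(paths, k=1.0):
    """Indices of requests whose average distance to the other requests
    exceeds mean + k * deviation (hypothetical threshold rule)."""
    n = len(paths)
    if n < 2:
        return []
    matrix = distance_matrix(paths)
    # The diagonal is zero, so dividing by n - 1 averages over the other requests.
    avg = [sum(row) / (n - 1) for row in matrix]
    mu, sigma = mean(avg), pstdev(avg)
    return [i for i, value in enumerate(avg) if value > mu + k * sigma]


def split_sessions(paths, k=1.0):
    """Start a new session at every flagged request (single pass only)."""
    cuts = set(semantic_outliers(paths, k))
    sessions, current = [], []
    for i, path in enumerate(paths):
        if i in cuts and current:
            sessions.append(current)
            current = []
        current.append(path)
    if current:
        sessions.append(current)
    return sessions


if __name__ == "__main__":
    # Made-up category paths for illustration; not taken from the IRCache logs.
    sample = [
        "Computers/Internet/Searching",
        "Computers/Internet/Searching/Engines",
        "Computers/Software",
        "Recreation/Travel/Asia",
        "Computers/Internet",
    ]
    print(semantic_outliers(sample))
    print(split_sessions(sample))
```

With these sample paths, only the `Recreation/Travel/Asia` request is flagged, so the stream is cut into two sessions at that point. A time-oriented heuristic would leave the stream as one session if all five requests fell within its timeout window, which is the kind of difference the abstract's comparison refers to.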

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jung, J.J., Jo, G.-S. (2004). Semantic Analysis for Data Preparation of Web Usage Mining. In: Orchard, B., Yang, C., Ali, M. (eds) Innovations in Applied Artificial Intelligence. IEA/AIE 2004. Lecture Notes in Computer Science, vol 3029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24677-0_128

  • DOI: https://doi.org/10.1007/978-3-540-24677-0_128

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22007-7

  • Online ISBN: 978-3-540-24677-0

  • eBook Packages: Springer Book Archive
