Combining Web Usage Mining and XML Mining in a Real Case Study

Facca, Federico Michele

doi:10.1007/978-3-540-74951-6_2


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4737)

Included in the following conference series:

Workshop on Web Mining


Abstract

In this paper we report our first extended experiments on Conceptual Web log generation and on XML mining over the generated Conceptual logs. Conceptual logs are XML Web server logs that contain rich information about the structure of a Web site and its content. Furthermore, they can be generated automatically from a proper logging facility and a conceptual application model. The rich information they provide makes the results of the mining process easier to analyze and makes it possible to perform data mining at different levels of abstraction. In this work we use WebML as the conceptual model and XMINE as the mining tool; nevertheless, the underlying idea is of general validity and can be applied to any other conceptual modeling framework and mining technique.
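As a rough illustration of the idea in the abstract, the following sketch joins plain server log lines with a small page-to-content mapping and emits an XML log. It is not the paper's implementation: the `MODEL` dictionary is a hypothetical stand-in for a conceptual application model such as a WebML schema, and the element names (`ConceptualLog`, `Request`, `Page`, `Unit`), URLs, and sample lines are assumptions made only for this example.

```python
import re
import xml.etree.ElementTree as ET

# Common Log Format: host ident user [time] "method url protocol" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical stand-in for a conceptual application model:
# URL path -> (page name, content units displayed on that page).
MODEL = {
    "/page1.do": ("HomePage", ["NewsIndex", "SearchForm"]),
    "/page2.do": ("NewsDetail", ["NewsData"]),
}


def to_conceptual_log(lines):
    """Build an XML tree that enriches each request with model information."""
    root = ET.Element("ConceptualLog")
    for line in lines:
        match = CLF.match(line)
        if match is None:
            continue  # skip lines that are not in Common Log Format
        path = match.group("url").split("?", 1)[0]
        page, units = MODEL.get(path, ("Unknown", []))
        request = ET.SubElement(
            root, "Request",
            host=match.group("host"),
            time=match.group("time"),
            status=match.group("status"),
        )
        page_el = ET.SubElement(request, "Page", name=page, url=match.group("url"))
        for unit in units:
            ET.SubElement(page_el, "Unit", name=unit)
    return root


# Invented sample input, including one line that should be skipped.
SAMPLE = [
    '10.0.0.1 - - [10/Oct/2006:13:55:36 +0200] "GET /page1.do HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2006:13:56:02 +0200] "GET /page2.do?id=7 HTTP/1.1" 200 1840',
    'malformed line',
]

log = to_conceptual_log(SAMPLE)
xml_text = ET.tostring(log, encoding="unicode")
```

Because each request carries both the page and the units shown on it, a mining step could in principle work at the page level (`Request/Page`) or at the finer unit level (`Request/Page/Unit`), which is one way to read the "different levels of abstraction" mentioned above.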



Author information

Authors and Affiliations

Dipartimento di Elettronica e Informazione, Politecnico di Milano, Italy
Federico Michele Facca


Editor information

Bettina Berendt, Andreas Hotho, Dunja Mladenic, Giovanni Semeraro


Cite this paper

Facca, F.M. (2007). Combining Web Usage Mining and XML Mining in a Real Case Study. In: Berendt, B., Hotho, A., Mladenic, D., Semeraro, G. (eds) From Web to Social Web: Discovering and Deploying User and Content Profiles. WebMine 2006. Lecture Notes in Computer Science, vol 4737. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74951-6_2


DOI: https://doi.org/10.1007/978-3-540-74951-6_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74950-9
Online ISBN: 978-3-540-74951-6
eBook Packages: Computer Science, Computer Science (R0)
