
On Using a Warehouse to Analyze Web Logs

Distributed and Parallel Databases

Abstract

Analyzing web logs for usage and access trends can provide important information to web site developers and administrators, and can also help in creating adaptive web sites. Many existing tools generate fixed reports from web logs, but they typically do not allow ad-hoc analysis queries. Moreover, such tools cannot discover hidden access patterns embedded in the logs. We describe a relational OLAP (ROLAP) approach to creating a web-log warehouse, which is populated both from web logs and from the results of mining those logs. We discuss the design criteria that influenced our choice of dimensions, facts, and data granularity. We also developed a web-based tool for ad-hoc analytic queries on the warehouse, and we present some of the performance experiments we ran on it.
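The abstract does not give the warehouse schema. As a rough illustration of the ROLAP idea only, the following sketch loads Common Log Format lines into a small star schema in SQLite and then answers an ad-hoc aggregate query with plain SQL. The table names (`dim_time`, `dim_page`, `dim_host`, `fact_request`), the one-row-per-request granularity, the sample log lines, and the helper functions `dim_key` and `load` are all hypothetical choices made for this example, not the authors' design.

```python
import re
import sqlite3
from datetime import datetime

# Regex for the Apache/NCSA Common Log Format (CLF):
# host ident authuser [timestamp] "METHOD url PROTOCOL" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical star schema for illustration; not the schema from the paper.
# Grain: one fact row per logged request.
SCHEMA = """
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, day TEXT, hour INTEGER,
                       UNIQUE (day, hour));
CREATE TABLE dim_page (page_id INTEGER PRIMARY KEY, url TEXT UNIQUE);
CREATE TABLE dim_host (host_id INTEGER PRIMARY KEY, host TEXT UNIQUE);
CREATE TABLE fact_request (
    time_id INTEGER REFERENCES dim_time,
    page_id INTEGER REFERENCES dim_page,
    host_id INTEGER REFERENCES dim_host,
    status  INTEGER,
    bytes   INTEGER
);
"""

# Ad-hoc OLAP-style query: successful hits and bytes per page per hour.
QUERY = """
SELECT p.url, t.hour, COUNT(*) AS hits, SUM(f.bytes) AS bytes
FROM fact_request f
JOIN dim_page p ON p.page_id = f.page_id
JOIN dim_time t ON t.time_id = f.time_id
WHERE f.status = 200
GROUP BY p.url, t.hour
ORDER BY hits DESC, p.url
"""

# Hypothetical sample log lines (the last one is deliberately malformed).
SAMPLE = [
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:57:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:14:02:11 -0700] "GET /papers.html HTTP/1.0" 200 512',
    '10.0.0.3 - - [10/Oct/2000:14:05:00 -0700] "GET /missing.html HTTP/1.0" 404 -',
    'garbage line',
]


def dim_key(cur, table, key_col, cols, values):
    """Return the surrogate key of a dimension row, inserting the row if new.

    Table and column names are trusted constants from this module, so
    building the SQL with f-strings is safe here; values are parameterized.
    """
    placeholders = ", ".join("?" * len(cols))
    cur.execute(f"INSERT OR IGNORE INTO {table} ({', '.join(cols)}) "
                f"VALUES ({placeholders})", values)
    where = " AND ".join(f"{c} = ?" for c in cols)
    cur.execute(f"SELECT {key_col} FROM {table} WHERE {where}", values)
    return cur.fetchone()[0]


def load(conn, lines):
    """Parse CLF lines and append one fact row per valid request."""
    cur = conn.cursor()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue  # skip lines that are not valid CLF
        ts = datetime.strptime(m["ts"], "%d/%b/%Y:%H:%M:%S %z")
        time_id = dim_key(cur, "dim_time", "time_id", ("day", "hour"),
                          (ts.date().isoformat(), ts.hour))
        page_id = dim_key(cur, "dim_page", "page_id", ("url",), (m["url"],))
        host_id = dim_key(cur, "dim_host", "host_id", ("host",), (m["host"],))
        size = 0 if m["size"] == "-" else int(m["size"])
        cur.execute("INSERT INTO fact_request VALUES (?, ?, ?, ?, ?)",
                    (time_id, page_id, host_id, int(m["status"]), size))
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load(conn, SAMPLE)
    for row in conn.execute(QUERY):
        print(row)
```

Keeping one fact row per request preserves the finest grain, so coarser roll-ups (per day, per host, per page) stay simple `GROUP BY` queries over the dimensions; a larger warehouse might pre-aggregate instead, which is the kind of granularity trade-off the abstract says the design had to weigh. Results of log mining (for example, session clusters) could in principle be attached as further dimension tables in the same way.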


About this article

Cite this article

Joshi, K.P., Joshi, A. & Yesha, Y. On Using a Warehouse to Analyze Web Logs. Distributed and Parallel Databases 13, 161–180 (2003). https://doi.org/10.1023/A:1021515408295


  • DOI: https://doi.org/10.1023/A:1021515408295
