Abstract
Collecting and mining web log records (WLRs) from e-commerce web sites has become increasingly important for targeted marketing, promotions, and traffic analysis. In this paper, we describe a scalable data warehousing and OLAP-based engine for analyzing WLRs. Developing such a framework raises several scalability and performance challenges. Because an active web site may generate hundreds of millions of WLRs daily, the engine must handle huge data volumes and data flow rates. Supporting fine-grained analysis, e.g., of individual users’ access profiles, leads to huge, sparse data cubes defined over very large dimensions (there may be hundreds of thousands of visitors to the site and tens of thousands of pages). While OLAP servers store sparse cubes quite efficiently, rolling up a very large cube can take prohibitively long. We have applied several non-traditional approaches to this problem, which allow us to speed up WLR analysis by three orders of magnitude. Our framework supports multilevel and multidimensional pattern extraction, analysis, and feature ranking; in addition to the typical OLAP operations, it supports data mining operations such as extended multilevel and multidimensional association rules.
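To make the notions of a sparse cube and a roll-up concrete, the following is a minimal sketch in Python. It is not the engine described in the paper: the record layout (visitor, page, hour), the page-to-category hierarchy, and the function names are all assumptions chosen for illustration. The cube stores only cells that received at least one hit, and the roll-up aggregates the page dimension to a coarser level.

```python
from collections import defaultdict

# Hypothetical web log records: (visitor, page, hour_of_day).
records = [
    ("v1", "/shoes/a", 9),
    ("v1", "/shoes/b", 9),
    ("v2", "/books/x", 10),
    ("v1", "/shoes/a", 9),
]

# Hypothetical one-level hierarchy on the page dimension: page -> category.
page_category = {
    "/shoes/a": "shoes",
    "/shoes/b": "shoes",
    "/books/x": "books",
}


def build_cube(records):
    """Build a sparse cube: only cells with at least one hit are stored."""
    cube = defaultdict(int)
    for visitor, page, hour in records:
        cube[(visitor, page, hour)] += 1
    return dict(cube)


def roll_up_pages(cube, hierarchy):
    """Aggregate hit counts from the page level up to its parent level."""
    rolled = defaultdict(int)
    for (visitor, page, hour), hits in cube.items():
        rolled[(visitor, hierarchy[page], hour)] += hits
    return dict(rolled)


cube = build_cube(records)
by_category = roll_up_pages(cube, page_category)
```

Even in this toy form, the roll-up visits every populated cell once, which hints at why rolling up a cube with hundreds of thousands of visitors and tens of thousands of pages becomes expensive.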
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chen, Q., Dayal, U., Hsu, M. (2000). An OLAP-based Scalable Web Access Analysis Engine. In: Kambayashi, Y., Mohania, M., Tjoa, A.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2000. Lecture Notes in Computer Science, vol 1874. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44466-1_21
DOI: https://doi.org/10.1007/3-540-44466-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67980-6
Online ISBN: 978-3-540-44466-4
eBook Packages: Springer Book Archive