Article

Using retrieval measures to assess similarity in mining dynamic web clickstreams

Authors:
Olfa Nasraoui, University of Louisville
Cesar Cardona, Magnify Inc., Chicago, IL
Carlos Rojas, University of Louisville

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, August 2005, Pages 439–448, https://doi.org/10.1145/1081870.1081921

Published: 21 August 2005


ABSTRACT

While scalable data mining methods are expected to cope with massive Web data, coping with evolving trends in noisy data continuously, without unnecessary stoppages and reconfigurations, is still an open challenge. This dynamic, single-pass setting can be cast within the framework of mining evolving data streams. In this paper, we explore the task of mining mass user profiles by discovering evolving Web session clusters in a single pass with a recently proposed scalable immune-based clustering approach (TECNO-STREAMS), and we study how the choice of similarity measure affects both the mining process and the interpretation of the mined patterns. We propose a simple similarity measure that explicitly couples the precision and coverage criteria to the early learning stages. It further defines the affinity of the data to the learned profiles or summaries as the minimum of their coverage and precision, so that the learned profiles must be simultaneously precise and complete, with no compromises. In our experiments, we study the task of mining evolving user profiles from Web clickstream data (web usage mining) in a single pass and under different trend sequencing scenarios. The results show that, compared to the cosine similarity measure, the proposed similarity measure based explicitly on precision and coverage allows the discovery of more correct profiles at the same precision or recall quality levels.
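The contrast the abstract draws between the minimum of precision and coverage and the cosine measure can be illustrated with a minimal sketch. It assumes that sessions and profiles are plain sets of visited URLs, that precision is the fraction of the profile's URLs found in the session, and that coverage is the fraction of the session's URLs found in the profile. These definitions and all function names are illustrative assumptions; the abstract does not give the paper's exact formulas.

```python
import math

def precision(session: set, profile: set) -> float:
    """Assumed definition: share of the profile's URLs that occur in the session."""
    if not profile:
        return 0.0
    return len(session & profile) / len(profile)

def coverage(session: set, profile: set) -> float:
    """Assumed definition: share of the session's URLs that the profile accounts for."""
    if not session:
        return 0.0
    return len(session & profile) / len(session)

def min_pc_similarity(session: set, profile: set) -> float:
    """Affinity as the minimum of precision and coverage (hypothetical helper name)."""
    return min(precision(session, profile), coverage(session, profile))

def cosine_similarity(session: set, profile: set) -> float:
    """Cosine similarity for binary (set-valued) vectors."""
    if not session or not profile:
        return 0.0
    return len(session & profile) / math.sqrt(len(session) * len(profile))

# A one-URL profile is perfectly precise but covers only a quarter of the session.
session = {"/home", "/courses", "/faculty", "/contact"}
profile = {"/home"}
print(min_pc_similarity(session, profile))   # 0.25
print(cosine_similarity(session, profile))   # 0.5
```

Under these assumed definitions, the cosine of two binary vectors equals the geometric mean of precision and coverage, so a high value on one criterion can partly offset a low value on the other. Taking the minimum removes that offset: a profile scores well only when both criteria are high.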


Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining


Published in
KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870
General Chair: Robert Grossman, University of Illinois at Chicago & Open Data Partners, USA
Program Chairs: Roberto Bayardo, IBM Almaden Research, USA; Kristin Bennett, RPI, USA
Copyright © 2005 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
artificial immune systems
clustering
mining evolving data
personalization
stream data mining
web mining
Qualifiers
- Article
Conference

Acceptance Rates
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%