DOI: 10.1145/1081870.1081921
Article

Using retrieval measures to assess similarity in mining dynamic web clickstreams

Published: 21 August 2005

ABSTRACT

While scalable data mining methods are expected to cope with massive Web data, coping with evolving trends in noisy data continuously, without unnecessary stoppages and reconfigurations, is still an open challenge. This dynamic, single-pass setting can be cast within the framework of mining evolving data streams. In this paper, we explore the task of mining mass user profiles by discovering evolving Web session clusters in a single pass with a recently proposed scalable immune-based clustering approach (TECNO-STREAMS), and we study the effect of the choice of similarity measure on the mining process and on the interpretation of the mined patterns. We propose a simple similarity measure that explicitly couples the precision and coverage criteria to the early learning stages, and that defines the affinity of the data to the learned profiles or summaries as the minimum of their coverage and precision, thereby requiring the learned profiles to be simultaneously precise and complete, with no compromises. In our experiments, we study the task of mining evolving user profiles from Web clickstream data (Web usage mining) in a single pass under different trend-sequencing scenarios, and show that, compared to the cosine similarity measure, the proposed measure, which is explicitly based on precision and coverage, allows the discovery of more correct profiles at the same precision or recall quality levels.
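As a hedged illustration of the min-of-precision-and-coverage idea, the sketch below assumes that sessions and profiles are plain binary sets of URLs, that precision is the fraction of the profile's URLs present in the session, and that coverage is the fraction of the session's URLs captured by the profile. These definitions and all function names are assumptions for illustration, not the paper's exact formulation. Under these assumptions the cosine similarity of the two binary vectors equals the geometric mean of precision and coverage, whereas the proposed measure takes their minimum, so it never exceeds the cosine score and gives a low affinity to a profile that is precise but incomplete, or complete but imprecise.

```python
from math import sqrt
from typing import Set


def precision(session: Set[str], profile: Set[str]) -> float:
    """Assumed definition: fraction of the profile's URLs that occur in the session."""
    if not profile:
        return 0.0
    return len(session & profile) / len(profile)


def coverage(session: Set[str], profile: Set[str]) -> float:
    """Assumed definition: fraction of the session's URLs that the profile captures."""
    if not session:
        return 0.0
    return len(session & profile) / len(session)


def min_precision_coverage(session: Set[str], profile: Set[str]) -> float:
    """Hypothetical helper: affinity is high only if the profile is both precise and complete."""
    return min(precision(session, profile), coverage(session, profile))


def cosine_similarity(session: Set[str], profile: Set[str]) -> float:
    """Cosine of the binary URL vectors, i.e. sqrt(precision * coverage)."""
    if not session or not profile:
        return 0.0
    return len(session & profile) / sqrt(len(session) * len(profile))


if __name__ == "__main__":
    session = {"/a", "/b", "/c", "/d"}
    profile = {"/a"}  # precise (its only URL was visited) but covers 1 of 4 pages
    print(precision(session, profile), coverage(session, profile))
    print(min_precision_coverage(session, profile), cosine_similarity(session, profile))
```

For the four-URL session and one-URL profile above, precision is 1.0 and coverage is 0.25; the minimum-based affinity is 0.25, while cosine reports 0.5, rewarding the incomplete profile more generously.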

References

  1. S. Babu and J. Widom. Continuous queries over data streams. In SIGMOD Record'01, pages 109--120, 2001.
  2. D. Barbara. Requirements for clustering data streams. ACM SIGKDD Explorations Newsletter, 3(2):23--27, 2002.
  3. J. Borges and M. Levene. Data mining of user navigation patterns. In H. A. Abbass, R. A. Sarker, and C. Newton, editors, Web Usage Analysis and User Profiling, Lecture Notes in Computer Science, pages 92--111. Springer-Verlag, 1999.
  4. P. Bradley, U. Fayyad, and C. Reina. Scaling clustering algorithms to large databases. In Proceedings of the 4th international conf. on Knowledge Discovery and Data Mining (KDD98), 1998.
  5. Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-dimensional regression analysis of time-series data streams. In 2002 Int. Conf. on Very Large Data Bases (VLDB'02), Hong Kong, China, 2002.
  6. R. Cooley, B. Mobasher, and J. Srivastava. Data preparation for mining world wide web browsing patterns. Journal of knowledge and information systems, 1(1), 1999.
  7. S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In IEEE Symposium on Foundations of Computer Science (FOCS'00), Redondo Beach, CA, 2000.
  8. J. Hunt and D. Cooke. An adaptative, distributed learning system, based on immune system. In IEEE International Conference on Systems, Man and Cybernetics, pages 2494--2499, Los Alamitos, CA, 1995.
  9. N. K. Jerne. The immune system. Scientific American, 229(1):52--60, 1973.
  10. R. R. Korfhage. Information Storage and Retrieval. Wiley, 1997.
  11. O. Nasraoui, C. Cardona-Uribe, and C. Rojas-Coronel. Tecno-streams: Tracking evolving clusters in noisy data streams with a scalable immune system learning model. In IEEE International Conference on Data Mining, Melbourne, Florida, Nov. 2003.
  12. O. Nasraoui, D. Dasgupta, and F. Gonzalez. An artificial immune system approach to robust data mining. In Genetic and Evolutionary Computation Conference (GECCO) Late breaking papers, pages 356--363, New York, NY, 2002.
  13. O. Nasraoui, H. Frigui, R. Krishnapuram, and A. Joshi. Mining web access logs using relational competitive fuzzy clustering. In Eighth International Fuzzy Systems Association Congress, Hsinchu, Taiwan, Aug. 1999.
  14. O. Nasraoui and R. Krishnapuram. One step evolutionary mining of context sensitive associations and web navigation patterns. In SIAM conference on Data Mining, pages 531--547, Arlington, VA, 2002.
  15. O. Nasraoui, R. Krishnapuram, H. Frigui, and A. Joshi. Extracting web user profiles using relational competitive fuzzy clustering. International Journal of Artificial Intelligence Tools, 9(4):509--526, 2000.
  16. O. Nasraoui, R. Krishnapuram, and A. Joshi. Mining web access logs using a relational clustering algorithm based on a robust estimator. In 8th International World Wide Web Conference, pages 40--41, Toronto, Canada, 1999.
  17. M. Perkowitz and O. Etzioni. Adaptive web sites: Automatically synthesizing web pages. In AAAI 98, 1998.
  18. C. Shahabi, A. M. Zarkesh, J. Abidi, and V. Shah. Knowledge discovery from users web-page navigation. In Proceedings of workshop on research issues in Data engineering, Birmingham, England, 1997.
  19. J. Srivastava, R. Cooley, M. Deshpande, and P.-N. Tan. Web usage mining: Discovery and applications of usage patterns from web data. SIGKDD Explorations, 1(2):1--12, Jan 2000.
  20. J. Timmis, M. Neal, and J. Hunt. An artificial immune system for data analysis. Biosystems, 55(1/3):143--150, 2000.
  21. T. Yan, M. Jacobsen, H. Garcia-Molina, and U. Dayal. From user access patterns to dynamic hypertext linking. In Proceedings of the 5th International World Wide Web conference, Paris, France, 1996.
  22. H. Yang, S. Parthasarathy, and S. Reddy. On the use of constrained association rules for web mining. In WebKDD workshop on Knowledge Discovery in the Web, pages 77--90, Edmonton, Alberta, Canada, 2002.
  23. O. Zaiane, M. Xin, and J. Han. Discovering web access patterns and trends by applying olap and data mining technology on web logs. In Advances in Digital Libraries, pages 19--29, Santa Barbara, CA, 1998.
  24. T. Zhang, R. Ramakrishnan, and M. Livny. Birch: An efficient data clustering method for large databases. In ACM SIGMOD International Conference on Management of Data, pages 103--114, New York, NY, 1996. ACM Press.

      Published in
      KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
      August 2005
      844 pages
      ISBN: 159593135X
      DOI: 10.1145/1081870

      Copyright © 2005 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall acceptance rate: 1,133 of 8,635 submissions (13%)