Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 111))

Abstract

Explosive growth in the size and usage of the World Wide Web has made it necessary for Web site administrators to track and analyze the navigation patterns of Web site visitors. However, data mining techniques are not easily applicable to Web data, owing to problems related both to the technology underlying the Web and to the lack of standards in the design and implementation of Web pages. Information collected by Web servers and kept in the server log is the main source of data for analyzing user navigation patterns.

Once logs have been preprocessed and sessions have been obtained, several kinds of access pattern mining can be performed, depending on the needs of the analyst. Most efforts have relied on relatively simple techniques, which can be inadequate for real user profile data because noise in the data must be tackled first. Thus, there is a need for robust methods that integrate different intelligent techniques and are free of any assumptions about the noise contamination rate.

In this paper, the problem of mining behavior patterns on the Web is studied in detail, and different approaches to solving it are analyzed. An algorithm is given to calculate frequent access patterns. The algorithm is based on a model structure called the WPC-Tree, which stores in each node relevant information about pages, making it possible to apply data mining techniques to obtain useful patterns.
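
The abstract does not define the WPC-Tree, so the following is only a rough illustration of the general idea of keeping per-node information in a tree built from sessions. It is a plain prefix tree over contiguous page sequences, not the chapter's algorithm. The names `PathNode`, `build_path_tree`, `frequent_paths`, `max_len`, and `min_support` are hypothetical, and sessions are assumed to be already extracted as ordered lists of page identifiers.

```python
class PathNode:
    """One node per page in a navigation path (hypothetical structure)."""

    def __init__(self):
        self.support = 0    # number of sessions containing the path ending here
        self.children = {}  # next page -> PathNode


def build_path_tree(sessions, max_len=3):
    """Insert every contiguous path of up to max_len pages from each session."""
    root = PathNode()
    for session in sessions:
        counted = set()  # ids of nodes already credited for this session
        for start in range(len(session)):
            node = root
            for page in session[start:start + max_len]:
                node = node.children.setdefault(page, PathNode())
                if id(node) not in counted:
                    counted.add(id(node))
                    node.support += 1
    return root


def frequent_paths(root, min_support):
    """Return {path tuple: support} for paths seen in at least min_support sessions."""
    result = {}
    stack = [((), root)]
    while stack:
        path, node = stack.pop()
        for page, child in node.children.items():
            # A path can never be more frequent than its prefix,
            # so infrequent branches are pruned here.
            if child.support >= min_support:
                new_path = path + (page,)
                result[new_path] = child.support
                stack.append((new_path, child))
    return result


if __name__ == "__main__":
    # Toy sessions, invented purely for illustration.
    sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"], ["A", "B", "C"]]
    tree = build_path_tree(sessions, max_len=3)
    for path, support in sorted(frequent_paths(tree, min_support=2).items()):
        print(" -> ".join(path), support)
```

Support here is counted once per session, so a page revisited within one session does not inflate the count. A noise-robust method of the kind the abstract calls for would need more than this simple threshold.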




© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Menasalvas, E., Marbán, O., Millán, S., Peña, J.M. (2003). Intelligent Web Mining. In: Szczepaniak, P.S., Segovia, J., Kacprzyk, J., Zadeh, L.A. (eds) Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1772-0_22

  • DOI: https://doi.org/10.1007/978-3-7908-1772-0_22

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-2519-0

  • Online ISBN: 978-3-7908-1772-0

  • eBook Packages: Springer Book Archive
