Intelligent Web Mining

Menasalvas, Ernestina; Marbán, Oscar; Millán, Socorro; Peña, Jose M.

doi:10.1007/978-3-7908-1772-0_22

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 111)

Abstract

Explosive growth in the size and usage of the World Wide Web has made it necessary for Web site administrators to track and analyze the navigation patterns of Web site visitors. However, data mining techniques are not easily applicable to Web data, owing to problems related both to the technology underlying the Web and to the lack of standards in the design and implementation of Web pages. Information collected by Web servers and kept in the server log is the main source of data for analyzing user navigation patterns.

Once logs have been preprocessed and sessions have been obtained, several kinds of access pattern mining can be performed, depending on the needs of the analyst. Most efforts have relied on relatively simple techniques, which can be inadequate for real user profile data because noise in the data has to be tackled first. Thus, there is a need for robust methods that integrate different intelligent techniques that are free of any assumptions about the noise contamination rate.
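
As an illustration of the preprocessing step mentioned above, the following is a minimal sketch of how sessions might be derived from log records. It is not the chapter's procedure: the record layout `(visitor, timestamp, page)`, the function name `sessionize`, and the 30-minute inactivity timeout are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed inactivity threshold (seconds); the chapter does not specify one.
SESSION_TIMEOUT = 30 * 60


def sessionize(requests, timeout=SESSION_TIMEOUT):
    """Group (visitor, timestamp, page) records into per-visitor sessions.

    A new session starts whenever the gap between two consecutive
    requests from the same visitor exceeds `timeout` seconds.
    Returns a list of (visitor, [page, ...]) tuples.
    """
    by_visitor = defaultdict(list)
    for visitor, ts, page in requests:
        by_visitor[visitor].append((ts, page))

    sessions = []
    for visitor in sorted(by_visitor):
        hits = sorted(by_visitor[visitor])  # order each visitor's hits by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, page) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                # Gap too long: close the current session and start a new one.
                sessions.append((visitor, current))
                current = []
            current.append(page)
        sessions.append((visitor, current))
    return sessions


# Hypothetical log records: visitor id, seconds since some epoch, requested page.
example_requests = [
    ("a", 0, "/"),
    ("a", 60, "/x"),
    ("a", 4000, "/y"),
    ("b", 10, "/"),
]
example_sessions = sessionize(example_requests)
```

A fixed timeout is only one common heuristic; it does nothing about the noise in the data that the abstract identifies as a problem.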

In this paper, the problem of mining behavior patterns on the Web is studied in detail, and different approaches to solving it are analyzed. An algorithm is given to calculate frequent access patterns. The algorithm is based on a model structure called the WPC-Tree, which stores in each node relevant information about pages, making it possible to apply data mining techniques to obtain useful patterns.
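
To make the notion of a frequent access pattern concrete, here is a small sketch that counts contiguous page sequences of a fixed length across sessions and keeps those meeting a minimum support. It is a plain counting stand-in and does not implement the WPC-Tree, whose structure is not described in this abstract; the function name `frequent_paths` and its parameters are hypothetical.

```python
from collections import Counter


def frequent_paths(sessions, length, min_support):
    """Return contiguous page paths of `length` pages that occur in at
    least `min_support` sessions, mapped to their session counts.

    `sessions` is a list of page lists. Each path is counted at most
    once per session, so the count is a session-level support.
    """
    counts = Counter()
    for pages in sessions:
        # Collect the distinct length-`length` windows seen in this session.
        seen = {
            tuple(pages[i:i + length])
            for i in range(len(pages) - length + 1)
        }
        counts.update(seen)
    return {path: n for path, n in counts.items() if n >= min_support}


# Hypothetical sessions, e.g. as produced by a sessionization step.
example_sessions = [
    ["/", "/a", "/b"],
    ["/", "/a"],
    ["/a", "/b"],
]
example_result = frequent_paths(example_sessions, length=2, min_support=2)
```

Counting each path once per session keeps a single visitor who loops through the same pages from inflating the support; whether the chapter's algorithm makes the same choice is not stated here.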

Author information

Authors and Affiliations

DLSIIS Facultad de Informática, U.P.M., Madrid, Spain
Ernestina Menasalvas
Departamento de Informática, Universidad Carlos III, Madrid, Spain
Oscar Marbán
Universidad del Valle, Cali, Colombia
Socorro Millán
DATSI Facultad de Informática, U.P.M., Madrid, Spain
Jose M. Peña

Editor information

Editors and Affiliations

Institute of Computer Science, Technical University of Lodz, ul. Sterlinga 16/18, 90-217, Lodz, Poland
Piotr S. Szczepaniak
Systems Research Institute, Polish Academy of Sciences, ul. Newelska 6, 01-447, Warsaw, Poland
Piotr S. Szczepaniak & Janusz Kacprzyk
Facultad de Informática, Universidad Politécnica de Madrid, Campus de Montegancedo, 28660, Madrid, Spain
Javier Segovia
Computer Science Division, Department of Electrical Engineering and Computer Sciences, University of California, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh

About this chapter

Cite this chapter

Menasalvas, E., Marbán, O., Millán, S., Peña, J.M. (2003). Intelligent Web Mining. In: Szczepaniak, P.S., Segovia, J., Kacprzyk, J., Zadeh, L.A. (eds) Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1772-0_22

DOI: https://doi.org/10.1007/978-3-7908-1772-0_22
Publisher Name: Physica, Heidelberg
Print ISBN: 978-3-7908-2519-0
Online ISBN: 978-3-7908-1772-0
eBook Packages: Springer Book Archive
