Web Page Classification: A Soft Computing Approach

  • Conference paper
Advances in Web Intelligence (AWIC 2003)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2663)

Abstract

The Internet makes it possible to share and manipulate a vast quantity of information efficiently and effectively, but the rapid and chaotic growth of the Net has produced a poorly organized environment that hinders the sharing and mining of useful data. The need for meaningful web-page classification techniques is therefore becoming urgent. This paper describes a novel approach to web-page classification based on a fuzzy representation of web pages. Each page is described by a doublet representation that associates a weight with each of the most representative words of the web document, so as to characterize that word's relevance in the document. This weight is derived by taking advantage of the characteristics of the HTML language. A fuzzy-rule-based classifier is then generated by a supervised learning process that uses a genetic algorithm to search for the minimum fuzzy-rule set that best covers the training examples. The proposed system has been demonstrated with two significantly different classes of web pages.
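
As a rough illustration of the doublet idea described in the abstract, the following minimal sketch pairs each of the highest-scoring words of a page with a weight in [0, 1] that depends on the HTML elements the word appears in. The abstract does not give the paper's actual weighting criteria or fuzzy combination, so everything here is an assumption made for illustration: the tag boosts in `TAG_WEIGHTS`, the helper names `WordWeigher` and `doublets`, and the normalization by the top score are all hypothetical, and no stop-word filtering is applied.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical boosts per HTML element; NOT the criteria used in the paper.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5, "strong": 1.5, "em": 1.5}
SKIPPED_TAGS = {"script", "style"}


class WordWeigher(HTMLParser):
    """Accumulates a score per word, boosted by the enclosing HTML elements."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.scores = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag (tolerates sloppy HTML).
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIPPED_TAGS for t in self.open_tags):
            return
        # Use the strongest boost among the currently open elements.
        boost = max((TAG_WEIGHTS.get(t, 1.0) for t in self.open_tags), default=1.0)
        for word in re.findall(r"[a-z]+", data.lower()):
            self.scores[word] += boost


def doublets(html, top_k=5):
    """Return up to top_k (word, weight) pairs, weights scaled into [0, 1]."""
    parser = WordWeigher()
    parser.feed(html)
    parser.close()
    top = parser.scores.most_common(top_k)
    if not top:
        return []
    max_score = top[0][1]
    return [(word, score / max_score) for word, score in top]


page = (
    "<html><head><title>Fuzzy pages</title></head>"
    "<body><p>fuzzy rules and <b>genetic</b> search</p></body></html>"
)
result = doublets(page, top_k=3)
```

In this toy page, words in the title or in emphasized text end up with higher weights than words that only occur in plain body text. A fuzzy-rule-based classifier of the kind the abstract mentions would then take such (word, weight) pairs as its input; that stage is not sketched here because the abstract does not specify the rule format.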

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ribeiro, A., Fresno, V., Garcia-Alegre, M.C., Guinea, D. (2003). Web Page Classification: A Soft Computing Approach. In: Menasalvas, E., Segovia, J., Szczepaniak, P.S. (eds) Advances in Web Intelligence. AWIC 2003. Lecture Notes in Computer Science, vol 2663. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44831-4_12

  • DOI: https://doi.org/10.1007/3-540-44831-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40124-7

  • Online ISBN: 978-3-540-44831-0

  • eBook Packages: Springer Book Archive
