Abstract

Nowadays, the Web is an essential tool for most people. The Internet returns millions of web pages for almost any search term. It is a powerful medium for communication between computers and for accessing online documents, but it is not in itself a tool for locating or organizing information. Tools such as search engines assist users in locating information. The number of daily searches on the web is enormous, and retrieving interesting and relevant results quickly becomes very difficult. An automatic web page classifier can simplify the process by helping the search engine return relevant results. Web pages can present widely varied information depending on the characteristics of their content. The uncontrolled nature of web content presents additional challenges to web page classification compared with traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. This paper analyses the feasibility of an automatic web page classifier, proposes several classifiers and studies their precision. To this end, Data Mining techniques are of great importance and are used to construct the classifiers.


Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fiol-Roig, G., Miró-Julià, M., Herraiz, E. (2011). Data Mining Techniques for Web Page Classification. In: Pérez, J.B., et al. Highlights in Practical Applications of Agents and Multiagent Systems. Advances in Intelligent and Soft Computing, vol 89. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19917-2_8

  • DOI: https://doi.org/10.1007/978-3-642-19917-2_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-19916-5

  • Online ISBN: 978-3-642-19917-2

  • eBook Packages: Engineering, Engineering (R0)
