Abstract
Nowadays, the Web is an essential tool for most people. The Internet offers millions of web pages for almost any search term. It is a powerful medium for communication between computers and for accessing online documents, but it is not itself a tool for locating or organizing information. Tools such as search engines help users locate information. The number of daily web searches is huge, and the task of obtaining relevant results quickly becomes very difficult. An automatic web page classifier can simplify this process by helping the search engine return relevant results. Web pages can present very different information depending on the characteristics of their content. The uncontrolled nature of web content makes web page classification more challenging than traditional text classification, but the interconnected nature of hypertext also provides features that can assist the process. This paper analyses the feasibility of an automatic web page classifier, proposes several classifiers and studies their precision. Data Mining techniques play a central role in this work and are used to construct the classifiers.
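The abstract does not describe how the classifiers are built, only that Data Mining techniques construct them and that their precision is measured. As a rough illustration under stated assumptions, the following Python sketch learns a decision stump (a one-level decision tree, used here only as a simple, familiar data mining classifier) from a tiny invented corpus using word-presence features, then computes precision for one class. The corpus, the labels and the helper names `tokens`, `learn_stump` and `precision` are hypothetical; this is not the method proposed in the paper.

```python
# Hypothetical illustration only: toy data and helper names are invented,
# not taken from the paper.

# Hypothetical toy corpus: (page text, label).
TRAIN = [
    ("football match report and final score", "sports"),
    ("league table after the weekend match", "sports"),
    ("new recipe for tomato soup", "other"),
    ("stock market report for the weekend", "other"),
]
TEST = [
    ("tennis match highlights", "sports"),
    ("weekend weather report", "other"),
    ("cup final goal of the season", "sports"),
]


def tokens(text):
    """Binary bag-of-words features: the set of lowercase words on a page."""
    return set(text.lower().split())


def learn_stump(train, positive):
    """Return the single word whose presence best predicts `positive` on the training set."""
    vocab = set().union(*(tokens(t) for t, _ in train))

    def accuracy(word):
        # Count training pages where "word present" agrees with "label is positive".
        return sum((word in tokens(t)) == (y == positive) for t, y in train)

    # Sorting the vocabulary makes tie-breaking deterministic.
    return max(sorted(vocab), key=accuracy)


def precision(pred, truth, positive):
    """Fraction of pages predicted `positive` whose true label is `positive`."""
    predicted = [t for p, t in zip(pred, truth) if p == positive]
    return sum(t == positive for t in predicted) / len(predicted) if predicted else 0.0


word = learn_stump(TRAIN, "sports")
pred = ["sports" if word in tokens(t) else "other" for t, _ in TEST]
print(word, pred, precision(pred, [y for _, y in TEST], "sports"))
```

Running it prints `match ['sports', 'other', 'other'] 1.0`. The single page predicted as sports is correct, so precision is 1.0, even though the stump misses the second sports page in the test set.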
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fiol-Roig, G., Miró-Julià, M., Herraiz, E. (2011). Data Mining Techniques for Web Page Classification. In: Pérez, J.B., et al. Highlights in Practical Applications of Agents and Multiagent Systems. Advances in Intelligent and Soft Computing, vol 89. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19917-2_8
DOI: https://doi.org/10.1007/978-3-642-19917-2_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19916-5
Online ISBN: 978-3-642-19917-2
eBook Packages: Engineering, Engineering (R0)