Abstract
Automated classification of web pages is an important research direction in web mining. It aims to construct a model from labeled web documents that can classify new instances. Machine learning algorithms have been adapted to text classification problems, including web document classification. Artificial immune systems are a branch of computational intelligence inspired by biological immune systems and are used to solve a variety of computational problems, including classification. This paper examines the effectiveness and suitability of artificial immune system based approaches for web page classification. To this end, two artificial immune system based classification algorithms, Immunos-1 and Immunos-99, are compared with two standard machine learning techniques, the C4.5 decision tree classifier and the Naïve Bayes classifier. The algorithms are experimentally evaluated on 50 data sets obtained from DMOZ (Open Directory Project). The experimental results indicate that artificial immune system based classifiers achieve higher predictive performance for web page classification.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Onan, A. (2015). Artificial Immune System Based Web Page Classification. In: Silhavy, R., Senkerik, R., Oplatkova, Z., Prokopova, Z., Silhavy, P. (eds) Software Engineering in Intelligent Systems. Advances in Intelligent Systems and Computing, vol 349. Springer, Cham. https://doi.org/10.1007/978-3-319-18473-9_19
DOI: https://doi.org/10.1007/978-3-319-18473-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18472-2
Online ISBN: 978-3-319-18473-9
eBook Packages: Engineering, Engineering (R0)