Abstract
This paper proposes a feature selection method based on Bayes' theorem. Its purpose is to reduce computational complexity while increasing the classification accuracy of the selected feature subsets. The dependence between two (binary) attributes is determined from the probabilities of their joint values contributing to positive and negative classification decisions. If opposing sets of attribute values never lead to opposing classification decisions (zero probability), the two attributes are considered independent of each other; otherwise they are dependent, and one of them can be removed, thereby reducing the number of attributes. The process is repeated over all pairs of attributes. The paper also evaluates the approach by comparing it with existing feature selection algorithms on 8 datasets from the University of California, Irvine (UCI) machine learning repository. The proposed method outperforms most existing algorithms in terms of the number of selected features, classification accuracy, and running time.
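The pairwise dependence test described in the abstract can be sketched as follows. This is an illustrative interpretation under stated assumptions, not the authors' exact algorithm: attributes are assumed binary, and two attributes are flagged as dependent when opposing value pairs, (0, 1) versus (1, 0), are observed with opposing class decisions with non-zero probability. The function names are hypothetical.

```python
def are_dependent(X, y, i, j):
    """Sketch of the pairwise test: attributes i and j are dependent when
    opposing value pairs (0,1) and (1,0) co-occur with opposing class
    decisions with non-zero (empirical) probability."""
    # class labels observed alongside each opposing value pair
    classes_01 = {yk for xk, yk in zip(X, y) if (xk[i], xk[j]) == (0, 1)}
    classes_10 = {yk for xk, yk in zip(X, y) if (xk[i], xk[j]) == (1, 0)}
    # both opposing pairs occur, and between them they reach opposing
    # decisions => the attributes are dependent
    return bool(classes_01) and bool(classes_10) and len(classes_01 | classes_10) > 1

def select_features(X, y):
    """Repeat the test over all attribute pairs; drop one attribute of
    each dependent pair (a greedy sketch of the reduction step)."""
    n_attrs = len(X[0])
    keep = set(range(n_attrs))
    for i in range(n_attrs):
        for j in range(i + 1, n_attrs):
            if i in keep and j in keep and are_dependent(X, y, i, j):
                keep.discard(j)  # remove one of the dependent pair
    return sorted(keep)
```

For example, on a toy dataset where attribute 0 is the inverse of attribute 1 and both determine the class, the pair (0, 1) is flagged as dependent and attribute 1 is dropped.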
Author information
Subramanian Appavu Alias Balamurugan is a Ph.D. candidate at the Department of Information and Communication Engineering, Anna University, Chennai, India. He is also a faculty member at Thiagarajar College of Engineering, Madurai, India.
His research interests include data mining and text mining.
Ramasamy Rajaram received the Ph.D. degree from Madurai Kamaraj University, India. He is a professor in the Department of Computer Science and Information Technology at Thiagarajar College of Engineering, Madurai, India.
His research interests include data mining and information security.
Cite this article
Balamurugan, S.A.A., Rajaram, R. Effective and efficient feature selection for large-scale data using Bayes’ theorem. Int. J. Autom. Comput. 6, 62–71 (2009). https://doi.org/10.1007/s11633-009-0062-2