Redundant feature elimination for multi-class problems

Published: 04 July 2004
DOI: 10.1145/1015330.1015397

Abstract

We consider the problem of eliminating redundant Boolean features for a given data set, where a feature is redundant if it separates the classes less well than another feature or set of features. Lavrač et al. proposed the algorithm REDUCE that works by pairwise comparison of features, i.e., it eliminates a feature if it is redundant with respect to another feature. Their algorithm operates in an ILP setting and is restricted to two-class problems. In this paper we improve their method and extend it to multiple classes. Central to our approach is the notion of a neighbourhood of examples: a set of examples of the same class where the number of different features between examples is relatively small. Redundant features are eliminated by applying a revised version of the REDUCE method to each pair of neighbourhoods of different class. We analyse the performance of our method on a range of data sets.
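The pairwise comparison described above can be illustrated for the two-class case. The abstract does not spell out the redundancy test, so this minimal sketch assumes one common formalisation: feature `f` covers feature `g` if `f` is true on every positive example where `g` is true, and false on every negative example where `g` is false. The function names `covers` and `eliminate_redundant` are hypothetical, and the neighbourhood-based multi-class extension is not shown.

```python
from typing import List, Sequence

Example = Sequence[int]  # one Boolean feature vector, encoded as 0/1


def covers(f: int, g: int, pos: Sequence[Example], neg: Sequence[Example]) -> bool:
    """Assumed coverage test: f separates pos from neg at least as well as g."""
    # Wherever g is true on a positive example, f must be true as well.
    true_on_pos = all(ex[f] for ex in pos if ex[g])
    # Wherever g is false on a negative example, f must be false as well.
    false_on_neg = all(not ex[f] for ex in neg if not ex[g])
    return true_on_pos and false_on_neg


def eliminate_redundant(pos: Sequence[Example], neg: Sequence[Example]) -> List[int]:
    """Return indices of features that no remaining feature covers.

    Assumes at least one example overall and equal-length feature vectors.
    """
    n_features = len((pos or neg)[0])
    kept = set(range(n_features))
    for g in range(n_features):
        # Compare g only against features that are still kept, so that
        # two features covering each other are not both discarded.
        if any(f != g and covers(f, g, pos, neg) for f in kept):
            kept.discard(g)
    return sorted(kept)


# Toy data: feature 0 separates the classes perfectly; features 1 and 2 do not.
pos = [[1, 1, 0], [1, 0, 1]]
neg = [[0, 0, 0], [0, 1, 1]]
print(eliminate_redundant(pos, neg))  # prints [0]
```

On the toy data, feature 0 covers both other features, so only index 0 survives. Checking candidates against the kept set, rather than against all features, is a design choice that keeps one representative of any group of interchangeable features.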

References

[1] Almuallim, H., & Dietterich, T. G. (1991). Learning with many irrelevant features. Proc. 9th Nat. Conf. on Artificial Intelligence (pp. 547--552). MIT Press.
[2] Blum, A., & Langley, P. (1997). Selection of relevant features and examples in machine learning. Artificial Intelligence, 97, 245--271.
[3] Ceci, M., Appice, A., & Malerba, D. (2003). Mr-SBC: a Multi-Relational Naive Bayes Classifier. Proc. 7th Eur. Conf. on Principles and Practice of Knowledge Discovery in Databases (pp. 95--106). Springer-Verlag.
[4] Cohen, W. W. (1995). Fast Effective Rule Induction. Proc. 12th Int. Conf. on Machine Learning (pp. 115--123). Morgan Kaufmann.
[5] Dash, M., & Liu, H. (1997). Feature selection for classification. Intelligent Data Analysis, 1, 131--156.
[6] Dzeroski, S., & Lavrač, N., Eds. (2001). Relational Data Mining. Springer-Verlag.
[7] Freund, Y., & Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Computer and System Sciences, 55(1), 119--139.
[8] Hall, M. (2000). Correlation-based feature selection for discrete and numeric class machine learning. Proc. 17th Int. Conf. on Machine Learning (pp. 359--366). Morgan Kaufmann.
[9] Keerthi, S. S., Shevade, S. K., Bhattacharyya, C., & Murthy, K. R. K. (2001). Improvements to Platt's SMO Algorithm for SVM Classifier Design. Neural Computation, 13(3), 637--649.
[10] Kira, K., & Rendell, L. A. (1992). A practical approach to feature selection. Proc. 9th Int. Conf. on Machine Learning (pp. 249--256). Morgan Kaufmann.
[11] Kohavi, R., John, G., & Pfleger, K. (1994). Irrelevant features and the subset selection problem. Proc. 11th Int. Conf. on Machine Learning (pp. 121--129). Morgan Kaufmann.
[12] Kononenko, I. (1994). Estimating attributes: Analysis and extensions of RELIEF. Proc. Eur. Conf. on Machine Learning (pp. 171--182). Springer-Verlag.
[13] Krogel, M., Rawles, S., Zelezny, F., Flach, P., Lavrač, N., & Wrobel, S. (2003). Comparative evaluation of approaches to propositionalization. Proc. 13th Int. Conf. on Inductive Logic Programming (pp. 197--214). Springer-Verlag.
[14] Langley, P. (1996). Elements of Machine Learning. Morgan Kaufmann.
[15] Lavrač, N., Gamberger, D., & Jovanoski, V. (1999). A study of relevance for learning in deductive databases. J. Logic Programming, 16, 215--249.
[16] Lewis, D. D. (1999). Reuters-21578 text categorization test collection distribution 1.0. Available at http://www.research.att.com/lewis.
[17] Liu, H., & Setiono, R. (1996). A probabilistic approach to feature selection: A filter solution. Proc. 13th Int. Conf. on Machine Learning (pp. 319--327). Morgan Kaufmann.
[18] Modrzejewski, M. (1993). Feature selection using rough sets theory. Proc. Eur. Conf. on Machine Learning (pp. 213--226). Springer-Verlag.
[19] Muggleton, S. H., Bain, M., Hayes-Michie, J., & Michie, D. (1989). An experimental comparison of human and machine learning formalisms. Proc. 6th Int. Workshop on Machine Learning. Morgan Kaufmann.
[20] Pagallo, G., & Haussler, D. (1990). Boolean feature discovery in empirical learning. Machine Learning, 5(1), 71--100.
[21] Quinlan, J. (1993). C4.5: Programs for machine learning. Morgan Kaufmann.
[22] Raman, B. (2003). Enhancing inductive learning with feature selection and example selection. Master's thesis, Texas A&M University.
[23] Rendell, A., & Sheshu, R. (1990). Learning hard concepts through constructive induction: Framework and rationale. Computational Intelligence, 6, 247--270.
[24] Schapire, R., & Singer, Y. (2000). A boosting-based system for text categorization. Machine Learning, 39(2/3), 135--168.
[25] Srinivasan, A., King, R. D., & Muggleton, S. (1999). The role of background knowledge: using a problem from chemistry to examine the performance of an ILP program. Technical Report PRG-TR-08-99, Oxford University Computing Laboratory.
[26] Yang, Y., & Liu, X. (1999). A re-examination of text categorization methods. Proc. 20th ACM-SIGIR Int. Conf. on Research and Development in Information Retrieval (pp. 42--49). ACM Press.
[27] Witten, I., & Frank, E. (2000). Data mining: practical machine learning tools and techniques with Java implementations. Morgan Kaufmann.

Published In

ICML '04: Proceedings of the twenty-first international conference on Machine learning
July 2004, 934 pages
ISBN: 1581138385
DOI: 10.1145/1015330
Conference Chair: Carla Brodley

Publisher

Association for Computing Machinery, New York, NY, United States

Qualifiers: Article

Acceptance Rates

Overall Acceptance Rate: 140 of 548 submissions, 26%

Cited By

• (2024) Machine Learning Framework for Conotoxin Class and Molecular Target Prediction. Toxins, 16:11 (475). DOI: 10.3390/toxins16110475. Online publication date: 3-Nov-2024
• (2023) Feature Selection and Dynamic Network Traffic Congestion Classification based on Machine Learning for Internet of Things. Wasit Journal of Computer and Mathematics Science, 2:2 (76-91). DOI: 10.31185/wjcms.150. Online publication date: 1-Jul-2023
• (2023) A Model Selection Algorithm for Complex CNN Systems Based on Feature-Weights Relation. 2023 IEEE IAS Global Conference on Emerging Technologies (GlobConET) (1-7). DOI: 10.1109/GlobConET56651.2023.10150024. Online publication date: 19-May-2023
• (2023) Encoding for Reinforcement Learning Driven Scheduling. Job Scheduling Strategies for Parallel Processing (68-87). DOI: 10.1007/978-3-031-22698-4_4. Online publication date: 12-Jan-2023
• (2022) Building reliable radiomic models using image perturbation. Scientific Reports, 12:1. DOI: 10.1038/s41598-022-14178-x. Online publication date: 16-Jun-2022
• (2021) Iteratively local fisher score for feature selection. Applied Intelligence. DOI: 10.1007/s10489-020-02141-0. Online publication date: 5-Feb-2021
• (2020) Gender and Age Group Predictions from Speech Features using Multi-Layer Perceptron Model. 2020 IEEE 17th India Council International Conference (INDICON) (1-6). DOI: 10.1109/INDICON49873.2020.9342434. Online publication date: 10-Dec-2020
• (2020) Prediction of Age from Speech Features Using a Multi-Layer Perceptron Model. 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT) (1-6). DOI: 10.1109/ICCCNT49239.2020.9225390. Online publication date: Jul-2020
• (2019) Reducing features to improve link prediction performance in location based social networks, non-monotonically selected subset from feature clusters. Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (809-815). DOI: 10.1145/3341161.3343853. Online publication date: 27-Aug-2019
• (2019) A New Feature Selection Method based on Monarch Butterfly Optimization and Fisher Criterion. 2019 International Joint Conference on Neural Networks (IJCNN) (1-6). DOI: 10.1109/IJCNN.2019.8852063. Online publication date: Jul-2019
