A Heuristic Method for Selecting Support Features from Large Datasets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4508)

Abstract

For feature selection in machine learning, set covering (SC) is most suitable, because it selects support features for the data under analysis based on both the individual and the collective roles of the candidate features. However, SC-based feature selection requires complete pairwise comparisons between the members of the different classes in a dataset, which renders the otherwise meritorious SC principle impractical for selecting support features from large datasets.
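To make the pairwise-comparison burden concrete, the following is a minimal sketch of explicit SC-based feature selection, assuming binary features and two classes. The function names (`build_sc_rows`, `greedy_cover`) and the toy data are hypothetical, and the greedy rule stands in for whatever SC heuristic is actually used; it is not taken from the paper. Each pair of observations from different classes contributes one row, so the matrix has |P| x |N| rows for classes P and N.

```python
from itertools import product


def build_sc_rows(pos, neg):
    """Build one SC row per (positive, negative) pair.

    Entry k of a row is 1 when feature k takes different values on the
    two observations, i.e. when feature k separates that pair.
    """
    return [[int(p[k] != q[k]) for k in range(len(p))]
            for p, q in product(pos, neg)]


def greedy_cover(rows, n_features):
    """Greedy set covering: repeatedly pick the feature that separates
    the largest number of still-unseparated pairs."""
    uncovered = set(range(len(rows)))
    selected = []
    while uncovered:
        def gain(k):
            return sum(rows[r][k] for r in uncovered)

        best = max(range(n_features), key=gain)
        if gain(best) == 0:
            raise ValueError("two observations from different classes are identical")
        selected.append(best)
        uncovered = {r for r in uncovered if not rows[r][best]}
    return selected


# Hypothetical toy data: 2 positive and 2 negative observations, 4 binary features.
pos = [(1, 0, 1, 0), (1, 1, 0, 0)]
neg = [(0, 0, 1, 1), (0, 1, 0, 1)]
rows = build_sc_rows(pos, neg)      # 2 x 2 = 4 rows, one per cross-class pair
selected = greedy_cover(rows, 4)    # feature indices forming a cover
```

With 10,000 observations per class, the same construction would need 100 million rows, which is the memory bottleneck the abstract refers to.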

Introducing the notion of implicit SC-based feature selection, this paper presents a feature selection procedure that is equivalent to the standard SC-based feature selection procedure in supervised learning but whose memory requirement is multiple orders of magnitude smaller. Through experiments on six large machine learning datasets, we demonstrate the usefulness of the proposed implicit SC-based feature selection scheme in large-scale supervised data analysis.
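One way a pairwise SC matrix can be avoided is sketched below. This is an illustrative assumption, not the authors' procedure, which the abstract does not describe: the function name `implicit_greedy_cover` and the grouping idea are hypothetical. The sketch assumes binary features and two classes, and relies on the observation that the pairs not yet separated are exactly the cross-class pairs that agree on every selected feature. Counting such pairs per group yields the same greedy gains as counting rows of the explicit matrix, while storing only the observations themselves.

```python
def implicit_greedy_cover(pos, neg, n_features):
    """Greedy SC-style feature selection without the pairwise matrix.

    `groups` holds (positives, negatives) lists of observations that agree
    on every feature selected so far; only cross-class pairs inside a group
    remain unseparated.
    """
    groups = [(list(pos), list(neg))]
    selected = []
    while any(p and q for p, q in groups):
        def gain(k):
            # Pairs separated by feature k: (1, 0) pairs plus (0, 1) pairs.
            total = 0
            for p, q in groups:
                p1 = sum(x[k] for x in p)
                q1 = sum(x[k] for x in q)
                total += p1 * (len(q) - q1) + (len(p) - p1) * q1
            return total

        best = max(range(n_features), key=gain)
        if gain(best) == 0:
            raise ValueError("two observations from different classes are identical")
        selected.append(best)

        # Split each group on the chosen feature; keep only mixed-class groups.
        new_groups = []
        for p, q in groups:
            for v in (0, 1):
                pv = [x for x in p if x[best] == v]
                qv = [x for x in q if x[best] == v]
                if pv and qv:
                    new_groups.append((pv, qv))
        groups = new_groups
    return selected


# Hypothetical toy data: an XOR-like case in which both features are needed.
xor_pos = [(0, 0), (1, 1)]
xor_neg = [(0, 1), (1, 0)]
xor_selected = implicit_greedy_cover(xor_pos, xor_neg, 2)
```

Memory use here grows with the number of observations rather than with the number of cross-class pairs, which is the kind of saving the abstract claims; the actual procedure and its guarantees are given in the paper itself.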

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2005-003-D00445).

Editor information

Ming-Yang Kao, Xiang-Yang Li

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Ryoo, H.S., Jang, IY. (2007). A Heuristic Method for Selecting Support Features from Large Datasets. In: Kao, MY., Li, XY. (eds) Algorithmic Aspects in Information and Management. AAIM 2007. Lecture Notes in Computer Science, vol 4508. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72870-2_39

  • DOI: https://doi.org/10.1007/978-3-540-72870-2_39

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72868-9

  • Online ISBN: 978-3-540-72870-2

  • eBook Packages: Computer Science, Computer Science (R0)
