A Heuristic Method for Selecting Support Features from Large Datasets

Ryoo, Hong Seo; Jang, In-Yong

doi:10.1007/978-3-540-72870-2_39

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4508)

Abstract

Set covering (SC) is the most suitable approach to feature selection in machine learning, because it selects support features for the data under analysis based on both the individual and the collective roles of the candidate features. However, SC-based feature selection requires complete pairwise comparisons between the members of the different classes in a dataset, which makes the otherwise meritorious SC principle impractical for selecting support features from large datasets.
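To make the pairwise-comparison cost concrete, here is a minimal sketch of explicit SC-based feature selection on a two-class dataset. It assumes binary features and uses a plain greedy covering heuristic; the function names and the toy data are illustrative and are not taken from the paper.

```python
from itertools import product

def build_cover_matrix(pos, neg):
    """One row per (positive, negative) pair of observations.

    Entry j of a row is 1 when feature j takes different values on the
    two observations, i.e. when feature j separates that pair.
    """
    return [[int(p[j] != q[j]) for j in range(len(p))]
            for p, q in product(pos, neg)]

def greedy_set_cover(rows, n_features):
    """Greedy heuristic: keep picking the feature that separates the most
    pairs that are still unseparated, until every pair is covered."""
    uncovered = set(range(len(rows)))
    selected = []
    while uncovered:
        best = max(range(n_features),
                   key=lambda j: sum(rows[i][j] for i in uncovered))
        newly_covered = {i for i in uncovered if rows[i][best]}
        if not newly_covered:
            raise ValueError("two observations from different classes are identical")
        selected.append(best)
        uncovered -= newly_covered
    return selected

# Toy data (assumed for illustration): 2 positive and 3 negative observations.
pos = [(1, 0, 1, 0), (1, 1, 0, 0)]
neg = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 0, 0, 1)]

rows = build_cover_matrix(pos, neg)    # len(pos) * len(neg) rows
support = greedy_set_cover(rows, 4)    # indices of the selected features
```

The matrix has one row for every pair of observations drawn from different classes, so its size grows with the product of the class sizes. That product is the memory burden the abstract refers to.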

Introducing the notion of implicit SC-based feature selection, this paper presents a feature selection procedure that is equivalent to the standard SC-based feature selection procedure in supervised learning but requires multiple orders of magnitude less memory. Experiments on six large machine learning datasets demonstrate the usefulness of the proposed implicit SC-based feature selection scheme in large-scale supervised data analysis.
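The abstract does not say how the implicit procedure works, so the following is not the paper's algorithm. It is a sketch of one way a greedy covering heuristic can avoid storing the pairwise comparisons, assuming binary features: group the observations by their values on the features chosen so far, and count the still-unseparated pairs per group from class counts alone. All names are hypothetical.

```python
from collections import defaultdict

def implicit_greedy_support(pos, neg):
    """Greedy SC-style feature selection without a pair-by-feature matrix.

    Only pairs that fall in the same group (same values on every selected
    feature) are still unseparated, so per-group class counts are enough.
    Assumes every feature value is 0 or 1.
    """
    n_features = len(pos[0])
    groups = [[(x, 1) for x in pos] + [(x, 0) for x in neg]]
    selected = []

    def mixed(group):
        # A group still holds unseparated pairs only if both classes occur in it.
        return any(y == 1 for _, y in group) and any(y == 0 for _, y in group)

    def gain(j):
        # Number of still-unseparated (positive, negative) pairs split by feature j.
        total = 0
        for group in groups:
            count = [[0, 0], [0, 0]]          # count[label][feature value]
            for x, y in group:
                count[y][x[j]] += 1
            total += count[1][1] * count[0][0] + count[1][0] * count[0][1]
        return total

    while any(mixed(g) for g in groups):
        best = max(range(n_features), key=gain)
        if gain(best) == 0:
            raise ValueError("two observations from different classes are identical")
        selected.append(best)
        refined = []
        for group in groups:
            split = defaultdict(list)
            for x, y in group:
                split[x[best]].append((x, y))
            refined.extend(split.values())
        groups = [g for g in refined if mixed(g)]
    return selected

# Toy data (assumed for illustration).
pos = [(1, 0, 1, 0), (1, 1, 0, 0)]
neg = [(0, 0, 1, 1), (0, 1, 0, 1), (1, 0, 0, 1)]
support = implicit_greedy_support(pos, neg)
```

The working memory here is proportional to the number of observations rather than to the number of cross-class pairs. The counts it computes are the same ones a greedy pass over an explicit pair-by-feature matrix would use.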

This work was supported by the Korea Research Foundation Grant funded by the Korean Government (MOEHRD) (KRF-2005-003-D00445).

Author information

Authors and Affiliations

Division of Information Management Engineering, Korea University, 1, 5-Ka, Anam-Dong, Seongbuk-Ku, Seoul, 136-713, Korea
Hong Seo Ryoo & In-Yong Jang

Editor information

Ming-Yang Kao, Xiang-Yang Li

About this paper

Cite this paper

Ryoo, H.S., Jang, IY. (2007). A Heuristic Method for Selecting Support Features from Large Datasets. In: Kao, MY., Li, XY. (eds) Algorithmic Aspects in Information and Management. AAIM 2007. Lecture Notes in Computer Science, vol 4508. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72870-2_39

DOI: https://doi.org/10.1007/978-3-540-72870-2_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72868-9
Online ISBN: 978-3-540-72870-2
eBook Packages: Computer Science, Computer Science (R0)
