A VNS-Based Heuristic for Feature Selection in Data Mining

  • Chapter
Hybrid Metaheuristics

Part of the book series: Studies in Computational Intelligence (SCI, volume 434)

Abstract

The selection of features that describe samples in sets of data is a typical problem in data mining. A crucial issue is to select a maximal set of pertinent features, because scarce knowledge of the problem under study often leads one to consider features that do not provide a good description of the corresponding samples. The concept of consistent biclustering of a set of data has been introduced to identify such a maximal set. The problem can be modeled as a 0–1 linear fractional program, which is NP-hard. We reformulate this optimization problem as a bilevel program, and we prove that solutions to the original problem can be found by solving the reformulated problem. We also propose a heuristic for the solution of the bilevel program, based on the metaheuristic Variable Neighborhood Search (VNS). Computational experiments show that the proposed heuristic outperforms previously proposed heuristics for feature selection by consistent biclustering.
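To make the two ingredients named in the abstract more concrete, the following is a minimal sketch, not the chapter's bilevel heuristic. It assumes the usual definition of consistent biclustering (each feature joins the class where its mean value over the training samples is largest, and the selection is consistent when every sample's mean over the features of its own class strictly exceeds its mean over the features of every other class) and wraps it in a basic VNS loop (shake by flipping a few random features, then a single-flip local search). All names (`violations`, `score`, `local_search`, `vns`), the penalty-based score, and the toy data are illustrative assumptions.

```python
import random

def _mean(values):
    return sum(values) / len(values)

def violations(A, classes, selected, num_classes):
    """Count samples that break consistency (rows of A = features, columns = samples)."""
    n = len(classes)
    groups = [[] for _ in range(num_classes)]
    for i, row in enumerate(A):
        if selected[i]:
            # A feature's class is the sample class with the largest mean value.
            means = [_mean([row[j] for j in range(n) if classes[j] == r])
                     for r in range(num_classes)]
            groups[means.index(max(means))].append(i)
    if any(not g for g in groups):
        return n  # assumption: a class without features counts as fully inconsistent
    bad = 0
    for j in range(n):
        col = [_mean([A[i][j] for i in g]) for g in groups]
        own = col[classes[j]]
        if any(r != classes[j] and col[r] >= own for r in range(num_classes)):
            bad += 1
    return bad

def score(A, classes, selected, num_classes):
    """Number of selected features if consistent, else a negative penalty (assumed)."""
    v = violations(A, classes, selected, num_classes)
    return sum(selected) if v == 0 else -v

def local_search(A, classes, x, num_classes):
    """Best-improvement search over single feature flips."""
    x = list(x)
    while True:
        best_i, best_s = None, score(A, classes, x, num_classes)
        for i in range(len(x)):
            x[i] = not x[i]
            s = score(A, classes, x, num_classes)
            x[i] = not x[i]
            if s > best_s:
                best_i, best_s = i, s
        if best_i is None:
            return x
        x[best_i] = not x[best_i]

def vns(A, classes, num_classes, k_max=3, iterations=20, seed=0):
    """Basic VNS: shake in neighborhoods of growing size, then descend."""
    rng = random.Random(seed)
    x = local_search(A, classes, [True] * len(A), num_classes)
    for _ in range(iterations):
        size = 1
        while size <= k_max:
            y = list(x)
            for i in rng.sample(range(len(y)), size):  # shake: flip `size` features
                y[i] = not y[i]
            y = local_search(A, classes, y, num_classes)
            if score(A, classes, y, num_classes) > score(A, classes, x, num_classes):
                x, size = y, 1   # improvement: restart from the smallest neighborhood
            else:
                size += 1        # no improvement: widen the neighborhood
    return x

# Toy data (hypothetical): 5 features, 4 samples in classes 0, 0, 1, 1.
A = [[5, 4, 1, 0], [4, 5, 0, 1], [0, 1, 5, 4], [1, 0, 4, 5], [40, 0, 18, 18]]
classes = [0, 0, 1, 1]
selection = vns(A, classes, num_classes=2)
```

On this toy instance the fifth feature is the only one that prevents consistency, so the search keeps the first four features and drops it. A practical implementation would need a more careful treatment of feasibility than the simple penalty used here.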



Author information

Corresponding author

Correspondence to A. Mucherino.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Mucherino, A., Liberti, L. (2013). A VNS-Based Heuristic for Feature Selection in Data Mining. In: Talbi, E.-G. (ed.) Hybrid Metaheuristics. Studies in Computational Intelligence, vol 434. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30671-6_13

  • DOI: https://doi.org/10.1007/978-3-642-30671-6_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30670-9

  • Online ISBN: 978-3-642-30671-6

  • eBook Packages: Engineering, Engineering (R0)