
Finding Essential Attributes from Binary Data

Annals of Mathematics and Artificial Intelligence

Abstract

We consider data sets that consist of n-dimensional binary vectors representing positive and negative examples for some (possibly unknown) phenomenon. A subset S of the attributes (or variables) of such a data set is called a support set if the positive and negative examples can be distinguished by using only the attributes in S. In this paper we study the problem of finding small support sets, a task that arises frequently in various fields, including knowledge discovery, data mining, learning theory, and the logical analysis of data. We study the distribution of support sets in randomly generated data and discuss why finding small support sets is important. We propose several measures of separation (real-valued set functions over the subsets of attributes), formulate optimization models for finding the smallest subsets maximizing these measures, and devise efficient heuristic algorithms to solve these (typically NP-hard) optimization problems. We prove that several of the proposed heuristics have a guaranteed constant approximation ratio, and we report on computational experience comparing these heuristics with others from the literature on both randomly generated and real-world data sets.
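To make the notion of a support set concrete, one standard way to look for a small support set (illustrative only, not necessarily the authors' method) is to view the problem as set cover: every pair consisting of one positive and one negative example must be separated by at least one chosen attribute on which the two vectors differ, and the classical greedy heuristic then picks attributes one at a time. The sketch below is a minimal implementation under that assumption; the function name and toy data are hypothetical.

```python
from itertools import product

def greedy_support_set(positives, negatives):
    """Greedy set-cover sketch: positives and negatives are lists of
    equal-length 0/1 tuples; returns a list of attribute indices that
    together separate every positive example from every negative one."""
    n = len(positives[0])
    # Each element to cover is a (positive index, negative index) pair.
    uncovered = {(i, j) for i, j in
                 product(range(len(positives)), range(len(negatives)))}
    support = []
    while uncovered:
        # Pick the attribute that separates the most still-uncovered pairs.
        best_attr, best_pairs = None, set()
        for a in range(n):
            pairs = {(i, j) for (i, j) in uncovered
                     if positives[i][a] != negatives[j][a]}
            if len(pairs) > len(best_pairs):
                best_attr, best_pairs = a, pairs
        if best_attr is None:
            raise ValueError("data is contradictory: some positive and "
                             "negative examples are identical")
        support.append(best_attr)
        uncovered -= best_pairs
    return support

# Toy usage: attribute 2 alone distinguishes the two classes here.
pos = [(1, 0, 1), (0, 1, 1)]
neg = [(1, 0, 0), (0, 1, 0)]
print(greedy_support_set(pos, neg))  # -> [2]
```

The greedy choice of the attribute covering the most remaining pairs mirrors the usual set-cover heuristic, which carries a logarithmic approximation guarantee on the size of the resulting support set.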




Cite this article

Boros, E., Horiyama, T., Ibaraki, T. et al. Finding Essential Attributes from Binary Data. Annals of Mathematics and Artificial Intelligence 39, 223–257 (2003). https://doi.org/10.1023/A:1024653703689
