
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1983)

Abstract

Given a data set consisting of n-dimensional binary vectors of positive and negative examples, a subset S of the attributes is called a support set if the positive and negative examples can be distinguished by using only the attributes in S. In this paper we consider several selection criteria for evaluating the “separation power” of support sets, and formulate combinatorial optimization problems for finding the “best and smallest” support sets with respect to such criteria. We provide efficient heuristics, some with a guaranteed performance ratio, for the solution of these problems, analyze the distribution of small support sets in random examples, and present the results of some computational experiments with the proposed algorithms.
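For intuition, the minimum support set problem can be viewed as a set cover instance: S is a support set exactly when every pair of one positive and one negative example differs on at least one attribute of S, so each such pair must be “covered” by a chosen attribute. The sketch below is an illustrative Python rendering of the standard greedy cover heuristic under this reduction, not the authors' implementation; names such as greedy_support_set are hypothetical. Greedy set cover inherits a logarithmic approximation guarantee in the number of example pairs.

from itertools import product

def greedy_support_set(positives, negatives):
    """positives, negatives: lists of equal-length 0/1 tuples.
    Returns a (small, not necessarily minimum) support set of attribute indices."""
    n = len(positives[0])
    # Every (positive, negative) pair must be separated by some chosen attribute.
    uncovered = set(product(range(len(positives)), range(len(negatives))))
    support = []
    while uncovered:
        # Greedy rule: pick the attribute separating the most uncovered pairs.
        best = max(range(n),
                   key=lambda k: sum(positives[i][k] != negatives[j][k]
                                     for (i, j) in uncovered))
        newly = {(i, j) for (i, j) in uncovered
                 if positives[i][best] != negatives[j][best]}
        if not newly:
            # Some positive example coincides with a negative one.
            raise ValueError("no support set exists: identical pos/neg examples")
        support.append(best)
        uncovered -= newly
    return support

# Example: attribute 2 alone separates the two classes.
pos = [(1, 0, 1), (0, 1, 1)]
neg = [(1, 0, 0), (0, 0, 0)]
print(greedy_support_set(pos, neg))  # -> [2]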

This work was partially supported by Grants-in-Aid from the Ministry of Education, Science, Sports and Culture of Japan (Grants 09044160 and 10205211). The visit of the first author to Kyoto University (January to March, 1999) was also supported by this grant (Grant 09044160). The research of the first and third authors was supported in part by the Office of Naval Research (Grant N00014-92-J-1375). The first author also thanks the National Science Foundation (Grant DMS 98-06389) and DARPA (Contract N66001-97-C-8537) for partial support.




Copyright information

© 2000 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boros, E., Horiyama, T., Ibaraki, T., Makino, K., Yagiura, M. (2000). Finding Essential Attributes in Binary Data. In: Leung, K.S., Chan, L.W., Meng, H. (eds) Intelligent Data Engineering and Automated Learning — IDEAL 2000. Data Mining, Financial Engineering, and Intelligent Agents. IDEAL 2000. Lecture Notes in Computer Science, vol 1983. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44491-2_20

  • DOI: https://doi.org/10.1007/3-540-44491-2_20

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41450-6

  • Online ISBN: 978-3-540-44491-6

  • eBook Packages: Springer Book Archive
