Abstract
Given n Boolean input variables representing a set of attributes, we consider Boolean functions f (i.e., binary classifications of tuples) that actually depend only on a small but unknown subset of these variables/attributes, called relevant in the following. The goal is to determine the relevant attributes from a sequence of examples, i.e., input vectors X and the corresponding classifications f(X). We analyze two simple greedy strategies and prove that they achieve this goal for various kinds of Boolean functions and for various input distributions from which the examples are drawn at random.
This generalizes results obtained by Akutsu, Miyano, and Kuhara for the uniform distribution. The analysis also provides explicit upper bounds on the number of necessary examples. They depend on the distribution and combinatorial properties of the function to be inferred.
Our second contribution is an extension of these results to the situation where attribute noise is present, i.e., a certain number of input bits x_i may be wrong. This is a typical situation, e.g., in medical research or computational biology, where not all attributes can be measured reliably. We show that even in such an error-prone situation, reliable inference of the relevant attributes can be performed, because our greedy strategies are robust even against a linear number of errors.
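The abstract does not spell out the two greedy strategies, so the following is only an illustrative sketch of one common set-cover-style greedy heuristic for this problem; whether it coincides with either strategy analyzed in the paper is an assumption. The function name greedy_relevant_attributes, the example encoding as (bit tuple, label) pairs, and the test function f(x) = x_0 AND x_2 are all hypothetical choices made for illustration. The sketch assumes noise-free examples and does not model the attribute-noise setting.

```python
from itertools import combinations, product


def greedy_relevant_attributes(examples):
    """Greedily pick attributes that separate differently labeled examples.

    examples: list of (x, y), where x is a tuple of 0/1 bits and y is a 0/1 label.
    Returns a sorted list of chosen attribute indices.
    Assumption: examples are noise-free (no two identical x with different y).
    """
    n = len(examples[0][0]) if examples else 0

    # Every pair of examples with different labels must end up separated
    # by at least one chosen attribute.
    uncovered = {
        (a, b)
        for a, b in combinations(range(len(examples)), 2)
        if examples[a][1] != examples[b][1]
    }

    def separated_by(i, pairs):
        # Pairs whose two input vectors differ in attribute i.
        return {(a, b) for (a, b) in pairs
                if examples[a][0][i] != examples[b][0][i]}

    chosen = []
    while uncovered:
        # Greedy step: take the attribute separating the most remaining pairs.
        best = max(range(n), key=lambda i: len(separated_by(i, uncovered)))
        gain = separated_by(best, uncovered)
        if not gain:
            raise ValueError("examples are inconsistent: no attribute separates them")
        chosen.append(best)
        uncovered -= gain
    return sorted(chosen)


# Hypothetical usage: all 16 inputs over 4 attributes, labeled by x_0 AND x_2.
examples = [(x, x[0] & x[2]) for x in product((0, 1), repeat=4)]
print(greedy_relevant_attributes(examples))
```

With a complete, noise-free sample as in the usage lines, the heuristic only has to explain label differences, so attributes that never help separate two differently labeled examples are never selected. On random samples the outcome depends on the input distribution and the sample size, which is the regime the abstract's bounds address.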
References
Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between Sets of Items in Large Databases. In: Proc. 1993 ACM SIGMOD Conf., pp. 207–216 (1993)
Akutsu, T., Bao, F.: Approximating Minimum Keys and Optimal Substructure Screens. In: Cai, J.-Y., Wong, C.K. (eds.) COCOON 1996. LNCS, vol. 1090, pp. 290–299. Springer, Heidelberg (1996)
Akutsu, T., Miyano, S., Kuhara, S.: A Simple Greedy Algorithm for Finding Functional Relations: Efficient Implementation and Average Case Analysis. TCS 292(2), 481–495 (2003); Morishita, S., Arikawa, S. (eds.): DS 2000. LNCS (LNAI), vol. 1967, pp. 86–98. Springer, Heidelberg (2000)
Angluin, D.: Queries and Concept Learning. Machine Learning 2(4), 319–342 (1988)
Angluin, D., Laird, P.: Learning from noisy examples. Machine Learning 2(4), 343–370 (1988)
Arora, S., Babai, L., Stern, J., Sweedyk, Z.: The Hardness of Approximate Optima in Lattices, Codes, and Systems of Linear Equations. J. CSS 54, 317–331 (1997)
Arpe, J., Reischuk, R.: Robust Inference of Relevant Attributes. Techn. Report, SIIM-TR-A 03-12, Univ. Lübeck (2003), available at http://www.tcs.mu-luebeck.de/TechReports.html
Blum, A., Hellerstein, L., Littlestone, N.: Learning in the Presence of Finitely or Infinitely Many Irrelevant Attributes. In: Proc. 4th, pp. 157–166 (1991)
Blum, A., Langley, P.: Selection of Relevant Features and Examples in Machine Learning. Artificial Intelligence 97(1–2), 245–271 (1997)
Feige, U.: A Threshold of ln n for Approximating Set Cover. J. ACM 45, 634–652 (1998)
Goldman, S., Sloan, R.: Can PAC Learning Algorithms Tolerate Random Attribute Noise? Algorithmica 14, 70–84 (1995)
Johnson, D.: Approximation Algorithms for Combinatorial Problems. J. CSS 9, 256–278 (1974)
Littlestone, N.: Learning Quickly When Irrelevant Attributes Abound: A New Linear-threshold Algorithm. Machine Learning 4(2), 285–318 (1988)
Littlestone, N.: From On-line to Batch Learning. In: Proc. 2nd COLT 1989, pp. 269–284 (1989)
Mannila, H., Räihä, K.: On the Complexity of Inferring Functional Dependencies. Discrete Applied Mathematics 40, 237–243 (1992)
Mossel, E., O’Donnell, R., Servedio, R.: Learning Juntas. In: Proc. STOC 2003, pp. 206–212 (2003)
Valiant, L.: Projection Learning. Machine Learning 37(2), 115–130 (1999)
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Arpe, J., Reischuk, R. (2003). Robust Inference of Relevant Attributes. In: Gavaldá, R., Jantke, K.P., Takimoto, E. (eds) Algorithmic Learning Theory. ALT 2003. Lecture Notes in Computer Science, vol 2842. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39624-6_10
DOI: https://doi.org/10.1007/978-3-540-39624-6_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20291-2
Online ISBN: 978-3-540-39624-6
eBook Packages: Springer Book Archive