Abstract
This paper introduces a new method for instance selection. Its conceptual framework and basic notions come from an extended rough set theory called α-rough set theory. Within this framework we formalize a notion of conflicting data, which underlies the conflict normalization method used for instance selection. Extensive experiments are performed to show the efficiency and accuracy of models built from the reduced datasets. The selection methodology and its results are discussed.
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boussouf, M., Quafafou, M. (2000). Data Reduction via Conflicting Data Analysis. In: Raś, Z.W., Ohsuga, S. (eds) Foundations of Intelligent Systems. ISMIS 2000. Lecture Notes in Computer Science, vol 1932. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39963-1_15
DOI: https://doi.org/10.1007/3-540-39963-1_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41094-2
Online ISBN: 978-3-540-39963-6
eBook Packages: Computer Science, Computer Science (R0)