Abstract
Discretization is an important preprocessing technique in data mining. Univariate discretization, the most commonly used approach, discretizes one attribute of a dataset at a time and ignores its interactions with the other attributes. Because the target class attribute is determined by multiple attributes jointly rather than by any single attribute, the result of univariate discretization is generally not optimal. In this paper, a new multivariate discretization algorithm is proposed. It uses ICA (Independent Component Analysis) to transform the original attributes into a space of independent attributes and then applies univariate discretization to each attribute in the new space. Data mining tasks can then be conducted on the discretized dataset with independent attributes. Numerical experiments show that our method improves discretization performance, especially on non-Gaussian datasets, and that it is competitive with a PCA-based multivariate method.
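The two-stage idea in the abstract (transform to independent attributes, then discretize each one separately) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' implementation: it handles only two attributes, estimates the ICA rotation by a simple grid search that maximizes absolute excess kurtosis, and uses equal-width binning as a stand-in for whichever univariate discretizer the paper applies. The function names `whiten_2d`, `ica_2d`, and `equal_width_bins` are hypothetical.

```python
import math
import random


def whiten_2d(xs, ys):
    """Center two attributes, decorrelate them, and scale each to unit variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    xc = [x - mx for x in xs]
    yc = [y - my for y in ys]
    sxx = sum(a * a for a in xc) / n
    syy = sum(b * b for b in yc) / n
    sxy = sum(a * b for a, b in zip(xc, yc)) / n
    # Rotation angle that diagonalizes the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    p = [c * a + s * b for a, b in zip(xc, yc)]
    q = [-s * a + c * b for a, b in zip(xc, yc)]
    # Assumes neither rotated attribute is constant (non-zero variance).
    sp = math.sqrt(sum(v * v for v in p) / n)
    sq = math.sqrt(sum(v * v for v in q) / n)
    return [v / sp for v in p], [v / sq for v in q]


def excess_kurtosis(values):
    """Excess kurtosis of a zero-mean, unit-variance sample."""
    return sum(v ** 4 for v in values) / len(values) - 3.0


def ica_2d(xs, ys, steps=180):
    """Toy two-attribute ICA: whiten, then pick the most non-Gaussian rotation."""
    p, q = whiten_2d(xs, ys)
    best = None
    for k in range(steps):
        phi = (math.pi / 2) * k / steps
        c, s = math.cos(phi), math.sin(phi)
        u = [c * a + s * b for a, b in zip(p, q)]
        v = [-s * a + c * b for a, b in zip(p, q)]
        score = abs(excess_kurtosis(u)) + abs(excess_kurtosis(v))
        if best is None or score > best[0]:
            best = (score, u, v)
    return best[1], best[2]


def equal_width_bins(values, n_bins):
    """Univariate discretization stand-in: map each value to an equal-width bin index."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant attribute
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


# Synthetic example: two independent uniform sources, linearly mixed.
random.seed(0)
s1 = [random.uniform(-1, 1) for _ in range(500)]
s2 = [random.uniform(-1, 1) for _ in range(500)]
x1 = [a + 0.5 * b for a, b in zip(s1, s2)]
x2 = [0.3 * a + b for a, b in zip(s1, s2)]

u, v = ica_2d(x1, x2)  # estimated independent attributes
codes = list(zip(equal_width_bins(u, 4), equal_width_bins(v, 4)))
```

Each row of `codes` is the pair of bin indices for one record in the transformed space. For real datasets with more than two attributes, a general ICA routine and a supervised discretizer would replace the toy pieces used here.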
Supported by an SRG Grant (7001805) from the City University of Hong Kong.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kang, Y., Wang, S., Liu, X., Lai, H., Wang, H., Miao, B. (2006). An ICA-Based Multivariate Discretization Algorithm. In: Lang, J., Lin, F., Wang, J. (eds) Knowledge Science, Engineering and Management. KSEM 2006. Lecture Notes in Computer Science, vol 4092. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11811220_47
DOI: https://doi.org/10.1007/11811220_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-37033-8
Online ISBN: 978-3-540-37035-2
eBook Packages: Computer Science (R0)