A Global Unsupervised Data Discretization Algorithm Based on Collective Correlation Coefficient

Zeng, An; Gao, Qi-Gang; Pan, Dan

doi:10.1007/978-3-642-21822-4_16


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6703)

Included in the following conference series:

International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems


Abstract

Data discretization is an important task for certain types of data mining algorithms, such as association rule discovery and Bayesian learning. For those algorithms, proper discretization can not only significantly improve the quality and understandability of the discovered knowledge but also reduce the running time. We present a Global Unsupervised Discretization Algorithm based on Collective Correlation Coefficient (GUDA-CCC) with the following merits: 1) it does not require class labels from the training data; 2) it preserves the ranks of attribute importance in a data set while minimizing the information loss, measured by mean square error. The attribute importance is calibrated by the CCC, which is derived from principal component analysis (PCA). The idea behind GUDA-CCC is that sticking closely to the original data set might be the best policy, especially when other available information is not reliable enough to be leveraged in the discretization. Experiments on benchmark data sets illustrate the effectiveness of the GUDA-CCC algorithm.
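
To make the information-loss criterion concrete, the following is a minimal sketch of how mean square error could be measured for an unsupervised discretization of a single attribute. It is not GUDA-CCC: it uses plain equal-width binning as a stand-in, and it assumes that each discretized value is represented by the mean of its interval. The function names `equal_width_discretize` and `mse` and the sample data are hypothetical and do not come from the paper.

```python
import numpy as np


def equal_width_discretize(values, n_bins):
    """Equal-width binning (an illustrative stand-in, not GUDA-CCC).

    Returns the bin index of each value and a reconstruction in which
    every value is replaced by the mean of the values in its bin.
    No class labels are used, so the procedure is unsupervised.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # Constant attribute: a single bin, no information loss.
        return np.zeros(values.shape, dtype=int), values.copy()

    edges = np.linspace(lo, hi, n_bins + 1)
    # Use interior edges only so the maximum falls into the last bin.
    bins = np.digitize(values, edges[1:-1])

    recon = values.copy()
    for b in np.unique(bins):
        mask = bins == b
        recon[mask] = values[mask].mean()
    return bins, recon


def mse(original, reconstructed):
    """Mean square error between the original and discretized attribute."""
    original = np.asarray(original, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    return float(np.mean((original - reconstructed) ** 2))


# Hypothetical sample attribute with two well-separated groups.
values = [0, 1, 2, 3, 10, 11, 12, 13]
bins, recon = equal_width_discretize(values, n_bins=2)
loss = mse(values, recon)
print(bins.tolist())  # [0, 0, 0, 0, 1, 1, 1, 1]
print(loss)           # 1.25
```

A lower MSE means the discretized attribute stays closer to the original values, which is the sense in which the abstract speaks of sticking closely to the original data set. How GUDA-CCC chooses cut points and how the CCC is computed from PCA are not specified in the abstract, so neither is reproduced here.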



Author information

Authors and Affiliations

Guangdong University of Technology, China
An Zeng
Dalhousie University, Canada
An Zeng & Qi-Gang Gao
Saint Mary’s University, Canada
Dan Pan


Editor information

Editors and Affiliations

School of Computer and Information Science, Center for Science and Technology, Syracuse University, 13244-4100, Syracuse, NY, USA
Kishan G. Mehrotra & Chilukuri K. Mohan
Department of Electrical Engineering and Computer Science, Syracuse University, 13244, Syracuse, NY, USA
Jae C. Oh
Department of Electrical Engineering and Computer Science, Syracuse University, 13244, Syracuse, NY, USA
Pramod K. Varshney
Department of Computer Science, Texas State University San Marcos, 601 University Drive, 78666-4616, San Marcos, TX, USA
Moonis Ali


Cite this paper

Zeng, A., Gao, QG., Pan, D. (2011). A Global Unsupervised Data Discretization Algorithm Based on Collective Correlation Coefficient. In: Mehrotra, K.G., Mohan, C.K., Oh, J.C., Varshney, P.K., Ali, M. (eds) Modern Approaches in Applied Intelligence. IEA/AIE 2011. Lecture Notes in Computer Science, vol 6703. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21822-4_16


DOI: https://doi.org/10.1007/978-3-642-21822-4_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21821-7
Online ISBN: 978-3-642-21822-4
eBook Packages: Computer Science, Computer Science (R0)
