Abstract
With the greatly increased capacity for data collection and storage over the past decade, data miners must deal with much larger datasets in knowledge discovery tasks. Very large numbers of observations can cause traditional clustering methods to break down, leaving them unable to cope with such volumes of data. To enable data miners to effectively detect the hierarchical cluster structure of a very large dataset, we introduce a visualization technique, HOV3, that plots the dataset into clear and meaningful subsets by using its statistical summaries. Data miners can then focus on investigating a relatively small subset and its nested clusters. In this way, they can explore the clusters of any subset and its offspring subsets in a top-down fashion. As a consequence, HOV3 provides data miners with an effective method for exploring clusters in a hierarchy by visualization.
This research has been supported in part by a Macquarie University Safety Net Grant.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Zhang, KB., Orgun, M.A., Busch, P.A., Nayak, A.C. (2010). A Top-Down Approach for Hierarchical Cluster Exploration by Visualization. In: Cao, L., Feng, Y., Zhong, J. (eds) Advanced Data Mining and Applications. ADMA 2010. Lecture Notes in Computer Science, vol 6440. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17316-5_47
DOI: https://doi.org/10.1007/978-3-642-17316-5_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17315-8
Online ISBN: 978-3-642-17316-5
eBook Packages: Computer Science (R0)