CloudVista: Visual Cluster Exploration for Extreme Scale Data in the Cloud

Chen, Keke; Xu, Huiqi; Tian, Fengguang; Guo, Shumin

doi:10.1007/978-3-642-22351-8_21

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6809)

Abstract

Efficient, high-quality clustering of extreme scale datasets with complex clustering structures remains one of the most challenging data analysis problems. An innovative use of the data cloud would provide a unique opportunity to address this challenge. In this paper, we propose the CloudVista framework to address (1) the problems caused by sampling in existing approaches and (2) the latency that cloud-side processing introduces into interactive cluster visualization. The CloudVista framework aims to explore the entire large dataset stored in the cloud with the help of the visual frame data structure and the previously developed VISTA visualization model. The latency of processing large data is addressed by the RandGen algorithm, which generates a series of related visual frames in the cloud without user intervention, and by a hierarchical exploration model supported by cloud-side subset processing. An experimental study shows that this framework is effective and efficient for visually exploring clustering structures in extreme scale datasets stored in the cloud.
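
The abstract does not specify the visual frame format or the RandGen algorithm, so the following is only a minimal illustrative sketch of the general idea: aggregate every record into a small 2D summary, and precompute a series of slightly different views without waiting for user input. It assumes that a frame is a grid of per-cell point counts and that related frames come from small random perturbations of a linear projection. The names `project`, `visual_frame`, and `related_frames` and all parameters are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def project(point, matrix):
    """Map a k-dimensional point to 2D with a k x 2 projection matrix."""
    x = sum(v * row[0] for v, row in zip(point, matrix))
    y = sum(v * row[1] for v, row in zip(point, matrix))
    return x, y


def visual_frame(points, matrix, resolution=32, extent=4.0):
    """Aggregate projected points into per-cell counts (assumed frame format).

    Points projected outside [-extent, extent) on either axis are dropped.
    """
    frame = Counter()
    scale = resolution / (2.0 * extent)
    for p in points:
        x, y = project(p, matrix)
        if -extent <= x < extent and -extent <= y < extent:
            # Clamp to guard against floating-point rounding at the upper edge.
            i = min(resolution - 1, int((x + extent) * scale))
            j = min(resolution - 1, int((y + extent) * scale))
            frame[(i, j)] += 1
    return frame


def related_frames(points, dims, n_frames=5, step=0.05, seed=0):
    """Build frames from a randomly perturbed projection (hypothetical scheme).

    Each frame uses the previous projection plus small Gaussian noise, so
    consecutive frames show similar but not identical views of the data.
    """
    rng = random.Random(seed)
    matrix = [[rng.gauss(0, 1) / dims ** 0.5 for _ in range(2)]
              for _ in range(dims)]
    frames = []
    for _ in range(n_frames):
        frames.append(visual_frame(points, matrix))
        matrix = [[a + step * rng.gauss(0, 1) for a in row] for row in matrix]
    return frames
```

In this sketch the size of a frame depends on the display resolution rather than on the number of records, which is one plausible reason an aggregated frame could be sent to a client cheaply. In a cloud setting the counting loop would be the part distributed across workers; here it runs in a single process for clarity.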

Author information

Authors and Affiliations

Ohio Center of Excellence in Knowledge Enabled Computing, Department of Computer Science and Engineering, Wright State University, Dayton, OH, 45435, USA
Keke Chen, Huiqi Xu, Fengguang Tian & Shumin Guo

Editor information

Editors and Affiliations

The Evergreen State College, 98505, Olympia, WA, USA
Judith Bayard Cushing
CNRI and University of Virginia, 22908, Charlottesville, VA, USA
James French
Gonzaga University, 99258, Spokane, WA, USA
Shawn Bowers

Cite this paper

Chen, K., Xu, H., Tian, F., Guo, S. (2011). CloudVista: Visual Cluster Exploration for Extreme Scale Data in the Cloud. In: Bayard Cushing, J., French, J., Bowers, S. (eds) Scientific and Statistical Database Management. SSDBM 2011. Lecture Notes in Computer Science, vol 6809. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22351-8_21

DOI: https://doi.org/10.1007/978-3-642-22351-8_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22350-1
Online ISBN: 978-3-642-22351-8
eBook Packages: Computer Science, Computer Science (R0)
