Abstract
As information technology has evolved, large-scale graph data have attracted growing interest, and many studies have addressed large-scale graph analysis on Hadoop. Hadoop-based graph analysis provides parallel programming models with data partitioning and consists of iterative phases of MapReduce jobs. The effectiveness of data partitioning therefore depends on how well it preserves data locality on each node of the cluster. In this paper, we propose a semi-clustering scheme for large-scale graph analysis, such as the PageRank algorithm, on Hadoop and show that the proposed scheme is effective. Experimental results show that PageRank computation with semi-clustering improves performance.
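To illustrate why partitioning matters for iterative MapReduce jobs, the following is a minimal single-machine sketch of PageRank written as repeated map and reduce phases, together with a helper that counts edges crossing partition boundaries. It is not the semi-clustering scheme proposed in the paper: the function names, the toy graph, the damping factor of 0.85, and the two example partitions are all assumptions made for illustration, and the sketch assumes every vertex has at least one out-link.

```python
from collections import defaultdict


def map_phase(graph, ranks):
    """Map step: each vertex splits its current rank evenly across its out-links."""
    for vertex, out_links in graph.items():
        share = ranks[vertex] / len(out_links)
        for target in out_links:
            yield target, share


def reduce_phase(graph, contributions, damping=0.85):
    """Reduce step: sum the contributions per vertex and apply the damping factor."""
    sums = defaultdict(float)
    for target, value in contributions:
        sums[target] += value
    n = len(graph)
    return {v: (1.0 - damping) / n + damping * sums[v] for v in graph}


def pagerank(graph, iterations=20, damping=0.85):
    """Run a fixed number of map/reduce rounds, one round per iteration."""
    n = len(graph)
    ranks = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        ranks = reduce_phase(graph, map_phase(graph, ranks), damping)
    return ranks


def cut_edges(graph, partition):
    """Count edges whose endpoints are assigned to different partitions.

    In a distributed run, each such edge is a rank contribution that has to
    leave the node holding the source vertex.
    """
    return sum(
        1
        for vertex, out_links in graph.items()
        for target in out_links
        if partition[vertex] != partition[target]
    )


# Hypothetical toy graph: two small groups joined by the single edge c -> d.
graph = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a", "d"],
    "d": ["e"],
    "e": ["f"],
    "f": ["d"],
}

# Two hypothetical assignments of vertices to two cluster nodes.
clustered = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
alternating = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

ranks = pagerank(graph)
```

On this toy graph, comparing `cut_edges(graph, clustered)` with `cut_edges(graph, alternating)` shows how an assignment that keeps linked vertices together leaves fewer contributions to exchange between nodes in each iteration, which is the data-locality effect the abstract refers to.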
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hong, S., Shin, Y., Choi, D.H., Jo, H., Chang, Jw. (2014). A Semi-clustering Scheme for Large-Scale Graph Analysis on Hadoop. In: Park, J., Adeli, H., Park, N., Woungang, I. (eds) Mobile, Ubiquitous, and Intelligent Computing. Lecture Notes in Electrical Engineering, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40675-1_46
DOI: https://doi.org/10.1007/978-3-642-40675-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40674-4
Online ISBN: 978-3-642-40675-1
eBook Packages: Engineering, Engineering (R0)