A Semi-clustering Scheme for Large-Scale Graph Analysis on Hadoop

Hong, Seungtae; Shin, Youngsung; Choi, Dong Hoon; Jo, Heeseung; Chang, Jae-woo

doi:10.1007/978-3-642-40675-1_46


Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 274)


Abstract

With the evolution of information technology, large-scale graph data have attracted growing interest, and much recent research has addressed large-scale graph analysis on Hadoop. Hadoop-based graph analysis uses parallel programming models built on data partitioning and runs as a sequence of iterative MapReduce jobs. Therefore, the effectiveness of data partitioning depends on how well the partitioning preserves data locality on each node of the cluster. In this paper, we propose a semi-clustering scheme for large-scale graph analysis on Hadoop, such as the PageRank algorithm, and show that the proposed scheme is effective. Our experimental results show that PageRank computation with semi-clustering improves performance.
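To illustrate why partitioning affects iterative PageRank, here is a minimal Python sketch under simplifying assumptions. It is not the authors' semi-clustering scheme. It simulates one MapReduce-style PageRank iteration on an in-memory adjacency dict, then counts how many rank contributions cross partition boundaries, assuming each partition is processed on its own node and only cross-partition contributions cost network traffic. The BFS-based grouping (`bfs_partition`) is a hypothetical stand-in for clustering-aware partitioning, compared against hash partitioning. The names `pagerank_iteration`, `bfs_partition`, `cross_partition_messages`, and the toy `GRAPH` are all illustrative. Dangling vertices are not handled.

```python
import math
from collections import defaultdict, deque

# Hypothetical toy graph: two tightly linked groups {0,1,2} and {3,4,5}
# joined by the edges 2->3 and 5->0. Every vertex has out-edges,
# so this sketch does not need dangling-node handling.
GRAPH = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [4, 5], 4: [3, 5], 5: [3, 4, 0],
}


def pagerank_iteration(graph, ranks, damping=0.85):
    """One PageRank iteration written as explicit map and reduce phases."""
    n = len(graph)
    # Map: each vertex emits (target, rank / out-degree) for every out-edge.
    emitted = [
        (u, ranks[v] / len(nbrs))
        for v, nbrs in graph.items()
        for u in nbrs
    ]
    # Shuffle + reduce: sum the contributions received by each vertex.
    received = defaultdict(float)
    for u, share in emitted:
        received[u] += share
    return {v: (1 - damping) / n + damping * received[v] for v in graph}


def bfs_partition(graph, k):
    """Hypothetical locality-aware partitioner: fill k equal-capacity
    partitions in BFS order so that linked vertices tend to share one."""
    capacity = math.ceil(len(graph) / k)
    partition_of = {}
    part, filled = 0, 0
    for start in sorted(graph):
        queue = deque([start])
        while queue:
            v = queue.popleft()
            if v in partition_of:
                continue
            partition_of[v] = part
            filled += 1
            if filled == capacity:  # partition is full: start the next one
                part, filled = part + 1, 0
            queue.extend(u for u in graph[v] if u not in partition_of)
    return partition_of


def cross_partition_messages(graph, partition_of):
    """Count map outputs whose source and target lie in different partitions."""
    return sum(
        1
        for v, nbrs in graph.items()
        for u in nbrs
        if partition_of[v] != partition_of[u]
    )


if __name__ == "__main__":
    ranks = {v: 1 / len(GRAPH) for v in GRAPH}
    for _ in range(20):
        ranks = pagerank_iteration(GRAPH, ranks)
    print({v: round(r, 3) for v, r in ranks.items()})

    hash_parts = {v: v % 2 for v in GRAPH}  # hash-style partitioning
    clustered = bfs_partition(GRAPH, 2)
    print("hash:", cross_partition_messages(GRAPH, hash_parts))  # 10
    print("bfs:", cross_partition_messages(GRAPH, clustered))  # 2
```

On this toy graph, hash partitioning sends 10 of the 15 per-iteration contributions across partitions, while the BFS grouping sends only 2. Because PageRank repeats the same map and reduce phases every iteration, any such saving recurs in each MapReduce job, which is the intuition behind locality-preserving partitioning.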



Author information

Authors and Affiliations

Dept. of Computer Engineering, Chonbuk National University, Jeonju, South Korea
Seungtae Hong, Youngsung Shin, Heeseung Jo & Jae-woo Chang
Korea Institute of Science and Technology Information (KISTI), Daejeon, South Korea
Dong Hoon Choi


Corresponding author

Correspondence to Seungtae Hong.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Seoul University of Science and Technology (SeoulTech), Seoul, Korea, Republic of (South Korea)
James J. (Jong Hyuk) Park
Biomedical Informatics Neuroscience, Ohio State University Center for Biomedical Engineering, Columbus, Ohio, USA
Hojjat Adeli
Dept of Computer Education, Jeju National University Teachers College, Jeju Special Self-Governing Province, Korea, Republic of (South Korea)
Namje Park
Ryerson University Dept. Computer Science, Toronto, Ontario, Canada
Isaac Woungang


About this paper

Cite this paper

Hong, S., Shin, Y., Choi, D.H., Jo, H., Chang, Jw. (2014). A Semi-clustering Scheme for Large-Scale Graph Analysis on Hadoop. In: Park, J., Adeli, H., Park, N., Woungang, I. (eds) Mobile, Ubiquitous, and Intelligent Computing. Lecture Notes in Electrical Engineering, vol 274. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40675-1_46


DOI: https://doi.org/10.1007/978-3-642-40675-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40674-4
Online ISBN: 978-3-642-40675-1
eBook Packages: Engineering, Engineering (R0)
