mDHT: a multi-level-indexed DHT algorithm to extra-large-scale data retrieval on HDFS/Hadoop architecture

Tang, Yu; Fan, Aihua; Wang, Yingjie; Yao, Yuanzhe

doi:10.1007/s00779-014-0784-1

Original Article
Published: 09 August 2014

Volume 18, pages 1835–1844, (2014)

Personal and Ubiquitous Computing

Yu Tang¹, Aihua Fan², Yingjie Wang¹ & Yuanzhe Yao¹

Abstract

To meet the storage and fast-search needs of extra-large-scale energy monitoring and statistics data, we propose a multi-level-indexed distributed hash table (mDHT) algorithm and complete a MapReduce implementation of it on the open-standard HDFS/HBase platform. The approach stores energy consumption data in a columnar structure and builds a hashed index table to provide quick search and retrieval for extra-large-scale data processing systems. The hashed indexing scheme is implemented on a 3-node Hadoop cluster. Simulation experiments at scales of up to 48 million data records indicate that, when the data volume reaches 12 million to 48 million records, the proposed mDHT algorithm delivers outstanding data-writing performance compared with a traditional SQL Server implementation. Even compared with the single-indexed DHT (sDHT) application, the mDHT solution reduces data retrieval time by 24.5–48.6%. The multi-level-indexed DHT algorithm presented in this paper contributes a key technique for developing a fast search engine for extra-large-scale data on cloud storage architectures.
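
The abstract does not describe the index layout itself, so the following is only a toy, in-memory illustration of the general contrast between a single hashed index level and a multi-level one. It is not the authors' mDHT or sDHT implementation and does not touch HDFS, HBase, or MapReduce. The class names, the bucket count, and the choice of a meter identifier plus a day as the two key components are all assumptions made for this sketch.

```python
import hashlib
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for this demo


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a key to a bucket with a stable hash (MD5 is used only to spread keys)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


class SingleLevelIndex:
    """One hashed level: bucket(meter_id) -> list of (meter_id, day, row_key)."""

    def __init__(self):
        self.buckets = defaultdict(list)

    def put(self, meter_id: str, day: str, row_key: str) -> None:
        self.buckets[bucket_of(meter_id)].append((meter_id, day, row_key))

    def get(self, meter_id: str, day: str) -> list:
        # Every entry in the bucket is scanned and filtered.
        entries = self.buckets.get(bucket_of(meter_id), [])
        return [rk for m, d, rk in entries if m == meter_id and d == day]


class MultiLevelIndex:
    """Two hashed levels: bucket(meter_id) -> (meter_id, day) -> row keys."""

    def __init__(self):
        self.buckets = defaultdict(lambda: defaultdict(list))

    def put(self, meter_id: str, day: str, row_key: str) -> None:
        self.buckets[bucket_of(meter_id)][(meter_id, day)].append(row_key)

    def get(self, meter_id: str, day: str) -> list:
        # The second level is itself a hash lookup, so the bucket is not scanned.
        second_level = self.buckets.get(bucket_of(meter_id), {})
        return list(second_level.get((meter_id, day), []))


def build_demo_indexes(num_rows: int = 1000):
    """Fill both indexes with the same synthetic (meter, day, row_key) entries."""
    single, multi = SingleLevelIndex(), MultiLevelIndex()
    for i in range(num_rows):
        meter, day = f"meter-{i % 50}", f"2014-01-{i % 28 + 1:02d}"
        single.put(meter, day, f"row-{i}")
        multi.put(meter, day, f"row-{i}")
    return single, multi
```

Both classes return the same row keys for a given query; the difference is that the single-level lookup filters a whole bucket while the two-level lookup goes straight to the matching entry. Whether this corresponds to the source of the retrieval-time reduction reported in the abstract cannot be confirmed from the abstract alone.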

Author information

Authors and Affiliations

University of Electronic Science and Technology of China, Chengdu, 611731, China
Yu Tang, Yingjie Wang & Yuanzhe Yao
Xi’an Polytechnic University, Xi’an, 710048, China
Aihua Fan

Corresponding author

Correspondence to Yu Tang.

About this article

Cite this article

Tang, Y., Fan, A., Wang, Y. et al. mDHT: a multi-level-indexed DHT algorithm to extra-large-scale data retrieval on HDFS/Hadoop architecture. Pers Ubiquit Comput 18, 1835–1844 (2014). https://doi.org/10.1007/s00779-014-0784-1

Received: 26 February 2014
Accepted: 29 April 2014
Published: 09 August 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s00779-014-0784-1
