mDHT: a multi-level-indexed DHT algorithm to extra-large-scale data retrieval on HDFS/Hadoop architecture

  • Original Article
Personal and Ubiquitous Computing

Abstract

To meet the storage and fast-search needs of extra-large-scale energy monitoring and statistics data, we propose a multi-level-indexed distributed hash table (mDHT) algorithm and complete a MapReduce implementation of it on the open-standard HDFS/HBase platform. The approach stores energy consumption data in a columnar structure and builds a hashed index table to provide quick search and retrieval for extra-large-scale data processing systems. The hashed indexing scheme is implemented on a 3-node Hadoop cluster. Simulation experiments at scales of up to 48 million data records indicate that, when the data volume reaches 12 million to 48 million records, the proposed mDHT algorithm clearly outperforms a traditional SQL Server implementation in data writing. Compared with the single-indexed DHT (sDHT) application, the mDHT solution reduces data retrieval time by 24.5–48.6 %. The multi-level-indexed DHT algorithm presented in this paper contributes a key technique for developing a fast search engine for extra-large-scale data on cloud storage architectures.
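The abstract does not say which keys the index levels use, so the following is only a minimal in-memory sketch of the general idea behind single-level versus multi-level hashed indexing, not the paper's HBase/MapReduce implementation. The record fields (`meter_id`, `day`, `kwh`) and both class names are hypothetical, chosen to resemble energy monitoring data.

```python
from collections import defaultdict
from typing import NamedTuple


class Reading(NamedTuple):
    meter_id: str  # hypothetical first-level key
    day: str       # hypothetical second-level key, e.g. "2014-03-01"
    kwh: float


class SingleLevelIndex:
    """One hash table keyed on meter_id only (illustrative sDHT-style lookup)."""

    def __init__(self):
        self._by_meter = defaultdict(list)

    def put(self, reading):
        self._by_meter[reading.meter_id].append(reading)

    def get(self, meter_id, day):
        # One hash lookup, then a linear scan over every reading of that meter.
        return [r for r in self._by_meter.get(meter_id, []) if r.day == day]


class MultiLevelIndex:
    """Nested hash tables, meter_id -> day -> readings (illustrative mDHT-style lookup)."""

    def __init__(self):
        self._index = defaultdict(lambda: defaultdict(list))

    def put(self, reading):
        self._index[reading.meter_id][reading.day].append(reading)

    def get(self, meter_id, day):
        # Two hash lookups and no scan; .get avoids creating empty buckets.
        return list(self._index.get(meter_id, {}).get(day, []))


readings = [
    Reading("m1", "2014-03-01", 1.5),
    Reading("m1", "2014-03-02", 2.0),
    Reading("m2", "2014-03-01", 0.7),
]
single, multi = SingleLevelIndex(), MultiLevelIndex()
for r in readings:
    single.put(r)
    multi.put(r)

# Both indexes return the same records; they differ only in lookup cost.
assert single.get("m1", "2014-03-02") == multi.get("m1", "2014-03-02")
```

In this toy version the second index level replaces a per-meter scan with a further hash lookup, which is one plausible source of the kind of retrieval-time reduction the abstract reports; the actual level structure is described in the full article.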

Author information

Corresponding author

Correspondence to Yu Tang.

Cite this article

Tang, Y., Fan, A., Wang, Y. et al. mDHT: a multi-level-indexed DHT algorithm to extra-large-scale data retrieval on HDFS/Hadoop architecture. Pers Ubiquit Comput 18, 1835–1844 (2014). https://doi.org/10.1007/s00779-014-0784-1

  • DOI: https://doi.org/10.1007/s00779-014-0784-1
