
Forest of Distributed B+Tree Based on Key-Value Store for Big-Set Problem

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2016)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9645)


Abstract

In many big-data systems, the amount of data is growing rapidly. Many systems have to store big-sets: sets with a very large number of items. Efficiently storing a large number of big-sets while supporting high rates of updates and queries is a challenging problem for data storage systems. Distributed key-value stores play an important role in building large-scale systems: they offer horizontal scalability, low latency, and high throughput when manipulating small or medium-sized key-value pairs. Unfortunately, they handle big-set data structures poorly, and most of them do not scale to a large number of big-sets. In this research, we analyze the difficulty of storing big-sets in key-value stores. We propose an architecture called “Forest of Distributed B+Tree”, together with algorithms, to build a NoSQL data store for big data structures such as sets and dictionaries. Each big-set is split into multiple small sets of limited size, which are stored in key-value stores. A multi-level metadata structure is also proposed to reduce the complexity of write operations on big-sets stored in key-value stores from O(N) to O(log N). The proposed system can store more items in a set than Cassandra and Google BigTable. The parts of a big-set are distributed across servers, whereas a row in Google BigTable has a limited size and must fit on a single server. Experimental results show that the proposed system has better read performance than Cassandra. The proposed architecture may potentially be used in various applications, such as storage for sensor data in Internet of Things (IoT) systems, commercial transaction storage, and social networks.
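The central claim of the abstract, that splitting a big-set into bounded pieces indexed by multi-level metadata turns an O(N) rewrite into O(log N) work, can be illustrated with a small sketch. It assumes each big-set is one B+Tree whose nodes are stored as separate key-value pairs, so an insert rewrites only the nodes on one root-to-leaf path instead of one ever-growing value. A plain Python dict stands in for the distributed key-value store, and the names `KVStore`, `BigSet`, `MAX_KEYS`, and the `name/node/<id>` key format are hypothetical. This is not the authors' implementation: node size, placement of nodes across servers, persistence of the root pointer (kept in memory here), concurrency, and deletion are all left out.

```python
import bisect
import copy
import itertools

MAX_KEYS = 4  # hypothetical node capacity; kept tiny so splits are easy to observe


class KVStore:
    """In-memory stand-in for a distributed key-value store (whole-value get/put)."""

    def __init__(self):
        self.data = {}
        self.writes = 0  # counts put() calls, i.e. values (re)written

    def get(self, key):
        return copy.deepcopy(self.data[key])  # callers get a copy, as over a network

    def put(self, key, value):
        self.writes += 1
        self.data[key] = value


class BigSet:
    """One big-set stored as a B+Tree; every node is its own bounded-size KV pair."""

    _ids = itertools.count()

    def __init__(self, store, name):
        self.store, self.name = store, name
        # Simplification: the root pointer lives in memory, not in the store.
        self.root = self._new_node(True, [], [])

    def _new_node(self, leaf, keys, children):
        key = f"{self.name}/node/{next(BigSet._ids)}"
        self.store.put(key, {"leaf": leaf, "keys": keys, "children": children})
        return key

    def add(self, item):
        split = self._insert(self.root, item)
        if split is not None:  # root overflowed: grow the tree by one level
            sep, right = split
            self.root = self._new_node(False, [sep], [self.root, right])

    def _insert(self, node_key, item):
        """Insert below node_key; return (separator, new_right_key) if it split."""
        node = self.store.get(node_key)
        keys = node["keys"]
        if node["leaf"]:
            i = bisect.bisect_left(keys, item)
            if i < len(keys) and keys[i] == item:
                return None  # already present: nothing is written
            keys.insert(i, item)
        else:
            i = bisect.bisect_right(keys, item)
            split = self._insert(node["children"][i], item)
            if split is None:
                return None  # child absorbed the item; this node is unchanged
            sep, right = split
            keys.insert(i, sep)
            node["children"].insert(i + 1, right)
        if len(keys) > MAX_KEYS:
            return self._split(node_key, node)
        self.store.put(node_key, node)
        return None

    def _split(self, node_key, node):
        keys, mid = node["keys"], len(node["keys"]) // 2
        sep = keys[mid]
        if node["leaf"]:
            # Leaf split: the first key of the right half is copied up as separator.
            right = self._new_node(True, keys[mid:], [])
            node["keys"] = keys[:mid]
        else:
            # Internal split: the middle key moves up and leaves both halves.
            right = self._new_node(False, keys[mid + 1:], node["children"][mid + 1:])
            node["keys"], node["children"] = keys[:mid], node["children"][: mid + 1]
        self.store.put(node_key, node)
        return sep, right

    def contains(self, item):
        node = self.store.get(self.root)
        while not node["leaf"]:
            node = self.store.get(node["children"][bisect.bisect_right(node["keys"], item)])
        i = bisect.bisect_left(node["keys"], item)
        return i < len(node["keys"]) and node["keys"][i] == item

    def items(self):
        def walk(key):
            node = self.store.get(key)
            if node["leaf"]:
                yield from node["keys"]
            else:
                for child in node["children"]:
                    yield from walk(child)
        return list(walk(self.root))
```

Creating several `BigSet(store, name)` objects on one `KVStore` models the forest: many independent trees sharing one store. With `MAX_KEYS = 4`, after inserting 1,000 items into one set, a further `add` rewrites at most the nodes on a single root-to-leaf path plus a possible new root, which the `writes` counter makes visible; storing the whole set as a single value would instead rewrite all N items on every insert.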



Acknowledgment

This research is funded by Research and Development Department of VNG.

Author information


Corresponding author

Correspondence to Thanh Trung Nguyen.


Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Nguyen, T.T., Nguyen, M.H. (2016). Forest of Distributed B+Tree Based on Key-Value Store for Big-Set Problem. In: Gao, H., Kim, J., Sakurai, Y. (eds) Database Systems for Advanced Applications. DASFAA 2016. Lecture Notes in Computer Science, vol 9645. Springer, Cham. https://doi.org/10.1007/978-3-319-32055-7_22


  • DOI: https://doi.org/10.1007/978-3-319-32055-7_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-32054-0

  • Online ISBN: 978-3-319-32055-7

  • eBook Packages: Computer Science, Computer Science (R0)
