Abstract
Although many kinds of NoSQL databases, also referred to as key-value stores, have been proposed, there is still no efficient solution to the problem of querying on non-key attributes. In this paper, we propose BF-Matrix, a hierarchical index that combines Bloom filters with B+ trees. Faced with massive data and large-scale clusters, the layered design shortens the search path and makes the best use of scattered resources. Moreover, it can scale up and scale back as the data size and the cluster scale change, and it confines the work of update and retrieval to a limited scope. To eliminate the risk of false negatives and to ensure that the index "appears consistent", two rules are given that specify the behavior of index update and data retrieval. Experimental results demonstrate that our solution not only outperforms the state of the art but is also flexible enough to adapt to the cloud environment.
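To make the layered idea concrete, the following is a minimal sketch of a non-key attribute lookup that uses a per-node Bloom filter to decide which nodes to probe. It is an illustration under stated assumptions, not the BF-Matrix design itself: the class names `BloomFilter` and `NodeIndex`, the function `query`, the filter parameters, and the use of a sorted list in place of a B+ tree are all hypothetical choices made for brevity.

```python
import hashlib
from bisect import bisect_left, insort


class BloomFilter:
    """Fixed-size Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # Sizes are arbitrary illustrative values, not taken from the paper.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes positions by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))


class NodeIndex:
    """Hypothetical per-node index; a sorted list stands in for a B+ tree."""

    def __init__(self):
        self.entries = []  # sorted (attribute_value, primary_key) pairs
        self.summary = BloomFilter()

    def insert(self, attr_value, primary_key):
        insort(self.entries, (attr_value, primary_key))
        self.summary.add(attr_value)

    def lookup(self, attr_value):
        # Binary search for the first entry with this attribute value.
        i = bisect_left(self.entries, (attr_value,))
        keys = []
        while i < len(self.entries) and self.entries[i][0] == attr_value:
            keys.append(self.entries[i][1])
            i += 1
        return keys


def query(nodes, attr_value):
    """Probe only the nodes whose Bloom filter may contain attr_value."""
    results, probed = [], 0
    for node in nodes:
        if node.summary.might_contain(attr_value):
            probed += 1
            results.extend(node.lookup(attr_value))
    return results, probed


# Usage: three nodes, two of which hold rows with color == "red".
nodes = [NodeIndex() for _ in range(3)]
nodes[0].insert("red", "k1")
nodes[1].insert("blue", "k4")
nodes[2].insert("red", "k7")
keys, probed = query(nodes, "red")
print(sorted(keys), "nodes probed:", probed)
```

Because a Bloom filter can return false positives, a node may occasionally be probed without holding a match; the local lookup then simply returns no keys. The filter never reports a value it has been given as absent, which is why it is safe to use for skipping nodes in this sketch.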
This work was supported by the Natural Science Foundation of China (No. 60973002 and No. 61170003), the National High Technology Research and Development Program of China (Grant No. 2012AA011002), and the MOE-CMCC Research Fund.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Cheng, X., Li, H., Wang, Y., Wang, T., Yang, D. (2014). BF-Matrix: A Secondary Index for the Cloud Storage. In: Li, F., Li, G., Hwang, Sw., Yao, B., Zhang, Z. (eds) Web-Age Information Management. WAIM 2014. Lecture Notes in Computer Science, vol 8485. Springer, Cham. https://doi.org/10.1007/978-3-319-08010-9_40
DOI: https://doi.org/10.1007/978-3-319-08010-9_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08009-3
Online ISBN: 978-3-319-08010-9
eBook Packages: Computer Science, Computer Science (R0)