A novel clustered MongoDB-based storage system for unstructured data with high availability

Jiang, Wenbin; Zhang, Lei; Liao, Xiaofei; Jin, Hai; Peng, Yaqiong

doi:10.1007/s00607-013-0355-8

Published: 14 November 2013

Computing, volume 96, pages 455–478 (2014)

Abstract

More and more unstructured data are produced and consumed over networks. Maintaining these data while improving the availability and scalability of storage systems has become a considerable challenge. Although NoSQL systems such as Dynamo, Cassandra, and MongoDB offer different advantages for unstructured data management, none of them provides flexible query functions like MongoDB while also guaranteeing availability and scalability like Cassandra. This paper presents MyStore, a new highly available distributed storage system for unstructured data based on an optimized clustered MongoDB. Consistent hashing with virtual nodes is used to distribute data across multiple MongoDB nodes. The NWR mode is applied to provide automatic backup and guarantee data consistency, and a gossip protocol is used to exchange failure information in the system. Moreover, a user-friendly interface module and an efficient cache module are designed to improve the usability of the system. Based on these strategies, the system achieves high availability for unstructured data storage while providing complex query functions like those of relational databases. It has been applied in VeePalms, a multi-discipline virtual experiment platform that is in practical operation. Experimental evaluation shows that the approach not only enhances data availability but also improves the scalability of the servers.
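
To make the two central mechanisms of the abstract concrete, the following is a minimal sketch of consistent hashing with virtual nodes and of the NWR quorum condition. It is not MyStore's implementation, which the abstract does not detail. The class `HashRing`, the helper `quorum_overlaps`, the use of MD5, the default of 64 virtual nodes per physical node, and the node names are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring; each physical node owns several virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=64):
        self._points = []  # sorted ring positions of all virtual nodes
        self._owner = {}   # ring position -> physical node name
        for node in nodes:
            for i in range(vnodes_per_node):
                point = _hash(f"{node}#{i}")
                self._owner[point] = node
                bisect.insort(self._points, point)

    def replicas(self, key, n):
        """Return up to n distinct physical nodes, walking clockwise from the key."""
        n = min(n, len(set(self._owner.values())))
        start = bisect.bisect_right(self._points, _hash(key))
        result = []
        for offset in range(len(self._points)):
            if len(result) == n:
                break
            point = self._points[(start + offset) % len(self._points)]
            node = self._owner[point]
            if node not in result:
                result.append(node)
        return result


def quorum_overlaps(n, w, r):
    """NWR condition: with N replicas, W write acks and R read acks,
    every read set intersects every write set when W + R > N."""
    return 1 <= w <= n and 1 <= r <= n and w + r > n


if __name__ == "__main__":
    ring = HashRing(["mongo-a", "mongo-b", "mongo-c"])  # hypothetical node names
    print(ring.replicas("doc:42", 2))   # two distinct nodes that would hold the key
    print(quorum_overlaps(3, 2, 2))     # a common N=3, W=2, R=2 setting
```

In this sketch, adding a node only inserts new points on the ring, so a key either keeps its current node or moves to the new one. That property is what lets a cluster grow without reshuffling most of the stored data.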

Acknowledgments

This paper is supported by the National Science & Technology Pillar Program of China under grant No. 2012BAH14F02, the National Natural Science Foundation of China under grants No. 61272408 and 61322210, the CCCPC Youth Talent Plan, and the Natural Science Foundation of Hubei under grant No. 2012FFA007.

Author information

Authors and Affiliations

Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Wenbin Jiang, Lei Zhang, Xiaofei Liao, Hai Jin & Yaqiong Peng

Corresponding author

Correspondence to Xiaofei Liao.

About this article

Cite this article

Jiang, W., Zhang, L., Liao, X. et al. A novel clustered MongoDB-based storage system for unstructured data with high availability. Computing 96, 455–478 (2014). https://doi.org/10.1007/s00607-013-0355-8

Received: 31 January 2013
Accepted: 26 September 2013
Published: 14 November 2013
Issue Date: June 2014
DOI: https://doi.org/10.1007/s00607-013-0355-8
