
A novel clustered MongoDB-based storage system for unstructured data with high availability

Computing

Abstract

More and more unstructured data are produced and consumed over networks. Maintaining these data while improving the availability and scalability of storage systems has become a considerable challenge. NoSQL systems such as Dynamo, Cassandra, and MongoDB each offer different advantages for managing unstructured data. However, none of them provides flexible query functions like MongoDB's while also guaranteeing availability and scalability like Cassandra's. This paper presents MyStore, a new highly available distributed storage system for unstructured data built on an optimized clustered MongoDB. Consistent hashing with virtual nodes is used to distribute data across multiple MongoDB nodes. The NWR model (N replicas, W write acknowledgements, R read responses) is applied to provide automatic backup and guarantee data consistency, and a gossip protocol is used to exchange failure information within the system. In addition, a user-friendly interface module and an efficient cache module are designed to improve the usability of the system. With these strategies, the system achieves high availability for unstructured data storage while providing complex query functions like those of relational databases. The system has also been deployed in VeePalms, a multi-discipline virtual experiment platform that is in practical use. Experimental evaluation shows that the approach both enhances data availability and improves server scalability.
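The abstract names two data-placement mechanisms: consistent hashing with virtual nodes, and NWR replication. The sketch below is a minimal, hypothetical illustration of how these two ideas are commonly combined in Dynamo-style stores. It is not the MyStore implementation. The MD5-based ring hash, the default of 100 virtual nodes per server, the node names, and the helper names (`ring_hash`, `ConsistentHashRing`, `preference_list`, `quorum_overlaps`) are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Map a string to a position on a 2**32 ring (truncated MD5; an assumed choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ConsistentHashRing:
    """Hypothetical hash ring in which each physical node owns several virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=100):
        self.vnodes_per_node = vnodes_per_node
        self._positions = []  # sorted ring positions of all virtual nodes
        self._owners = {}     # ring position -> physical node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Place vnodes_per_node virtual nodes for this server around the ring.
        for i in range(self.vnodes_per_node):
            pos = ring_hash(f"{node}#vn{i}")
            if pos in self._owners:  # skip the rare hash collision
                continue
            bisect.insort(self._positions, pos)
            self._owners[pos] = node

    def remove_node(self, node):
        # Only the arcs owned by this node change hands; all other keys stay put.
        self._positions = [p for p in self._positions if self._owners[p] != node]
        self._owners = {p: n for p, n in self._owners.items() if n != node}

    def preference_list(self, key, n):
        """Walk clockwise from the key's position; return the first n distinct physical nodes."""
        if not self._positions:
            return []
        start = bisect.bisect_right(self._positions, ring_hash(key))
        result = []
        for step in range(len(self._positions)):
            pos = self._positions[(start + step) % len(self._positions)]
            node = self._owners[pos]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


def quorum_overlaps(n, w, r):
    """NWR rule: every read quorum intersects every write quorum when R + W > N."""
    return r + w > n


if __name__ == "__main__":
    # Hypothetical MongoDB node names and key.
    ring = ConsistentHashRing(["mongo-a", "mongo-b", "mongo-c", "mongo-d"])
    print(ring.preference_list("experiment:42", n=3))  # three distinct replica nodes
    print(quorum_overlaps(3, 2, 2))  # True
```

Virtual nodes scatter each server over many small arcs of the ring, so keys spread more evenly and a failed server's load is shared by several neighbours rather than one. With N = 3, choosing W = 2 and R = 2 satisfies R + W > N, so at least one replica contacted by any read has acknowledged the most recent successful write.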


Figs. 1–17



Acknowledgments

This work was supported by the National Science & Technology Pillar Program of China under grant No. 2012BAH14F02, the China National Natural Science Foundation under grant Nos. 61272408 and 61322210, the CCCPC Youth Talent Plan, and the Natural Science Foundation of Hubei under grant No. 2012FFA007.

Author information

Correspondence to Xiaofei Liao.

About this article

Cite this article

Jiang, W., Zhang, L., Liao, X. et al. A novel clustered MongoDB-based storage system for unstructured data with high availability. Computing 96, 455–478 (2014). https://doi.org/10.1007/s00607-013-0355-8


  • DOI: https://doi.org/10.1007/s00607-013-0355-8
