Abstract
Unstructured data are increasingly produced and consumed over networks. Maintaining these data while improving the availability and scalability of storage systems has become a considerable challenge. Although NoSQL systems such as Dynamo, Cassandra, and MongoDB each offer different advantages for unstructured data management, none provides flexible query functions like those of MongoDB while also guaranteeing availability and scalability comparable to Cassandra. This paper presents MyStore, a new highly available distributed storage system for unstructured data based on an optimized MongoDB cluster. Consistent hashing with virtual nodes is used to distribute data across multiple MongoDB nodes. The NWR model is applied to provide automatic backup and guarantee data consistency, and a gossip protocol is used to exchange failure information within the system. In addition, a user-friendly interface module and an efficient cache module are designed to improve the usability of the system. Based on these strategies, the system achieves high availability for unstructured data storage while providing complex query functions like those of relational databases. It has been deployed in VeePalms, a multi-discipline virtual experiment platform that is in practical operation. Experimental evaluation shows that the approach not only enhances data availability but also improves server scalability.
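The data placement and replication ideas named in the abstract can be illustrated with a minimal sketch. It is not the MyStore implementation: the class name HashRing, the helper quorum_overlaps, the node labels, the use of MD5, and the default of 100 virtual nodes per server are all assumptions made for illustration. The sketch places each physical node at many positions on a hash ring (virtual nodes), picks the first N distinct nodes clockwise from a key as its replica holders, and checks the usual NWR condition W + R > N, under which every read quorum overlaps every write quorum.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position. MD5 is an assumption, chosen only
    because it is deterministic and well spread; it is not used for security."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._ring = {}  # ring position -> physical node name
        for node in nodes:
            # Each physical node owns `vnodes` positions on the ring.
            for i in range(vnodes):
                self._ring[_hash(f"{node}#{i}")] = node
        self._positions = sorted(self._ring)

    def replicas(self, key, n):
        """Return the first n distinct physical nodes clockwise from key."""
        n = min(n, len(set(self._ring.values())))
        if n <= 0:
            return []
        start = bisect.bisect_right(self._positions, _hash(key))
        result = []
        for step in range(len(self._positions)):
            pos = self._positions[(start + step) % len(self._positions)]
            node = self._ring[pos]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


def quorum_overlaps(n, w, r):
    """NWR condition: with n replicas, w write acknowledgements and r read
    acknowledgements, reads see the latest write when w + r > n."""
    return w + r > n


if __name__ == "__main__":
    # Node names are placeholders, not hosts from the paper.
    ring = HashRing(["mongo-a", "mongo-b", "mongo-c"])
    print(ring.replicas("experiment/42", 2))
    print(quorum_overlaps(3, 2, 2))
```

In this sketch, removing a server only remaps the keys whose first clockwise position belonged to that server; keys owned by the remaining servers stay where they were, which is the property that makes the scheme attractive when nodes join or fail.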
Acknowledgments
This work was supported by the National Science & Technology Pillar Program of China under grant No. 2012BAH14F02, the National Natural Science Foundation of China under grants No. 61272408 and 61322210, the CCCPC Youth Talent Plan, and the Natural Science Foundation of Hubei under grant No. 2012FFA007.
Cite this article
Jiang, W., Zhang, L., Liao, X. et al. A novel clustered MongoDB-based storage system for unstructured data with high availability. Computing 96, 455–478 (2014). https://doi.org/10.1007/s00607-013-0355-8
DOI: https://doi.org/10.1007/s00607-013-0355-8