ABSTRACT
Running MapReduce computations in a public cloud raises a series of security challenges, since the service providers may not be properly protected. Because MapReduce applications are long-running, attackers have more opportunity to exploit worker vulnerabilities at scale, so many workers may be compromised. Compromised workers can misbehave and tamper with the integrity of the results of every computation assigned to them. To tackle this challenge, this paper proposes a Result Verification Mechanism (RVM) that uses a reputation threshold-based voting method to ensure the result integrity of MapReduce in both the map and reduce phases, and thereby keep the MapReduce computation accurate. Another major contribution of this paper is an implementation of RVM on Apache Hadoop, together with a series of experiments. The evaluation of the experimental results demonstrates that RVM significantly reduces computation overhead and guarantees a low error rate compared to simple voting methods such as m-first voting.
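To make the comparison in the abstract concrete, the sketch below contrasts plain m-first voting with a reputation-weighted acceptance rule. It is only an illustration under stated assumptions: the abstract does not give RVM's actual formula, so the function `reputation_threshold_vote`, the `reputation` table, the summed-reputation rule, and the `threshold` parameter are all hypothetical and should not be read as the paper's mechanism.

```python
from collections import Counter, defaultdict


def m_first_vote(results, m):
    """m-first voting: accept the first value reported by m workers.

    `results` is an ordered list of (worker_id, value) pairs, in the
    order the replicated task results arrive. Returns None if no value
    reaches m matching reports.
    """
    counts = Counter()
    for _worker, value in results:
        counts[value] += 1
        if counts[value] >= m:
            return value
    return None


def reputation_threshold_vote(results, reputation, threshold):
    """Hypothetical reputation-weighted rule (NOT the paper's formula).

    Each report contributes its worker's reputation score; a value is
    accepted as soon as its accumulated score reaches `threshold`.
    Workers missing from `reputation` contribute nothing.
    """
    weight = defaultdict(float)
    for worker, value in results:
        weight[value] += reputation.get(worker, 0.0)
        if weight[value] >= threshold:
            return value
    return None


if __name__ == "__main__":
    # Assumed example data: three replicas of one map task.
    results = [("w1", "a"), ("w2", "b"), ("w3", "a")]
    reputation = {"w1": 0.9, "w2": 0.2, "w3": 0.5}

    print(m_first_vote(results, 2))
    print(reputation_threshold_vote(results, reputation, 0.8))
```

Under this assumed rule, a report from a worker with a high reputation can be accepted with fewer replicas than m-first voting needs, which is one plausible way a reputation-based scheme could lower the redundancy overhead the abstract refers to.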