ABSTRACT
With the explosive growth of data, large-scale computing resources are needed to capture, store, and process data in the big data era. To provide these resources, cloud computing is increasingly used for big data analysis. However, migrating such data processing applications to IaaS clouds introduces significant performance variations that depend on how the application is deployed across a set of virtual machines. Because application deployment affects the performance of data processing applications in IaaS clouds, understanding how the components of an application utilize the available computing resources is necessary for building a big data processing platform. In this paper, we perform an experimental investigation by deploying a set of application components belonging to HBase and Hadoop. As an applicable example, we use a processing application that categorizes traffic states to estimate traffic collision probability. In particular, we focus on the utilization of disk I/O resources in a hot-spotting case. Our results show that, because our test application generates a relatively light write load, HRegion and TaskTracker virtual machines can be deployed with disk-intensive applications (i.e., DataNode) on the same physical machine. We also observed overloaded virtual machines that serve particular data sets in a specific data region, which shows how the data allocation strategy affects the performance of data processing applications.