Abstract
Recently, many companies and research organizations have been seeking scalable solutions based on the Hadoop ecosystem. Log data management, with its large-scale and real-time properties, is an application well suited to Hadoop. In this paper, we focus on SQL and NoSQL choices for building a Hadoop-based log data management system. For this purpose, we first select major products supporting SQL and NoSQL, and we then present an appropriate schema for each product by considering its characteristics. All the schemas are designed for real-time monitoring and analysis of log data. For each product, we implement insertion and selection operations on log data in Hadoop, and we analyze the performance of these operations. The analysis results show that MariaDB and MongoDB are fast at insertion, and PostgreSQL and HBase are fast at selection. We believe that our evaluation results will help users choose Hadoop SQL and NoSQL products for handling large-scale, real-time log data.
Acknowledgments
This research was funded by the MSIP (Ministry of Science, ICT & Future Planning), Korea, in the ICT R&D Program 2014.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Son, S., Gil, MS., Moon, YS., Won, HS. (2015). Performance Analysis of Hadoop-Based SQL and NoSQL for Processing Log Data. In: Liu, A., Ishikawa, Y., Qian, T., Nutanong, S., Cheema, M. (eds) Database Systems for Advanced Applications. DASFAA 2015. Lecture Notes in Computer Science, vol 9052. Springer, Cham. https://doi.org/10.1007/978-3-319-22324-7_30
DOI: https://doi.org/10.1007/978-3-319-22324-7_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22323-0
Online ISBN: 978-3-319-22324-7
eBook Packages: Computer Science (R0)