Abstract
Nowadays, MapReduce has emerged as a de facto programming model for parallel processing of large-scale datasets on a commodity cluster of machines. MapReduce and its variants have been widely studied in both industry and academia. ComMapReduce extends MapReduce with lightweight communication mechanisms and improves the efficiency of query processing applications. However, we find that the performance of query processing applications varies considerably across the communication strategies of the ComMapReduce framework, so it is necessary to identify the optimal communication strategy for each query processing application. Extreme learning machine (ELM) provides good classification performance with an extremely fast training speed. Therefore, in this paper, we first propose an efficient query processing optimization approach based on ELM in the ComMapReduce framework, named ELM_CMR. Then, we design two implementations of ELM_CMR to further optimize the performance of query processing applications. Finally, extensive experiments verify the effectiveness and efficiency of ELM_CMR.
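The core idea is to train an ELM that maps a query to the communication strategy expected to run it fastest. What follows is a minimal sketch of a standard single-hidden-layer ELM classifier in pure Python. It assumes each query is summarized by a numeric feature vector and labeled with its best strategy. The feature values and the labels `strategy_A`/`strategy_B` are hypothetical and are not the paper's features or ComMapReduce's strategy names. The sketch also departs from the original ELM in one respect: it solves for the output weights with a small ridge term (a regularized-ELM variant) rather than the Moore-Penrose pseudoinverse, so that only the standard library is needed.

```python
import math
import random


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _solve(a, b):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    A is n x n and B is n x m (lists of lists); the inputs are not modified."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


class ELMClassifier:
    """Single-hidden-layer ELM: random, untrained hidden layer plus a
    least-squares output layer (ridge-regularized; an assumption here)."""

    def __init__(self, n_hidden=20, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # ridge term added to H^T H; keeps the system well posed
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # h_j(x) = g(w_j . x + b_j) with w_j, b_j drawn once and never trained
        return [_sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(self.w, self.b)]

    def fit(self, xs, ys):
        d = len(xs[0])
        self.classes = sorted(set(ys))
        # 1) Randomly assign input weights and biases.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)]
                  for _ in range(self.n_hidden)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(self.n_hidden)]
        # 2) Hidden-layer output matrix H (N x L) and one-hot targets T (N x k).
        h = [self._hidden(x) for x in xs]
        t = [[1.0 if y == c else 0.0 for c in self.classes] for y in ys]
        # 3) Output weights: beta = (H^T H + reg * I)^-1 H^T T, one linear solve.
        L = self.n_hidden
        hth = [[sum(row[i] * row[j] for row in h) + (self.reg if i == j else 0.0)
                for j in range(L)] for i in range(L)]
        htt = [[sum(h[n][i] * t[n][c] for n in range(len(h)))
                for c in range(len(self.classes))] for i in range(L)]
        self.beta = _solve(hth, htt)
        return self

    def predict(self, x):
        # Score each class as h(x) . beta_c and return the highest-scoring one.
        hx = self._hidden(x)
        scores = [sum(hx[i] * self.beta[i][c] for i in range(self.n_hidden))
                  for c in range(len(self.classes))]
        return self.classes[max(range(len(scores)), key=scores.__getitem__)]


# Hypothetical training set: two normalized query features per query,
# labeled with whichever (hypothetical) strategy ran fastest.
train_x = [[0.0, 0.2], [0.1, 0.1], [0.2, 0.3], [0.8, 0.7], [0.9, 0.8], [1.0, 0.9]]
train_y = ["strategy_A"] * 3 + ["strategy_B"] * 3

model = ELMClassifier(n_hidden=20, seed=0).fit(train_x, train_y)
print(model.predict([0.05, 0.15]), model.predict([0.95, 0.85]))
```

Once the random hidden layer is fixed, training reduces to a single L×L linear solve with no iterative back-propagation, which is the source of ELM's fast training speed. The paper's two ELM_CMR implementations are not reproduced here.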
Acknowledgments
This research was partially supported by the National Natural Science Foundation of China under Grant Nos. 60933001, 61025007, and 61100022; the National Basic Research Program of China under Grant No. 2011CB302200-G; the 863 Program under Grant No. 2012AA011004, and the Fundamental Research Funds for the Central Universities under Grant No. N110404009.
About this article
Cite this article
Ding, L., Xin, J. & Wang, G. An efficient query processing optimization based on ELM in the cloud. Neural Comput & Applic 27, 35–44 (2016). https://doi.org/10.1007/s00521-013-1543-3