An efficient query processing optimization based on ELM in the cloud

  • Extreme Learning Machine and Applications
Neural Computing and Applications

Abstract

MapReduce has emerged as a de facto programming model for parallel processing of large-scale datasets on clusters of commodity machines. MapReduce and its variants have been widely studied in both industry and academia. ComMapReduce extends MapReduce with lightweight communication mechanisms and improves the efficiency of query processing applications. However, we find that the performance of query processing applications varies considerably across the different communication strategies of the ComMapReduce framework, so it is necessary to identify the best communication strategy for each query processing application. Extreme learning machine (ELM) provides good classification performance with an extremely fast training speed. Therefore, in this paper, we first propose an efficient query processing optimization approach based on ELM in the ComMapReduce framework, named ELM_CMR. We then design two implementations of ELM_CMR to further optimize the performance of query processing applications. Finally, extensive experiments verify the effectiveness and efficiency of the proposed ELM_CMR.
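
To make the idea of using ELM as a strategy classifier concrete, the following is a minimal sketch and not the authors' ELM_CMR implementation. It trains a basic ELM (random hidden layer, output weights from a single least-squares solve) to map a query's features to a communication strategy. The strategy names, the feature vector, the training data, and the names ELMClassifier and solve are all hypothetical. The sketch also uses a ridge-regularized solve rather than the Moore-Penrose pseudoinverse of the original ELM formulation, and it assumes features are scaled to [0, 1].

```python
import math
import random

# Hypothetical labels: the abstract does not name the communication strategies.
STRATEGIES = ["strategy_A", "strategy_B", "strategy_C"]


def solve(a, b):
    """Solve a @ x = b (a square, b a matrix) by Gauss-Jordan elimination."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * pv for v, pv in zip(m[r], m[col])]
    return [row[n:] for row in m]


class ELMClassifier:
    """Single-hidden-layer network with random, untrained hidden weights."""

    def __init__(self, n_hidden=10, ridge=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.ridge = ridge  # assumed regularization strength
        self.rng = random.Random(seed)

    def _hidden(self, x):
        # Sigmoid activation of each random hidden node.
        return [
            1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(ws, x)) + b)))
            for ws, b in zip(self.w, self.b)
        ]

    def fit(self, X, y, n_classes):
        d, L = len(X[0]), self.n_hidden
        # Hidden weights and biases are drawn once and never updated.
        self.w = [[self.rng.uniform(-1, 1) for _ in range(d)] for _ in range(L)]
        self.b = [self.rng.uniform(-1, 1) for _ in range(L)]
        H = [self._hidden(x) for x in X]
        # Output weights solve (H^T H + ridge * I) beta = H^T T, T one-hot.
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(L)] for i in range(L)]
        B = [[sum(h[i] for h, label in zip(H, y) if label == c)
              for c in range(n_classes)] for i in range(L)]
        self.beta = solve(A, B)
        return self

    def predict(self, X):
        preds = []
        for x in X:
            h = self._hidden(x)
            scores = [sum(hi * row[c] for hi, row in zip(h, self.beta))
                      for c in range(len(self.beta[0]))]
            preds.append(scores.index(max(scores)))
        return preds


# Hypothetical, made-up feature vectors scaled to [0, 1], e.g.
# (relative data size, query selectivity, relative cluster size).
X_train = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7], [0.5, 0.5, 0.5], [0.2, 0.9, 0.4]]
y_train = [0, 1, 2, 0]  # index into STRATEGIES of the fastest strategy

model = ELMClassifier(n_hidden=5).fit(X_train, y_train, n_classes=len(STRATEGIES))
choice = model.predict([[0.15, 0.25, 0.3]])[0]
print("predicted strategy:", STRATEGIES[choice])
```

Because only the output weights are fitted, training reduces to one linear solve, which is the source of the fast training speed the abstract refers to.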

Notes

  1. http://hadoop.apache.org/.

Acknowledgments

This research was partially supported by the National Natural Science Foundation of China under Grant Nos. 60933001, 61025007, and 61100022; the National Basic Research Program of China under Grant No. 2011CB302200-G; the 863 Program under Grant No. 2012AA011004, and the Fundamental Research Funds for the Central Universities under Grant No. N110404009.

Author information

Correspondence to Linlin Ding.

About this article

Cite this article

Ding, L., Xin, J. & Wang, G. An efficient query processing optimization based on ELM in the cloud. Neural Comput & Applic 27, 35–44 (2016). https://doi.org/10.1007/s00521-013-1543-3

  • DOI: https://doi.org/10.1007/s00521-013-1543-3