Materialized view selection using evolutionary algorithm for speeding up big data query processing

Journal of Intelligent Information Systems

Abstract

To speed up query processing on Big Data, frequent sub-queries or views may be materialized so that the query processing cost is minimized while the cost of maintaining the materialized views and/or queries is kept optimal. Materializing frequent sub-queries and views means that the resultant data sets of the views reside in the memory of one or more nodes in the cluster, which reduces the MapReduce cost and the submission and scheduling cost of Distributed File System jobs for query processing. We define materialized views as the resultant data of frequent sub-queries and aggregation functions of a set of Big Data warehousing queries, saved to enhance query performance. The problem is formulated as a multi-objective optimization problem that minimizes the total MapReduce cost of query processing, the MapReduce cost of maintaining the materialized views, and the number of views selected for materialization, while maximizing the total size of the selected views. We applied the Differential Evolution algorithm and NSGA-II and studied their performance, with the aim of developing a recommendation system for selecting views to materialize in Big Data warehousing.
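
The four objectives named in the abstract can be illustrated with a minimal sketch. It is not the authors' cost model: the binary encoding of a candidate selection, the per-query `saving` table, and the names `evaluate` and `dominates` are assumptions introduced here for illustration only. The sketch scores one candidate selection and applies the Pareto-dominance test that multi-objective algorithms such as NSGA-II rely on.

```python
from typing import Sequence, Tuple

# (query cost, maintenance cost, number of views, negated total size)
Objectives = Tuple[float, float, int, float]


def evaluate(selection: Sequence[int],
             base_cost: Sequence[float],
             saving: Sequence[Sequence[float]],
             maintenance: Sequence[float],
             size: Sequence[float]) -> Objectives:
    """Score a binary view selection (1 = materialize the view).

    Hypothetical toy cost model: each query is answered from the single
    selected view that saves it the most, or from base data otherwise.
    """
    chosen = [v for v, bit in enumerate(selection) if bit]
    query_cost = sum(
        base - max((saving[q][v] for v in chosen), default=0.0)
        for q, base in enumerate(base_cost)
    )
    maint_cost = sum(maintenance[v] for v in chosen)
    total_size = sum(size[v] for v in chosen)
    # Size is to be maximized, so it is negated: every entry is minimized.
    return (query_cost, maint_cost, len(chosen), -total_size)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


# Made-up numbers: two queries, three candidate views.
base_cost = [100.0, 80.0]
saving = [[60.0, 10.0, 0.0],
          [0.0, 50.0, 30.0]]
maintenance = [20.0, 15.0, 5.0]
size = [4.0, 3.0, 1.0]

two_views = evaluate([1, 1, 0], base_cost, saving, maintenance, size)
one_view = evaluate([1, 0, 0], base_cost, saving, maintenance, size)
```

With these numbers neither selection dominates the other: materializing two views lowers the query cost but raises the maintenance cost and the view count. Trade-offs of this kind are why the problem is posed with several objectives instead of a single weighted cost.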

Author information

Correspondence to Rajib Goswami.

Ethics declarations

Disclosure of potential conflicts of interest

The authors declare that they have no conflict of interest.

Informed Consent

Not applicable.

Research involving Human Participants and/or Animals

The authors declare that, in this research, neither human participants nor animals are involved in experimentation.

About this article

Cite this article

Goswami, R., Bhattacharyya, D.K. & Dutta, M. Materialized view selection using evolutionary algorithm for speeding up big data query processing. J Intell Inf Syst 49, 407–433 (2017). https://doi.org/10.1007/s10844-017-0455-6

  • DOI: https://doi.org/10.1007/s10844-017-0455-6
