DOI: 10.1145/3456172.3456214
research-article

Optimization Techniques in Data Management: A Survey

Published: 06 August 2021

ABSTRACT

Data management can be defined as the process of extracting, storing, organizing, and maintaining the data created and collected in organizations. Today's organizations invest in data management solutions that provide an efficient way to manage data in a unified structure. The enormous growth of data in recent decades has created a need for fast extraction, access, and processing of data. Optimization has been a key component in improving system performance and in searching and accessing data across different data management solutions. Optimization is a mathematical discipline that formulates mathematical models and finds the best solution among a set of feasible solutions. This paper aims to give a general overview of the applications of optimization techniques and algorithms in different areas of data management over recent decades. Data management covers a large group of functionalities, but we focus on reviewing recent developments in optimization algorithms used in databases, data warehouses, big data, and machine learning. Furthermore, this paper identifies applications of optimization in data management, reviews the current proposed solutions, and highlights future topics where studies in data management are lacking.

  • Published in

    ICCDE '21: Proceedings of the 2021 7th International Conference on Computing and Data Engineering
    January 2021
    110 pages
    ISBN:9781450388450
    DOI:10.1145/3456172

    Copyright © 2021 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • research-article
    • Research
    • Refereed limited
