ABSTRACT
Data management can be defined as the process of extracting, storing, organizing, and maintaining the data created and collected in organizations. Today's organizations invest in data management solutions that provide an efficient way to manage data in a unified structure. The enormous growth of data in recent decades has created a need for fast extraction, access, and processing of data. Optimization has been a key component in improving system performance and in searching and accessing data across different data management solutions. Optimization is a mathematical discipline that formulates mathematical models and finds the best solution among a set of feasible solutions. This paper gives a general overview of applications of optimization techniques and algorithms in different areas of data management over recent decades. Data management covers a large group of functionalities, but we focus on reviewing recent developments in optimization algorithms used in databases, data warehouses, big data, and machine learning. Furthermore, this paper identifies applications of optimization in data management, reviews the currently proposed solutions, and highlights future topics where studies in data management are lacking.