Abstract
Many of the world’s biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now made on the basis of analyzing massive datasets. This paper proposes an exploratory teaching program that provides a broad and practical introduction to big data analysis. The program was designed and taught in the Department of Computer Engineering at Kocaeli University in the spring semester of 2018–2019. To assess the program’s impact on the learning process and to evaluate students’ acceptance and satisfaction levels, students answered a questionnaire after completing the program. According to their feedback, the exploratory teaching program is useful for learning how to analyze large datasets and identify patterns that can improve a company’s or organization’s decision-making process.
Acknowledgements
I would like to thank GOSB Technology Manager Engin Işık for his support in the survey conducted with big data sector companies.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Eken, S. An exploratory teaching program in big data analysis for undergraduate students. J Ambient Intell Human Comput 11, 4285–4304 (2020). https://doi.org/10.1007/s12652-020-02447-4
DOI: https://doi.org/10.1007/s12652-020-02447-4