An exploratory teaching program in big data analysis for undergraduate students

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Many of the world’s biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now made on the basis of analyzing massive datasets. This paper proposes an exploratory teaching program that provides a broad and practical introduction to big data analysis. The program was designed and delivered in the Department of Computer Engineering at Kocaeli University in the spring semester of 2018–2019. To assess the program’s impact on the learning process and to evaluate students’ acceptance and satisfaction, the students answered a questionnaire after completing the program. According to their feedback, the exploratory teaching program is useful for learning how to analyze large datasets and identify patterns that can improve the decision-making process of any company or organization.

Notes

  1. Apache Hadoop (2011) http://hadoop.apache.org, Accessed 3 Jan 2020.

  2. Jupyter notebooks (2011) www.jupyter.org, Accessed 3 Jan 2020.

  3. Anaconda (2011) https://www.anaconda.com/, Accessed 3 Jan 2020.

  4. Countries of the World dataset (2018) https://www.kaggle.com/fernandol/countries-of-the-world, Accessed 5 Jan 2020.

  5. Gartner (2020) https://www.gartner.com/en, Accessed 5 Jan 2020.

  6. Gartner BI report (2020) https://www.gartner.com/reviews/market/analytics-business-intelligence-platforms, Accessed 5 Jan 2020.

  7. Black Friday dataset (2016) https://datahack.analyticsvidhya.com/contest/black-friday/, Accessed 5 Jan 2020.

  8. Heart Disease UCI dataset (2018) https://www.kaggle.com/ronitf/heart-disease-uci, Accessed 5 Jan 2020.

  9. House Prices dataset (2017) https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data, Accessed 5 Jan 2020.

  10. World University Rankings dataset (2019) https://www.kaggle.com/mylesoneill/world-university-rankings, Accessed 5 Jan 2020.

  11. Sensorless Drive Diagnosis dataset (2019) https://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis, Accessed 5 Jan 2020.

  12. Football World Cup 2018 dataset (2018) https://www.kaggle.com/sawya34/football-world-cup-2018-dataset, Accessed 5 Jan 2020.

  13. Hadoop Design Patterns (2012) https://github.com/adamjshook/mapreducepatterns, Accessed 6 Jan 2020.

  14. Apache Zeppelin (2015) https://zeppelin.apache.org/, Accessed 6 Jan 2020.

  15. Company Acquisitions dataset (2018) https://www.kaggle.com/shivamb/company-acquisitions-7-top-companies, Accessed 6 Jan 2020.

Acknowledgements

I would like to thank GOSB Technology Manager Engin Işık for his support in the survey conducted with big data sector companies.

Author information

Corresponding author

Correspondence to Süleyman Eken.

About this article

Cite this article

Eken, S. An exploratory teaching program in big data analysis for undergraduate students. J Ambient Intell Human Comput 11, 4285–4304 (2020). https://doi.org/10.1007/s12652-020-02447-4

  • DOI: https://doi.org/10.1007/s12652-020-02447-4
