ABSTRACT
This lightning talk focuses on our experience teaching a graduate-level Big Data course. Traditionally, such courses have relied on "WordCount"-style problems, which count the occurrences of each word in a corpus of documents using the distributed MapReduce framework. While this is a good way to introduce students to the Big Data framework, more real-world examples are needed to motivate them. Further, since a majority of courses require students to work on a large project as part of the course, it is essential that they have access to a diverse and interesting set of data. In our course, we experimented with various data sources, such as text from real-time streaming news articles, Twitter feeds, and property price data from various zip codes in a county. The students gathered the data, designed and implemented MapReduce-style algorithms for distributed processing, and presented their findings. The feedback was extremely positive, and we would like to develop this approach further. In this talk, we will present some ideas on how to collect and analyze real-world datasets that are suitable for Big Data analysis. We also welcome further input from the audience on this topic.
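For readers unfamiliar with the "WordCount" exercise mentioned above, the following is a minimal single-process sketch of its map, shuffle, and reduce steps. It is only an illustration, not the course's actual material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the toy corpus are assumptions made for this example, and a real MapReduce framework would run these phases in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit a (word, 1) pair for every whitespace-separated word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)


def word_count(corpus):
    """Run the three steps sequentially over an iterable of document strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in corpus)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Toy, made-up corpus used only to exercise the sketch.
corpus = ["big data is big", "data is everywhere"]
counts = word_count(corpus)
print(counts)
```

The same three-step structure carries over to the kinds of datasets described in the abstract: only the map function (what key and value to emit per record) and the reduce function (how to combine the values for a key) change.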
Index Terms
- Enhancing Teaching of Big Data by Using Real World Datasets