ABSTRACT
A database management system offers mechanisms that support data management, such as access control, integrity preservation, schema documentation, and durability. When datasets are stored in files, as is common in many data science projects, the data owner must provide these functions directly. This paper proposes ways to teach data management principles for such file-based settings and reports on some of the challenges students have faced.
- Teaching Data Management Concepts for Data in Files