Abstract
To grow the database certificate program and enhance the database courses offered by the Computer Science/Information Technology (CSIT) department at the Community College of Baltimore County (CCBC), we are developing the EDNA Project. EDNA (Extraction of Diverse Datasets and Analysis) will be an educational database and project set encompassing several large real-world datasets that will be accessible to students in the database courses. The project grew out of the need for more elaborate examples and hands-on experience with database concepts. The repository will help teach database concepts and data analytics, and will introduce topics related to big data, to students in these courses and certificate programs. In addition, this paper describes our collaboration with the University of Baltimore and the beginnings of Project RED. This paper discusses our current work in developing this project.
1 Introduction
Teaching fundamental database concepts is essential for many technology-related degree programs. Within the Computer Science/Information Technology department at the Community College of Baltimore County (CCBC), several database courses are offered each semester. These courses are offered as part of the Associate Degree in Information Technology and a database credit certificate. Specific course sequences are required depending on a student’s declared major or certificate program, but any student who has met the prerequisites can take these courses. The focus of these required core database courses is to develop basic skills and an understanding of database concepts that are relevant and applicable for those entering the workforce or transferring to a bachelor’s degree program.
The Community College of Baltimore County (CCBC), located in Maryland, is an integral part of Baltimore County, collaborating with government, business, and local organizations to enrich the community. CCBC provides an accessible, affordable, and high-quality education that prepares students for transfer and career success. In fiscal year 2017, the college educated 61,191 students, including 29,115 credit students and 33,247 non-credit students [1]. CCBC awarded 2,131 Associate degrees and 1,410 Certificates in 2017. There were 6,687 student transfers to other institutions [2]. CCBC offers 280 associate degree, credit certificate, and non-credit workforce training certification programs, including 20 online degree and certificate options [3]. CCBC has the distinct honor of being the number one provider of workforce development in the Baltimore Metropolitan area. Ninety-two percent of the college’s graduates continue to live and work in the Greater Baltimore region. With three main campuses and three extension centers, CCBC serves a diverse population of Maryland residents.
The Computer Science/Information Technology (CSIT) department offers associate degrees in Information Technology, Computer Science, and Computer Science with a concentration in Information Systems Management, as well as five credit-based certificates (Database, Information Management, Mobile Development, Programming, and Office Specialist). The database certificate requires 22 to 23 credits. Required courses include (1) Introduction to MIS, (2) Database Concepts, (3) Introduction to SQL using Oracle, and (4) Emerging Database Design, plus two other Computer Science/Information Technology related electives. The CSIT department offers a total of five database courses, three of which are required for the database certificate (see Table 1). The other two classes are electives open to any student who meets the prerequisites and are encouraged for those in the program (see Table 2).
The additional database-related electives noted in Table 2 are not the only options for students, but they are the only database-focused electives. CSIT 256 is the second course in a two-course sequence and is required as part of the Database option. Non-database courses that can be taken include Introduction to Information Assurance, Visual Basic programming, Introduction to Data Communications, or Introduction to Linux/Unix. Each of the database courses will use the EDNA database as part of the ETL (Extract, Transform, and Load) process. The order of priority of the courses for the EDNA project is as follows: CSIT 154, CSIT 134, CSIT 156, CSIT 254, and CSIT 256. These will be discussed in more detail in the following sections. In the initial testing phase of the project, CSIT 154 is the main course being explored for use with EDNA. Additional components will be introduced into the other classes in subsequent semesters.
The certificate option is often a good choice for students who already have another degree or who need to enhance their knowledge for employment purposes. As noted by the CSIT department, “The Database Certificate will prepare students for employment as entry-level database programmers and designers or provide current professionals with essential database programming and design skills” [4]. In the next section, enhancements to the database curriculum through the EDNA project are discussed.
2 Developing the EDNA Project
To grow CCBC’s database certificate program and enhance the database courses offered by the Computer Science/Information Technology department, we are developing the EDNA Project. EDNA (Extraction of Diverse Datasets and Analysis) will be a student-centric educational database of large real-world datasets that will be accessible to students in the database courses. The project grew out of the need for more elaborate examples and hands-on experience with database concepts. The repository will help teach database concepts and data analytics, and will introduce topics related to big data, to students in these courses and certificate programs. EDNA will allow students not only to interact with these datasets for learning but also to use infrastructure for database configuration and data manipulation in a more substantial way than is currently available in the program. The datasets available through EDNA will allow students to find relevant trends and explore the impacts of technology on society. Having this system in place is intended to improve the database program through real-world problems, increase collaboration, and enhance retention and completion rates. As part of the program, the Computer Science and Information Technology Department (CSIT) of the School of Technology, Art and Design (STAD) will develop this system and maintain the datasets, database, and server. As a resource, this project will serve as a significant tool to improve the database certificate program and all database courses offered through the Department.
Database concepts are often taught using strategies similar to those used in other technical courses, where students remain passive listeners [5]. While many factors contribute to student success, there are many pedagogical approaches that make learning more meaningful and engaging so that content is retained by students [6]. In the context of teaching database concepts, Connolly, Stansfield, and McLellan (2006) discuss the difficulty students have when there is not a “single, simple or well-known or correct solution” [7, p. 104]. Students also display difficulty when dealing with vagueness and complex database analysis [7]. Database projects that encompass this kind of vagueness can be used to teach students strategies to work through these difficulties. To address this issue, and as a student engagement technique, EDNA-infused projects will be collaborative in nature. From this preliminary implementation of EDNA, the following research questions will need to be addressed as the curriculum is designed and improved:
- R1: What datasets will students find engaging yet beneficial to their understanding of core database concepts?
- R2: What pedagogical approach is best for teaching using the EDNA framework?
- R3: Does EDNA provide an improvement to the database curriculum?
Given the exploratory nature of this project, a pilot study will be conducted during the next academic year to examine these research questions in the context of the project framework. Currently, the EDNA server uses the Oracle Linux 6.0 Platform. Students will use the Oracle Business Intelligence Suite 2.820 in the initial phase of the project.
Major components to the overall design of EDNA:
1. Student Dataset Resources
   a. Datasets maintained on Server for use in database courses.
   b. Projects for database courses aligned to datasets and configuration.
2. Database
   a. Small Database configured on required Server (PowerEdge T630) to hold student class resources.
   b. Provide means for students to learn introductory data analytics and database configuration and tools.
3 Enhancing the Curriculum
The overall objectives of the EDNA project are to enhance the database courses by improving curriculum materials and providing students access to real-world problems and projects. Providing students with real-world projects that include real or large datasets can be advantageous as they explore more complex topics related to databases, including the complexities of big data. To address the current skills gap in preparing students to work with big data, educators should make learning real and relevant [8]. As a preliminary step, several modules will be introduced into the database courses to both expand and align the curriculum in preparation for working with additional datasets. Students will work collaboratively in groups throughout the semester on several projects. Each project will be designed to highlight a particular database topic using the provided datasets. These projects will be in addition to the standard curriculum. Each module will have three major components: (1) extract raw data, (2) load relevant data into a database, and (3) examine the data using Business Intelligence tools for analysis (see Fig. 1).
Component 1 requires students to examine the listing of available datasets and choose the appropriate data for the assignment or project. Before proceeding to Component 2, however, students need to carefully examine aspects of the data (Process A). What does the data represent? How should the data be organized? What are the attributes? Are any preprocessing or cleansing steps needed? After this examination step, students prepare the data to be stored in their own database. This includes planning, analyzing the structure, and creating various diagrams to illustrate and describe the data and its organization. Students will then have completed Component 2 (Load and Organize). Next, in Process B, students practice a set of manipulations, queries, and tests on the database they have created, both to practice their skills and to deepen their knowledge of the dataset. In Component 3, students spend time analyzing the data. This process introduces students to the necessary analysis tools, techniques, and terminology, and lets them discover trends in the data.
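As an illustration only, the module flow described here (extract, examine and clean, load, then query) might be sketched in a few lines of Python. The sample rows, the `payments` table, and the function names are hypothetical and are not part of the EDNA materials, and SQLite stands in for the Oracle environment used in the courses.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract standing in for a downloaded dataset (Component 1).
RAW_CSV = """name,city,amount
 Alice ,Baltimore,10.5
Bob,Towson,
Carol,Baltimore,7.25
"""


def extract(text):
    """Component 1: read the raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Process A: trim whitespace and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # a cleansing decision: skip incomplete records
        cleaned.append((row["name"].strip(), row["city"].strip(), float(amount)))
    return cleaned


def load(conn, records):
    """Component 2: create a table and load the cleaned records."""
    conn.execute("CREATE TABLE payments (name TEXT, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", records)
    conn.commit()


def totals_by_city(conn):
    """Process B / Component 3: a simple aggregate query over the loaded data."""
    cur = conn.execute(
        "SELECT city, SUM(amount) FROM payments GROUP BY city ORDER BY city"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
load(conn, clean(extract(RAW_CSV)))
print(totals_by_city(conn))
```

In a course setting, the cleaning rule (dropping rows with a missing amount) is itself a design decision that students would be expected to justify during Process A.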
The list below illustrates some sample projects under development as part of the project:
- Project 1: Students will be given access to a raw data dump of simulated or donated telephone records. They will need to ascertain which information is relevant for a given scenario and consider how additional information can be derived.
- Project 2: Students will use graphic files to demonstrate how information can be requested by characteristics other than file name or structure. Students can learn other categorization techniques.
- Project 3: Students can utilize social media imports from public posts to examine connections between accounts and other attributes.
In addition to examining different types of data, students can be introduced to basic data collection methodologies, ethics, security, data management, and content rights. Although the scope and level of the course may limit the depth of coverage, students can be introduced to general concepts at this level through various projects. The projects are intended to be both relevant and exciting to students while developing particular skill sets.
3.1 EDNA Major Components
In the effort to enhance the curriculum, several components are underway for the pilot. Based on feedback gained during the initial phases, we intend to make improvements and add to the projects and datasets available. The following steps are being implemented for the preliminary EDNA project:
- Courses identified for EDNA project trial
- Initial design of several project modules for the course
- Develop training for students on connecting to the server and datasets
- Design of dataset exploration component
- Design a data cleaning component
- Design or project specification report for student assignments
- Student performance and assessment criteria
- Implementation of a short survey to collect student feedback
4 Datasets
There are a growing number of available resources and open datasets that students can use for various analyses. Currently, we are working with several available sources, in addition to creating our own large collection. Sample datasets under exploration include the Stanford Dog Dataset [9], the SMS Spam Collection [10], the Social Structure of Facebook Networks Data Scrape [11], 2012 data from the Global Ensemble Forecast System [12], 1996 through 2017 College Scorecard Data [13], and several more. The aim is to have an array of diverse data. Each type of dataset can yield interesting project assignments and interpretations. For example, the images stored in the Stanford Dog Dataset could be used to examine metadata, to compare storage issues for images versus text, or to explore topics in image classification. These datasets can be fully downloaded. In addition, public datasets from government sources can provide a rich set of content that can be adapted for course use. Some are provided by web-based portals like Open Baltimore, where numerous reports and data are available along with web-based filters and basic visualization tools [14]. Example datasets include information related to crime, health care, taxation, parking, cultural interests, and much more. Learning and knowledge analytics research by Verbert et al. [15] describes several dataset properties and resources.
There are also several concurrent projects underway where students can capture data to create large datasets that become part of the EDNA project. One example is a group project in an introductory Python programming course, where students have been working on creating a small sensor network to collect temperature, air pressure, and air quality data. This data is being measured over various time increments and recorded to an external server. It can be transformed and enhanced through scripts and stored as part of the EDNA database, with the amount and granularity of data increasing over time. Currently, this data is being used by students to test a machine learning algorithm and a web-based affective computing-based display. This project is just one example of a student-driven project. In the future, we intend to make increasing use of student-created content for datasets, so that students can contribute to the project.
Another example is data extracted from faculty-driven projects that use public social media and search engine data to contribute to the datasets. For text analysis purposes, students can make use of data collected by the web crawling component of PAsSIVE (Personalized Assisted Search in a Virtual Environment), which crawls web content to a fixed depth from a set of starting seed links [16]. Hu and Yeh [17] describe several text features, used as elements in PAsSIVE, that can be obtained from crawled pages as part of the indexing strategy: page content, page descriptions, hyperlink structure, hyperlink text, keywords or meta tags, page title, text with different font, and the first sentence. Additional diverse content can also be obtained from public social media posts through SADD (Social Media Agent for the Detection of the Deceased), where large text content, with link structure and time stamps, can be evaluated [18]. Content generated by SADD has generally been complex, providing a real dataset and problem for students. This data will be particularly useful for students wanting to learn and work with sentiment analysis.
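To make the idea of changing the granularity of sensor data through scripts concrete, the following is a minimal sketch that averages timestamped readings per hour. The readings, their format, and the `hourly_means` helper are hypothetical; the format actually used by the student sensor network is not described here.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical readings: (ISO timestamp, temperature in degrees Celsius).
READINGS = [
    ("2019-03-01T10:05:00", 20.0),
    ("2019-03-01T10:35:00", 22.0),
    ("2019-03-01T11:10:00", 25.0),
]


def hourly_means(readings):
    """Group readings by the hour they fall in and average each group."""
    buckets = defaultdict(list)
    for stamp, value in readings:
        # Truncate the timestamp to the start of its hour.
        hour = datetime.fromisoformat(stamp).replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour.isoformat(): mean(values) for hour, values in sorted(buckets.items())}


print(hourly_means(READINGS))
```

The same grouping could be applied per day or per sensor, which is one way a single raw feed can yield several datasets of different granularity.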
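As a rough illustration of a few of the text features listed above (page title, keywords from meta tags, and hyperlink text), the sketch below uses Python's standard `html.parser` module on a made-up page. It is not the PAsSIVE crawler; the `PageFeatures` class and the sample HTML are assumptions introduced only for this example.

```python
from html.parser import HTMLParser


class PageFeatures(HTMLParser):
    """Collect the title, meta keywords, and hyperlink text of one page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.keywords = []
        self.link_texts = []
        self._in_title = False
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._in_link = True
            self.link_texts.append("")  # start a new hyperlink text entry
        elif tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.keywords = [k.strip() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_link:
            self.link_texts[-1] += data


# Hypothetical page content used only to exercise the parser.
SAMPLE = """<html><head><title>Open Data</title>
<meta name="keywords" content="data, baltimore ,sql">
</head><body><p>First. Second.</p>
<a href="/crime">Crime reports</a></body></html>"""

parser = PageFeatures()
parser.feed(SAMPLE)
print(parser.title, parser.keywords, parser.link_texts)
```

Features such as these could then be loaded into tables so that students can query crawled pages by title, keyword, or link text.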
5 Collaborations - Project RED
Recent agreements between CCBC and the University of Baltimore (UB) have made it possible for students working on associate degrees in Computer Science or Information Technology to complete their 2-year degree at the community college and then continue on to earn their Bachelor’s in Applied Information Technology at UB. Such a pathway is essential not only to ensure that students complete their undergraduate studies in a timely manner but also to give them options in terms of 4-year degree-granting institutions in the area.
The collaboration goes beyond an administrative agreement alone, as students should get some continuity in terms of educational resources and expectations. In particular, the EDNA Project dovetails into a new initiative at UB, where faculty members are collecting databases and datasets particularly suited for educational purposes, creating RED, the Repository of Educational Datasets. The primary intent of this project is to enable students to easily access resources that may not necessarily be designed to be used in a database course, such as datasets made available by open access data initiatives.
Typically, datasets are single-entity schemas that contain significant information about a certain topic. Although this is extremely important in terms of transparency to the public from governmental institutions, for example, these individual datasets are generally not very useful in a SQL-based course. A dataset such as the “Minority and Women’s Business Enterprises Certifications” (MBE/WBE) contains a single table with 25 attributes and 1,835 records [19]. This limits the usefulness of this dataset to queries that utilize a single table.
When we then wish to utilize another dataset from the same repository, such as the “Real Property Tax” dataset (RPT), which also contains a single table with 16 attributes and over 239,000 records, we run into different types of issues in terms of data compatibility [20]. For example, the address in the first dataset is reported as a single field, whereas the second dataset contains information stored with a different format and in two fields, as reported in Table 3.
Although the data is conceptually the same, the two datasets cannot easily be integrated into a larger schema that can then allow students to perform queries using multiple tables. For this reason, we intend to utilize RED in multiple courses in order to integrate different datasets into schemas that are usable in projects and assignments. The classes that we will target are reported in Table 4 above.
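The kind of integration step involved can be sketched as follows, using SQLite as a stand-in for the course database. Because the actual address formats are not reproduced here, the column names (`address`, `house_no`, `street`), the sample rows, and the `normalize` rule are all hypothetical; real data would likely need additional rules for abbreviations, unit numbers, and similar variations.

```python
import sqlite3


def normalize(address):
    """Upper-case an address and collapse runs of whitespace."""
    return " ".join(address.upper().split())


conn = sqlite3.connect(":memory:")
# Register the Python function so it can be called from SQL.
conn.create_function("normalize", 1, normalize)

# Hypothetical miniature versions of the two single-table datasets.
conn.executescript("""
CREATE TABLE mbe (business TEXT, address TEXT);
CREATE TABLE rpt (house_no TEXT, street TEXT, tax REAL);
INSERT INTO mbe VALUES ('Acme Supply', '100  Main St');
INSERT INTO mbe VALUES ('Bay Design', '7 Dock Rd');
INSERT INTO rpt VALUES ('100', 'MAIN ST', 2500.0);
""")

# Join on a normalized address key built from one field or from two.
rows = conn.execute("""
    SELECT m.business, r.tax
    FROM mbe AS m
    JOIN rpt AS r
      ON normalize(m.address) = normalize(r.house_no || ' ' || r.street)
""").fetchall()
print(rows)
```

Once a shared key of this kind exists, the two tables can sit in one schema and support the multi-table queries that a single dataset cannot.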
The courses will contribute to RED using the model shown in Fig. 2. This approach will allow projects from all courses to contribute to the dataset repository while letting students develop multiple skills. Similar to the EDNA Project, students in scripting courses will learn how to download data from online resources or sensors, adding to the repository. Students in database courses can utilize the datasets for queries, as well as create documentation describing them. Projects in advanced courses can then focus on the integration of datasets, either through the direct manipulation of datasets already in the repository (for example, ETL techniques and tools for advanced database courses) or by reaching out to external sources to fill in missing data (for example, using freely available API services in advanced scripting courses).
6 Conclusion and Future Work
There are numerous aspects of this project that need additional investigation and continued improvement. After the initial EDNA pilot, several semesters of enhancements are planned. It will take some time to gather and test multiple datasets to build a large assortment of projects that can be used in different ways each semester. We also intend to examine other data analytic tools and investigate what strategies are most helpful at an introductory level.
To measure the impact of EDNA, we plan to assess the courses by comparison with a control group, using the online versions of the database courses. In addition, several student surveys will be administered to collect feedback on course design, dataset interest, and perceived helpfulness. This would be in addition to grade assessment and feedback from our external advisory board. With increased collaboration with the University of Baltimore, there are numerous opportunities to examine best practices in teaching database topics, designing new projects, and creating fascinating datasets. In the future, we intend to connect several projects to enhance and augment several of the data sources. Students may also develop an interest in pursuing a more advanced degree in these related fields.
In an effort to increase visibility at CCBC, we intend to host several small hands-on workshop activities or demonstrations at the College’s Pathway events. There are six Pathways where students are grouped by their declared major in an effort to target advertisement of resources, student assistance, collaborations, and to foster student-faculty interaction. It is our goal to advertise the usefulness and influence of data in the real world to students to gain interest in the program. There are several semester events that could provide increased visibility for the database courses and various technology majors through these EDNA workshops.
Hands-on learning using interesting, real-world problems can be a powerful tool for teaching database concepts. As data continues to play a large role in our daily lives and our decision-making, students need these essential technical skills. It is through projects like EDNA and RED that we can create diverse problems to engage students and encourage faculty to work together to solve today’s data problems.
References
1. CCBC Office of Planning, Research and Evaluation. CCBC Fact Book. http://www.ccbcmd.edu/~/media/CCBC/About%20CCBC/Administrative%20Offices/PRE/ccbc_factbook.ashx
2. CCBC Quick Facts. http://www.ccbcmd.edu/About-CCBC/Administrative-Offices/Administrative-Services/Planning-Research-and-Evaluation/CCBC-Facts.aspx
3. CCBC Program and Courses. http://www.ccbcmd.edu/programs-and-courses
4. CCBC Database Certificate, Credit Certificate. http://www.ccbcmd.edu/Programs-and-Courses-Finder/program/database-certificate
5. Mohtashami, M., Scher, J.M.: Application of Bloom’s cognitive domain taxonomy to database design. In: Proceedings of ISECON (Information Systems Educators Conference) (2000)
6. Barkley, E.F.: Student Engagement Techniques: A Handbook for College Faculty. Wiley, New York (2009)
7. Connolly, T.M., Stansfield, M., McLellan, E.: Using an online games-based learning approach to teach database design concepts. Electron. J. e-Learn. 4(1), 103–110 (2006)
8. Henry, R., Venkatraman, S.: Big data analytics the next big learning opportunity. Acad. Inf. Manage. Sci. J. 18(2), 17 (2015)
9. Khosla, A., Jayadevaprakash, N., Yao, B., Fei-Fei, L.: Novel dataset for fine-grained image categorization. In: First Workshop on Fine-Grained Visual Categorization (FGVC), IEEE Conference on Computer Vision and Pattern Recognition (2011)
10. Almeida, T.A., Gómez Hidalgo, J.M., Yamakami, A.: Contributions to the study of SMS spam filtering: new collection and results. In: Proceedings of the 2011 ACM Symposium on Document Engineering, Mountain View, CA, USA (2011)
11. Traud, A.L., Mucha, P.J., Porter, M.A.: Social structure of Facebook networks. Physica A Stat. Mech. Appl. 391(16), 4165–4180 (2012)
12. National Center for Environmental Information. Global Ensemble Forecast System (GEFS). https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-ensemble-forecast-system-gefs
13. United States Department of Education. College Scorecard Data. https://collegescorecard.ed.gov/data/
14. Open Baltimore. https://data.baltimorecity.gov/
15. Verbert, K., Manouselis, N., Drachsler, H., Duval, E.: Dataset-driven research to support learning and knowledge analytics. J. Educ. Technol. Soc. 15(3), 133–148 (2012)
16. Braman, J., Dierbach, C.: Utilizing virtual worlds for personalized search: developing the PAsSIVE framework. In: Meiselwitz, G. (ed.) SCSM 2015. LNCS, vol. 9182, pp. 3–11. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-20367-6_1
17. Hu, W., Yeh, J.: World wide web search engines. In: Si, S.N., Murthy, V.K. (eds.) Architectural Issues of Web-Enabled Electronic Business. Idea Group Publishing, Hershey (2003)
18. Braman, J., Dudley, A., Vincenti, G.: Designing SADD: a social media agent for the detection of the deceased. In: Meiselwitz, G. (ed.) SCSM 2018. LNCS, vol. 10914, pp. 345–356. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-91485-5_26
19. Minority and women’s business enterprises certifications. https://data.baltimorecity.gov/City-Services/Minority-and-Women-s-Business-Enterprises-Certific/us2p-bijb
20. Real property taxes. https://data.baltimorecity.gov/Financial/Real-Property-Taxes/27w9-urtv
Acknowledgments
The authors would like to acknowledge funding from the Maryland State Department of Education CTE Reserve Fund to support the EDNA project.
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tavegia, S., Braman, J., Vincenti, G., Yancy, B. (2019). Enhancing Database Courses Through the EDNA Project: A Preliminary Framework for the Extraction of Diverse Datasets and Analysis. In: Meiselwitz, G. (eds) Social Computing and Social Media. Communication and Social Communities. HCII 2019. Lecture Notes in Computer Science(), vol 11579. Springer, Cham. https://doi.org/10.1007/978-3-030-21905-5_18
DOI: https://doi.org/10.1007/978-3-030-21905-5_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-21904-8
Online ISBN: 978-3-030-21905-5
eBook Packages: Computer Science, Computer Science (R0)