ABSTRACT
A significant challenge in handling geographic datasets is that they often come from heterogeneous sources with varying data quality and formats. Before these datasets can be used in a Geographic Information System (GIS) for spatial analysis or map creation, the attribute data typically must be cleaned and transformed into a uniform format. However, conventional GIS products focus on manipulating the spatial component of geographic features and offer only basic tools for editing attribute data (e.g., one row at a time). This limits the ability to handle large datasets in a GIS, because manually editing and transforming attribute data between formats is impractical for thousands of geographic features. In this demo, we present ArcKarma, which builds on our previous work on data transformation to efficiently clean and transform data attributes in a GIS. ArcKarma generates transformation programs from a few user-provided examples and applies these programs to transform individual attribute columns into the desired formats. We show that ArcKarma produces accurate results and eliminates the need for laborious manual data cleaning and scripting tasks.
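To make the idea of generating a transformation program from a few examples concrete, the following is a minimal sketch. It is not ArcKarma's algorithm: it assumes a small, hypothetical library of string primitives and simply searches for the shortest sequence of them that reproduces every user-provided example. The names PRIMITIVES, run, and learn, as well as the sample street names, are illustrative assumptions.

```python
import re
from itertools import product

# Hypothetical primitive transformations for one attribute column.
# These are illustrative assumptions, not operations defined by ArcKarma.
PRIMITIVES = {
    "identity": lambda s: s,
    "strip": str.strip,
    "upper": str.upper,
    "lower": str.lower,
    "title": str.title,
    "collapse_spaces": lambda s: re.sub(r"\s+", " ", s),
    "digits_only": lambda s: re.sub(r"\D", "", s),
}


def run(program, value):
    """Apply a program (a sequence of primitive names) to one value."""
    for name in program:
        value = PRIMITIVES[name](value)
    return value


def learn(examples, max_len=3):
    """Return the shortest primitive sequence consistent with all examples.

    Brute-force enumeration: try every sequence of length 1, then 2, and so
    on, and keep the first one that maps each input to its expected output.
    Returns None if no sequence up to max_len fits.
    """
    for length in range(1, max_len + 1):
        for program in product(PRIMITIVES, repeat=length):
            if all(run(program, src) == dst for src, dst in examples):
                return program
    return None


# Two (raw value, desired value) pairs supplied by the user.
examples = [("  main   ST ", "Main St"), ("OAK  avenue", "Oak Avenue")]
program = learn(examples)

# Apply the learned program to the remaining rows of the column.
column = ["  elm   RD", "PINE   blvd  "]
cleaned = [run(program, value) for value in column]
print(program, cleaned)
```

Exhaustive search is only workable here because the primitive library is tiny; it is meant to show the example-driven workflow (give a few examples, obtain a program, apply it to the whole column), not a scalable synthesis method.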
Index Terms
- A system for efficient cleaning and transformation of geospatial data attributes