skip to main content
10.1145/2666310.2666373acmconferencesArticle/Chapter ViewAbstractPublication PagesgisConference Proceedingsconference-collections
research-article

A system for efficient cleaning and transformation of geospatial data attributes

Authors Info & Claims
Published:04 November 2014Publication History

ABSTRACT

A significant challenge in handling geographic datasets is that the datasets can come from heterogeneous sources with various data qualities and formats. Before these datasets can be used in a Geographic Information System (GIS) for spatial analysis or to create maps, a typical task is to clean the attribute data and transform the data into a uniform format. However, conventional GIS products focus on manipulating the spatial component of geographic features and only offer basic tools for editing the attribute data (e.g., one row at a time). This limits the capability for handling large datasets in a GIS since manually editing and transforming attribute data between different formats is not practical for thousands of geographic features. In this demo, we present ArcKarma, which is built on our previous work on data transformation, to efficiently clean and transform data attributes in a GIS. ArcKarma generates transformation programs from a few user-provided examples and applies these programs to transform individual attribute columns into the desired formats. We show that ArcKarma produces accurate results and eliminates the need for laborious manual data cleaning and scripting tasks.

References

  1. A. Cypher, D. C. Haibert, D. Kurlander, H. Lieberman, D. Maulsby, B. A. Myers, and A. Turransky, editors. Watch what I do: programming by demonstration. MIT Press, 1993. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. S. Gulwani. Automating string processing in spreadsheets using input-output examples. In POPL, pages 317--330, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. S. Kandel, A. Paepcke, J. Hellerstein, and J. Heer. Wrangler: interactive visual specification of data transformation scripts. In CHI, pages 3363--3372, 2011. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. T. Lau, S. A. Wolfman, P. Domingos, and D. S. Weld. Programming by demonstration using version space algebra. Mach. Learn., pages 111--156, 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. V. Raman and J. M. Hellerstein. Potter's wheel: An interactive data cleaning system. In VLDB, pages 381--390, 2001. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. B. Wu, P. Szekely, and C. A. Knoblock. Minimizing user effort in transforming data by example. In IUI, pages 317--322, 2014. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. A system for efficient cleaning and transformation of geospatial data attributes

      Recommendations

      Comments

      Login options

      Check if you have access through your login credentials or your institution to get full access on this article.

      Sign in
      • Published in

        cover image ACM Conferences
        SIGSPATIAL '14: Proceedings of the 22nd ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems
        November 2014
        651 pages
        ISBN:9781450331319
        DOI:10.1145/2666310

        Copyright © 2014 Owner/Author

        Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 4 November 2014

        Check for updates

        Qualifiers

        • research-article

        Acceptance Rates

        SIGSPATIAL '14 Paper Acceptance Rate39of184submissions,21%Overall Acceptance Rate220of1,116submissions,20%

      PDF Format

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader