Lightweight Automated Structure Inference and Binding of Data Sources to Predefined Data Types

Published: 25 May 2020
DOI: 10.1145/3374135.3385284

Abstract

There is a great variety and volume of data sources available on the Internet today. Software developers seeking to access such data and incorporate it into their programs have numerous libraries available for the purpose. However, in many contexts, such as rapid prototyping or lightweight scripting, these libraries are too heavyweight, requiring significant syntactic overhead or scaffolding before they can be used in a program. This paper presents a methodology (and implementations) to facilitate retrieval and dynamic binding of online data sources (in common formats such as XML, CSV, JSON, and fixed-width) to programmer-defined data types with minimal syntactic overhead and no scaffolding. As such, the paper offers a particularly novel approach, among a continuum of existing techniques, towards enabling developers to access external data and conveniently bind it to native data types in their programs.

Published In

ACMSE '20: Proceedings of the 2020 ACM Southeast Conference
April 2020
337 pages
ISBN: 9781450371056
DOI: 10.1145/3374135

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Java
  2. Racket
  3. data binding
  4. schema inference
  5. software library

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ACM SE '20: 2020 ACM Southeast Conference
April 2 - 4, 2020
Tampa, FL, USA

Acceptance Rates

Overall Acceptance Rate 502 of 1,023 submissions, 49%

Article Metrics

  • Total Citations: 0
  • Total Downloads: 79
  • Downloads (Last 12 months): 9
  • Downloads (Last 6 weeks): 0

Reflects downloads up to 15 Feb 2025.
