DOI: 10.1145/1963192.1963304
demonstration

OXPath: little language, little memory, great value

Published: 28 March 2011

ABSTRACT

Data about everything is readily available on the web, but often only accessible through elaborate user interactions. For automated decision support, extracting that data is essential, but infeasible with existing heavy-weight data extraction systems. In this demonstration, we present OXPath, a novel approach to web extraction, and showcase it with a system that supports informed job selection and integrates information from several different web sites. By carefully extending XPath, OXPath exploits its familiarity and provides a light-weight interface that is easy to use and embed. We highlight how OXPath guarantees optimal page buffering, storing only a constant number of pages for non-recursive queries.
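Why a non-recursive query can run with a constant page buffer can be illustrated with a toy model. The Python sketch below is an illustration under stated assumptions, not OXPath's actual implementation or syntax. It models a non-recursive query as a fixed number of click actions, evaluates it depth-first over a simulated site, and keeps each page in memory only while the pages reached from it are being explored. The names `build_web`, `Evaluator`, `clicks`, and the page identifiers are hypothetical and invented for this example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def build_web(fanout: int, depth: int) -> dict[str, list[str]]:
    """Hypothetical simulated site: every page links to `fanout` pages, `depth` levels deep."""
    web: dict[str, list[str]] = {}
    frontier = ["root"]
    for _ in range(depth):
        next_frontier = []
        for page in frontier:
            children = [f"{page}/{i}" for i in range(fanout)]
            web[page] = children
            next_frontier.extend(children)
        frontier = next_frontier
    return web


@dataclass
class Evaluator:
    """Toy depth-first evaluator (an assumption, not OXPath's engine) that counts buffered pages."""
    web: dict[str, list[str]]
    buffer: list[str] = field(default_factory=list)
    max_buffered: int = 0
    visited: int = 0

    def _load(self, page: str) -> None:
        # Loading a page (the start page or the result of a click) adds it to the buffer.
        self.buffer.append(page)
        self.visited += 1
        self.max_buffered = max(self.max_buffered, len(self.buffer))

    def _release(self) -> None:
        # After every continuation of a page has been explored, the page is discarded.
        self.buffer.pop()

    def evaluate(self, start: str, clicks: int) -> list[str]:
        """Evaluate a non-recursive query made of exactly `clicks` click actions."""
        results: list[str] = []
        self._load(start)
        self._follow(start, clicks, results)
        self._release()
        return results

    def _follow(self, page: str, remaining: int, results: list[str]) -> None:
        if remaining == 0:
            results.append(page)  # last page on this path: data would be extracted here
            return
        for target in self.web.get(page, []):
            self._load(target)  # the click loads a new page
            self._follow(target, remaining - 1, results)
            self._release()  # backtrack: this page is no longer needed


if __name__ == "__main__":
    for fanout in (3, 10):
        ev = Evaluator(build_web(fanout, depth=2))
        found = ev.evaluate("root", clicks=2)
        print(fanout, len(found), ev.visited, ev.max_buffered)
```

In this model, widening the site from fanout 3 to fanout 10 raises the number of pages visited from 13 to 111, but the peak buffer stays at three pages: the start page plus one page per click action. A recursive query (for example, following "next page" links an unbounded number of times) has no such fixed action count, which is consistent with the abstract limiting the guarantee to non-recursive queries.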


Published in

WWW '11: Proceedings of the 20th international conference companion on World wide web
ACM Other conferences
March 2011
552 pages
ISBN: 9781450306379
DOI: 10.1145/1963192

Copyright © 2011 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 1,899 of 8,196 submissions, 23%
