DOI: 10.1145/967900.968127

Learning query languages of Web interfaces

Published: 14 March 2004

ABSTRACT

This paper studies the problem of automatically acquiring the query languages supported by a Web information resource. We describe a system that automatically probes the search interface of a resource with a set of test queries and analyses the returned pages to recognize the supported query operators. The automatic acquisition assumes that the resource reports the number of matches for each submitted query. The match numbers are used to train a learning system and to generate classification rules that recognize the query operators supported by a provider and their syntactic encodings. These classification rules are then employed during the automatic probing of new providers to determine which query operators they support. We report the results of experiments with a set of real Web resources.
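The probing idea in the abstract can be illustrated with a small sketch. This is not the paper's system: the toy corpus `DOCS`, the fake interface `make_resource`, and the hand-written rules in `classify` are all assumptions made for illustration, whereas the paper learns its classification rules from match numbers. The sketch only shows how match counts for a few test queries can hint at how a resource interprets a two-word query.

```python
from typing import Callable, Dict, List

# Hypothetical toy corpus standing in for a remote resource's index.
DOCS: List[str] = [
    "information retrieval on the web",
    "web information systems",
    "retrieval of hidden web data",
    "query languages for databases",
    "information about query retrieval",
]


def make_resource(mode: str) -> Callable[[str], int]:
    """Build a fake search interface that reports only a match count.

    `mode` fixes how a multi-word query is interpreted: 'and' (all words),
    'or' (any word) or 'phrase' (words adjacent and in order).
    """
    def search(query: str) -> int:
        words = query.lower().split()

        def matches(doc: str) -> bool:
            tokens = doc.split()
            if mode == "and":
                return all(w in tokens for w in words)
            if mode == "or":
                return any(w in tokens for w in words)
            # 'phrase': look for the word sequence as a contiguous run.
            n = len(words)
            return any(tokens[i:i + n] == words
                       for i in range(len(tokens) - n + 1))

        return sum(matches(d) for d in DOCS)

    return search


def probe(search: Callable[[str], int], a: str, b: str) -> Dict[str, int]:
    """Submit four test queries and record the reported match numbers."""
    return {
        "a": search(a),
        "b": search(b),
        "ab": search(f"{a} {b}"),
        "ba": search(f"{b} {a}"),
    }


def classify(counts: Dict[str, int]) -> str:
    """Hand-written rules (an assumption; the paper learns such rules)."""
    if counts["ab"] != counts["ba"]:
        return "phrase"   # word order changes the count
    if counts["ab"] > max(counts["a"], counts["b"]):
        return "or"       # more matches than either word alone
    if counts["ab"] <= min(counts["a"], counts["b"]):
        return "and"      # no more matches than the rarer word
    return "unknown"


for mode in ("and", "or", "phrase"):
    counts = probe(make_resource(mode), "information", "retrieval")
    print(mode, counts, classify(counts))
```

A single pair of probe words can be misleading (for example, a phrase interpretation with equal counts in both orders looks like a conjunction), which is one reason to probe with many test queries and learn the rules rather than fix them by hand.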

Published in

SAC '04: Proceedings of the 2004 ACM symposium on Applied computing
March 2004, 1733 pages
ISBN: 1581138121
DOI: 10.1145/967900

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: Article

Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
