DOI: 10.1145/584931.584940
Article

A framework for web table mining

Published: 08 November 2002

ABSTRACT

Web table mining is the extraction of information from tables published inside web pages as HTML text. Most previous work on this subject uses the HTML tags to discover the components of a table. Our work treats the web as a distinct publication medium, in two ways. First, we argue that new types of table format have been developed specifically for the web. Second, we argue that authors use the visual cues embedded within the HTML text to direct the viewer on how to read the contents of a web table properly. We develop a framework for comprehensively analyzing the structural aspects of a web table, within which rules are devised to process the table and extract attribute-value pairs from it. This approach to web table mining is validated by good experimental results.
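To make the idea of attribute-value extraction concrete, here is a minimal sketch in Python. It is not the framework described in the abstract: the header rule (a first row counts as the attribute row when every cell is either a `th` element or contains bold text), the names `CellCollector` and `attribute_value_pairs`, and the sample table are all assumptions made for illustration. It only shows how a tag (`th`) and a simple visual cue (bold text) can both mark the attribute row.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Collect (text, emphasized) for every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []            # one list of cells per <tr>
        self._in_cell = False
        self._text = []
        self._emphasized = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._in_cell = True
            self._text = []
            # A <th> cell counts as emphasized by its tag alone.
            self._emphasized = (tag == "th")
        elif tag in ("b", "strong") and self._in_cell:
            # Hypothetical visual cue: bold text inside a cell.
            self._emphasized = True

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._in_cell:
            if self.rows:
                cell = ("".join(self._text).strip(), self._emphasized)
                self.rows[-1].append(cell)
            self._in_cell = False


def attribute_value_pairs(html):
    """Return one list of (attribute, value) pairs per data row.

    Assumption: the first row is the attribute row only if all of its
    cells are emphasized; otherwise nothing is extracted.
    """
    collector = CellCollector()
    collector.feed(html)
    collector.close()
    rows = [row for row in collector.rows if row]
    if not rows or not all(emphasized for _, emphasized in rows[0]):
        return []
    attributes = [text for text, _ in rows[0]]
    return [
        list(zip(attributes, (text for text, _ in row)))
        for row in rows[1:]
    ]


# Made-up sample table: the header is marked with <b>, not <th>.
SAMPLE_HTML = """
<table>
  <tr><td><b>Item</b></td><td><b>Price</b></td></tr>
  <tr><td>Pen</td><td>2</td></tr>
  <tr><td>Book</td><td>10</td></tr>
</table>
"""

if __name__ == "__main__":
    print(attribute_value_pairs(SAMPLE_HTML))
```

A tag-only approach would find no `th` cells in the sample and therefore no attribute row. The bold-text cue recovers the header here, which is the kind of visual evidence the abstract argues for. A real system would also need to handle spanning cells, nested tables, and vertically oriented tables, none of which this sketch attempts.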

References

  1. H. Chen, S. Tsai, and J. Tsai, Mining Tables from Large Scale HTML Texts, In Proc. 18th International Conference on Computational Linguistics, Saarbrücken, Germany, July 2000.
  2. M. Hurst, Classifying TABLE Elements in HTML, In Poster, 11th International World Wide Web Conference, Honolulu, HI, May 2002, http://www2002.org/CDROM/poster/115/index.html
  3. H. Ng, C. Lim, and J. Koo, Learning to Recognize Tables in Free Text, In Proc. of the 37th Annual Meeting of ACL, 1999, pp. 443-450.
  4. Y. Wang and J. Hu, A Machine Learning Based Approach for Table Detection on the Web, In Proc. 11th International World Wide Web Conference, Honolulu, HI, May 2002, pp. 242-250.
  5. X. Wang and D. Wood, A Conceptual Model for Tables, In Proc. of Principles of Digital Document Processing, 4th International Workshop, PODDP'98, Saint Malo, France, March 1998, Lecture Notes in Computer Science, Vol. 1481.
  6. Y. Yang, Web Table Mining and Database Discovery, M.Sc. thesis, Simon Fraser University, August 2002.
  7. M. Yoshida, K. Torisawa, and J. Tsujii, A Method to Integrate Tables of the World Wide Web, In Proc. 1st International Workshop on Web Document Analysis, Seattle, WA, USA, September 2001, pp. 31-34.

Published in

WIDM '02: Proceedings of the 4th international workshop on Web information and data management
November 2002
116 pages
ISBN: 1581135939
DOI: 10.1145/584931
Program Chairs: Roger Chiang, Ee-Peng Lim

Copyright © 2002 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States
