ABSTRACT
Web table mining is the extraction of information from tables published in web pages as HTML text. Most previous work on this subject relies on HTML tags to discover the components of a table. Our work treats the web as a distinct publication medium in two ways. First, we argue that new types of table format have been developed specifically for the web. Second, we argue that authors use the visual cues embedded in the HTML text to direct the viewer on how to read the contents of a web table properly. We develop a framework for comprehensively analyzing the structural aspects of a web table, within which rules are devised to process the table and extract attribute-value pairs from it. This approach to web table mining is validated by good experimental results.
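To make the term "attribute-value pair" concrete, the following is a minimal sketch of the simplest tag-based case, not the framework described in the abstract. It assumes a column-wise table whose first row holds the attribute names, ignores visual cues, spanning cells, and nested tables, and uses hypothetical names (`SimpleTableParser`, `attribute_value_pairs`) chosen only for illustration.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collects cell text row by row from the <tr>/<th>/<td> tags it sees."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows; each row is a list of (tag, text)
        self._row = None   # row currently being built, or None outside <tr>
        self._cell = None  # [tag, text fragments] for the open cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = [tag, []]

    def handle_data(self, data):
        # Text inside inline markup such as <b> still arrives here.
        if self._cell is not None:
            self._cell[1].append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Collapse runs of whitespace so layout noise does not leak through.
            text = " ".join("".join(self._cell[1]).split())
            self._row.append((self._cell[0], text))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None


def attribute_value_pairs(html):
    """Return one list of (attribute, value) pairs per data row.

    Assumption (not from the paper): the first row names the attributes
    and every later row is one record read left to right.
    """
    parser = SimpleTableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header = [text for _, text in parser.rows[0]]
    records = []
    for row in parser.rows[1:]:
        values = [text for _, text in row]
        records.append(list(zip(header, values)))
    return records
```

A table laid out row-wise, or one that marks its headers only through bold text or background colour, would defeat this sketch, which is the kind of case the abstract's argument about web-specific formats and visual cues addresses.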