Abstract
Two hundred web tables from ten sites were imported into Excel. After any necessary editing, the tables were converted into layout-independent Wang Notation with the Table Abstraction Tool (TAT). TAT outputs XML files intended for constructing narrow-domain ontologies. Editing took 104 seconds per table on average. Augmentations such as aggregates, footnotes, table titles, captions, units, and notes were also extracted, in an average time of 93 seconds. Every user intervention was logged and audited. The logged interactions were analyzed to determine how strongly factors such as table size, the number of categories, and the various types of augmentation influence processing time. The analysis indicates which aspects of interactive table processing could be automated in the near term, and how much time such automation would save. The correlation coefficient between predicted and actual processing times was 0.66.
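The abstract reports a correlation of 0.66 between predicted and actual processing times but does not describe how the predictor was built. The sketch below assumes one plausible approach: an ordinary least-squares linear model of per-table processing time on logged features (here, cell count, number of categories, and number of augmentations), followed by the Pearson correlation between fitted and observed times. The feature set, the function names (`fit_linear_model`, `predict`, `pearson`), and the toy numbers are hypothetical illustrations, not the authors' model or data.

```python
import math
from typing import Sequence


def fit_linear_model(features: Sequence[Sequence[float]],
                     times: Sequence[float]) -> list[float]:
    """Fit time ~ b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    Solves the normal equations (X^T X) b = X^T y with Gauss-Jordan
    elimination; returns [b0, b1, ..., bk].
    """
    # Prepend a constant column so b0 acts as the intercept.
    rows = [[1.0, *map(float, f)] for f in features]
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, times))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        if abs(p) < 1e-12:
            raise ValueError("features are collinear; model cannot be fitted")
        a[col] = [v / p for v in a[col]]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] for i in range(k)]


def predict(coef: Sequence[float], x: Sequence[float]) -> float:
    """Predicted processing time for one table's feature vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-table features: (cells, categories, augmentations).
# Made-up numbers lying exactly on time = 20 + 0.5*cells + 10*cats + 15*augs;
# they are NOT measurements from the study.
FEATURES = [(40, 2, 1), (100, 3, 0), (60, 3, 1),
            (200, 4, 2), (30, 2, 0), (120, 3, 4)]
TIMES = [75.0, 100.0, 95.0, 190.0, 55.0, 170.0]  # seconds

if __name__ == "__main__":
    coef = fit_linear_model(FEATURES, TIMES)
    preds = [predict(coef, f) for f in FEATURES]
    print("coefficients:", coef)  # approximately [20.0, 0.5, 10.0, 15.0]
    print("r(predicted, actual):", pearson(preds, TIMES))
```

Because the toy data are noiseless, the fitted model reproduces them and the correlation is essentially 1; on real interaction logs, unmodeled effects (user skill, table idiosyncrasies) pull the correlation down, which is consistent with a value such as the reported 0.66. The fitted coefficients give one rough way to compare how much each logged factor contributes to processing time.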
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Padmanabhan, R.K., Jandhyala, R.C., Krishnamoorthy, M., Nagy, G., Seth, S., Silversmith, W. (2010). Interactive Conversion of Web Tables. In: Ogier, JM., Liu, W., Lladós, J. (eds) Graphics Recognition. Achievements, Challenges, and Evolution. GREC 2009. Lecture Notes in Computer Science, vol 6020. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13728-0_3
DOI: https://doi.org/10.1007/978-3-642-13728-0_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13727-3
Online ISBN: 978-3-642-13728-0
eBook Packages: Computer Science; Computer Science (R0)