Abstract
The extraction of the relations of nested table headers to content cells is automated with a view to constructing narrow-domain ontologies of semi-structured web data. A taxonomy of tessellations for displaying tabular data is developed. X-Y tessellations that can be obtained by a divide-and-conquer method are asymptotically only an infinitesimal fraction of all partitions of a rectangle into rectangles. Admissible tessellations are the even smaller subset of all partitions that correspond to the structures of published tables and that contain only rectangles produced by successive guillotine cuts. Many of these can be processed automatically. Their structures can be conveniently represented by X-Y trees, which facilitate relating hierarchical row and column headings to content cells. A formal grammar is proposed for characterizing the X-Y trees of layout-equivalent admissible tessellations. Algorithms are presented for transforming a tessellation into an X-Y tree and hence into multidimensional, layout-independent Category Trees (Wang abstract data types).
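The idea of building an X-Y tree by successive guillotine cuts can be illustrated with a minimal sketch. It is not the algorithm of the paper, only one plausible reading of the abstract. It assumes that cells are given as axis-aligned rectangles (x0, y0, x1, y1) with integer coordinates, and the names Rect, _split and xy_tree are hypothetical. A cut is accepted only if no rectangle straddles it, and a tessellation with no such cut (for example a pinwheel) is rejected.

```python
from typing import List, Tuple, Union

# Assumed cell encoding: (x0, y0, x1, y1) with x0 < x1 and y0 < y1.
Rect = Tuple[int, int, int, int]
Tree = Union[Rect, Tuple[str, list]]


def _split(rects: List[Rect], axis: int) -> List[List[Rect]]:
    """Group rects into strips separated by every full-length cut.

    axis=0 looks for vertical cut lines (constant x),
    axis=1 for horizontal cut lines (constant y).
    """
    lo, hi = axis, axis + 2
    groups: List[List[Rect]] = []
    current: List[Rect] = []
    reach = None  # farthest extent of the rectangles in the current strip
    for r in sorted(rects, key=lambda r: r[lo]):
        # A guillotine cut exists where the next rectangle starts at or
        # beyond everything collected so far, so nothing straddles the line.
        if current and r[lo] >= reach:
            groups.append(current)
            current, reach = [], None
        current.append(r)
        reach = r[hi] if reach is None else max(reach, r[hi])
    groups.append(current)
    return groups


def xy_tree(rects: List[Rect]) -> Tree:
    """Return a nested X-Y tree: a leaf is a Rect, an internal node is
    ("V", children) for vertical cuts or ("H", children) for horizontal ones."""
    if len(rects) == 1:
        return rects[0]
    for axis, label in ((0, "V"), (1, "H")):
        groups = _split(rects, axis)
        if len(groups) > 1:
            return (label, [xy_tree(g) for g in groups])
    raise ValueError("no guillotine cut: not an X-Y tessellation")


# A small table-like layout: one header row above a stub column
# and two stacked content cells.
table = [(0, 0, 4, 1), (0, 1, 1, 3), (1, 1, 4, 2), (1, 2, 4, 3)]
tree = xy_tree(table)
```

In this sketch the tree for `table` has a horizontal cut at the root that separates the header row from the rest, then a vertical cut that separates the stub column from the two content cells. Trying vertical cuts before horizontal ones is an arbitrary choice made here for brevity.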
References
Tijerino, Y.A., Embley, D.W., Lonsdale, D.W., Nagy, G.: Towards ontology generation from tables. World Wide Web Journal 6, #3 (2005)
Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Scientific American (2001)
Halevy, A., Norvig, P., Pereira, F.: The Unreasonable Effectiveness of Data. IEEE Intelligent Systems 24(2), 8–12 (March/April 2009)
Lopresti, D., Embley, D.W., Hurst, M., Nagy, G.: Table Processing Paradigms: A Research Survey. Int. J. Doc. Anal. Recognit. 8(2-3), 66–86 (2006)
Zanibbi, R., Blostein, D., Cordy, J.R.: A survey of table recognition: Models, observations, transformations, and inferences. Int. J. Doc. Anal. Recognit. 7(1), 1–16 (2004)
Silva, E.C., Jorge, A.M., Torgo, L.: Design of an end-to-end method to extract information from tables. Int. J. Doc. Anal. Recognit. 8(2), 144–171 (2006)
Gatterbauer, W., Bohunsky, P., Herzog, K.M., Pollak, B.: Towards Domain-Independent Information Extraction from Web Tables. In: Proceedings of World Wide Web, Banff, pp. 71–80 (2007)
Padmanabhan, R., Jandhyala, R.C., Krishnamoorthy, M., Nagy, G., Seth, S., Silversmith, W.: How many different kinds of tables are there. In: Procs. Eighth Int’l. Workshop on Graphics Recognition (GREC 2009) (2009) (in press)
Klarner, D.A., Magliveras, S.S.: Tilings of a Block with Blocks. Europ. J. Combinatorics 9, 317–330 (1988)
Kuh, E.S., Ohtsuki, T.: Recent Advances in VLSI Layout. Proceedings of the IEEE 78(2) (1990)
Brooks, R.L., Smith, C.A.B., Stone, A.H., Tutte, W.T.: The dissection of rectangles into squares. Duke Math. J. 7, 312–340 (1940)
Nagy, G., Seth, S.: Hierarchical Image Representation with Application to Optically Scanned Documents. In: Procs. Int. Conf. Pat. Recog. VII, Montreal, pp. 347–349 (1984)
Krishnamoorthy, M., Nagy, G., Seth, S., Viswanathan, M.: Syntactic Segmentation and Labeling of Digitized Pages from Technical Journals. IEEE Transactions on Pattern Analysis and Machine Intelligence 15, #7, 737–747 (1993)
Samet, H.: Foundations of Multidimensional and Metric Data Structures. Morgan Kaufmann, San Francisco (2006)
The Chicago Manual of Style, 15th edn. Univ. of Chicago Press, Chicago (2003)
U.S. Government Style Manual, 29th edn. (2000)
Green, E.A., Krishnamoorthy, M.: Model-based analysis of printed tables. In: Procs. of Third International Conference on Document Analysis and Recognition (ICDAR 1995), Montreal, Canada, pp. 214–217 (1995)
Green, E.A., Krishnamoorthy, M.: Recognition of tables using table grammars. In: Procs. of Symposium on Document Analysis and Recognition (SDAIR 1995), Las Vegas, NV, pp. 261–277 (1995)
Green, E.A., Krishnamoorthy, M.: Model-based analysis of printed tables. In: Procs. Third Int’l. Workshop on Graphics Recognition (GREC 1995), pp. 234–242 (1995); in Graphics Recognition Methods and Applications. LNCS, vol. 1072, pp. 80–91. Springer, Heidelberg (1996)
Wang, X.: Tabular Abstraction, Editing, and Formatting. Ph.D Dissertation, University of Waterloo, Waterloo, ON, Canada (1996)
Hu, J., Kashi, R., Lopresti, D., Nagy, G., Wilfong, G.: Why table ground-truthing is hard. In: Procs. of Sixth International Conference on Document Analysis and Recognition, Seattle, WA, pp. 129–133 (2001)
Lopresti, D., Nagy, G.: A Tabular Survey of Table Processing. In: Chhabra, A.K., Dori, D. (eds.) GREC 1999. LNCS, vol. 1941, pp. 93–120. Springer, Heidelberg (2000)
Nagy, G., Lopresti, D.: Issues in ground-truthing graphic documents. In: Blostein, D., Kwon, Y.-B. (eds.) GREC 2001. LNCS, vol. 2390, pp. 46–66. Springer, Heidelberg (2002) (selected papers from the Fourth International Workshop on Graphics Recognition)
Tao, C., Embley, D.W.: Automatic Hidden-Web Table Interpretation by Sibling Page Comparison. In: Parent, C., Schewe, K.-D., Storey, V.C., Thalheim, B. (eds.) ER 2007. LNCS, vol. 4801, pp. 566–581. Springer, Heidelberg (2007)
Tao, C., Embley, D.W., Liddle, S.W.: Enabling a Web of Knowledge. Brigham Young University. manuscript submitted to the special issue about the web of data for the Journal of Web Semantics (2009)
Embley, D.W., Lopresti, D., Nagy, G.: Notes on Contemporary Table Recognition. In: Bunke, H., Spitz, A.L. (eds.) DAS 2006. LNCS, vol. 3872, pp. 164–175. Springer, Heidelberg (2006)
Horowitz, E., Sahni, S.: Fundamentals of Data Structures. W.H. Freeman & Co., New York (1983)
Aho, A.V., Sethi, R., Ullman, J.D.: Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading (1986)
DeRemer, F., Pennello, T.: Efficient Computation of LALR(1) Look-Ahead Sets. ACM Trans. Prog. Lang. and Sys. (TOPLAS) 4(4), 615–649 (1982)
Johnson, S.C.: YACC: Yet another Compiler-Compiler. Unix Programmer’s Manual 2b (1979)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jandhyala, R.C., Krishnamoorthy, M., Nagy, G., Padmanabhan, R., Seth, S., Silversmith, W. (2009). From Tessellations to Table Interpretation. In: Carette, J., Dixon, L., Coen, C.S., Watt, S.M. (eds) Intelligent Computer Mathematics. CICM 2009. Lecture Notes in Computer Science, vol 5625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02614-0_33
DOI: https://doi.org/10.1007/978-3-642-02614-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02613-3
Online ISBN: 978-3-642-02614-0
eBook Packages: Computer Science (R0)