From Tessellations to Table Interpretation

  • Conference paper
Intelligent Computer Mathematics (CICM 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5625)

Abstract

The extraction of the relations of nested table headers to content cells is automated with a view to constructing narrow domain ontologies of semi-structured web data. A taxonomy of tessellations for displaying tabular data is developed. X-Y tessellations that can be obtained by a divide-and-conquer method are asymptotically only an infinitesimal fraction of all partitions of a rectangle into rectangles. Admissible tessellations are the even smaller subset of all partitions that correspond to the structures of published tables and that contain only rectangles produced by successive guillotine cuts. Many of these can be processed automatically. Their structures can be conveniently represented by X-Y trees, which facilitate relating hierarchical row and column headings to content cells. A formal grammar is proposed for characterizing the X-Y trees of layout-equivalent admissible tessellations. Algorithms are presented for transforming a tessellation into an X-Y tree and hence into multidimensional, layout-independent Category Trees (Wang abstract data types).
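
To make the idea of successive guillotine cuts concrete, the following is a minimal sketch of how a tessellation might be turned into an X-Y tree by divide and conquer. It is not the authors' algorithm: the rectangle encoding `(x0, y0, x1, y1)`, the node labels `"V"` and `"H"`, and the function names `xy_tree` and `_cuts` are assumptions made for illustration, and the input is assumed to tile its bounding box exactly, with no gaps or overlaps.

```python
# Cells are (x0, y0, x1, y1) with integer corners; y grows downward.
# Hypothetical encoding chosen for this sketch only.

def _cuts(rects, axis):
    """Return interior coordinates along `axis` (0 = x, 1 = y) at which a
    straight edge-to-edge (guillotine) cut crosses no cell."""
    lo = min(r[axis] for r in rects)
    hi = max(r[axis + 2] for r in rects)
    edges = sorted({r[axis] for r in rects} | {r[axis + 2] for r in rects})
    return [c for c in edges
            if lo < c < hi
            and not any(r[axis] < c < r[axis + 2] for r in rects)]

def xy_tree(rects):
    """Recursively split a tessellation into an X-Y tree.

    A leaf is a single cell. An internal node is (label, children), where
    label "V" means the block was split by vertical cuts (children ordered
    left to right) and "H" by horizontal cuts (children ordered top to bottom).
    Raises ValueError if the block admits no guillotine cut.
    """
    if len(rects) == 1:
        return rects[0]
    for axis, label in ((0, "V"), (1, "H")):
        cuts = _cuts(rects, axis)
        if cuts:
            lo = min(r[axis] for r in rects)
            hi = max(r[axis + 2] for r in rects)
            bounds = [lo] + cuts + [hi]
            children = []
            for a, b in zip(bounds, bounds[1:]):
                # Cells lying entirely inside the slab [a, b] along this axis.
                part = [r for r in rects if a <= r[axis] and r[axis + 2] <= b]
                children.append(xy_tree(part))
            return (label, children)
    raise ValueError("not an X-Y tessellation: no guillotine cut exists")

if __name__ == "__main__":
    # A spanning header over two content cells.
    table = [(0, 0, 2, 1), (0, 1, 1, 2), (1, 1, 2, 2)]
    print(xy_tree(table))
    # ('H', [(0, 0, 2, 1), ('V', [(0, 1, 1, 2), (1, 1, 2, 2)])])
```

Because every available cut along one direction is taken at each node, the cut direction alternates from level to level. A pinwheel arrangement of five rectangles has no edge-to-edge cut at all, so the sketch rejects it, which illustrates why X-Y tessellations are only a subset of all partitions of a rectangle into rectangles.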

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Jandhyala, R.C., Krishnamoorthy, M., Nagy, G., Padmanabhan, R., Seth, S., Silversmith, W. (2009). From Tessellations to Table Interpretation. In: Carette, J., Dixon, L., Coen, C.S., Watt, S.M. (eds) Intelligent Computer Mathematics. CICM 2009. Lecture Notes in Computer Science, vol 5625. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02614-0_33

  • DOI: https://doi.org/10.1007/978-3-642-02614-0_33

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02613-3

  • Online ISBN: 978-3-642-02614-0

  • eBook Packages: Computer Science, Computer Science (R0)
