Mining Table Information on the Internet

Jung, Sung-won; Han, Gi-deuk; Kwon, Hyuk-chul

doi:10.1007/978-3-540-30211-7_62

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3248)

Included in the following conference series:

International Conference on Natural Language Processing

Abstract

Authors of HTML documents use various methods to convey their intention clearly. Among these methods, this paper pays special attention to tables, which are commonly used in documents to make meaning clear and which are easy to recognize in web documents because they are marked up with tags. On the Internet, tables are used both to structure knowledge and to lay out documents. We are therefore first interested in classifying tables into two types: meaningful tables and decorative tables. This is not easy, however, because HTML does not separate presentation from structure. This paper proposes a method of extracting meaningful tables using a modified k-means algorithm and compares it with other methods. The experimental results show that this classification of tables in web documents is promising.
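To make the classification task concrete, the following is a minimal sketch of clustering tables into two groups. It is not the authors' method: the abstract does not describe the modification to k-means or the features used, so this sketch assumes plain k-means with a farthest-point initialisation and two hypothetical features (the share of header cells and the number of images per cell). The class `TableStats`, the functions `features` and `kmeans`, and the sample tables are all illustrative.

```python
from html.parser import HTMLParser
from math import dist


class TableStats(HTMLParser):
    """Counts a few tags in one <table> fragment (hypothetical features)."""

    def __init__(self):
        super().__init__()
        self.cells = self.headers = self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self.cells += 1
        if tag == "th":
            self.headers += 1
        if tag == "img":
            self.images += 1


def features(table_html):
    """Return (header-cell ratio, images per cell) for one table."""
    stats = TableStats()
    stats.feed(table_html)
    stats.close()
    cells = max(stats.cells, 1)  # avoid division by zero for empty tables
    return (stats.headers / cells, stats.images / cells)


def kmeans(points, k=2, max_iter=100):
    """Plain k-means (assumption: not the paper's modified variant)."""
    # Deterministic farthest-point initialisation.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels


# Illustrative tables: two data tables, two layout tables.
tables = [
    "<table><tr><th>City</th><th>Population</th></tr>"
    "<tr><td>Busan</td><td>3400000</td></tr></table>",
    "<table><tr><th>Item</th><th>Price</th></tr>"
    "<tr><td>Pen</td><td>2</td></tr><tr><td>Ink</td><td>5</td></tr></table>",
    "<table><tr><td><img src='logo.gif'></td>"
    "<td><img src='banner.gif'></td></tr></table>",
    "<table><tr><td><img src='nav.gif'></td><td>Home</td></tr></table>",
]

labels = kmeans([features(t) for t in tables], k=2)
print(labels)
```

Because k-means is unsupervised, the cluster indices carry no meaning by themselves; deciding which cluster corresponds to meaningful tables would need, for example, a few hand-labelled tables.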

This work was supported by the National Research Laboratory Program (Contract Number: M10203000028-02J0000-01510) of KISTEP.

Author information

Authors and Affiliations

Korean Language Processing Lab. School of Electrical & Computer Engineering, Pusan National University, Busan, 609-735, Korea
Sung-won Jung, Gi-deuk Han & Hyuk-chul Kwon

Editor information

Editors and Affiliations

Behavior Design Corporation, IV Science-Based Industrial Park Hsinchu, 2F, No.5, Industry E. Rd, Taiwan
Keh-Yih Su
University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-0033; JST CREST, Honcho 4-1-8, Kawaguchi-shi, Saitama 332-0012
Jun’ichi Tsujii
Pohang University of Science and Technology (POSTECH), AITrc, Republic of Korea
Jong-Hyeok Lee
Language Information Sciences Research Centre, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
Oi Yee Kwong

About this paper

Cite this paper

Jung, Sw., Han, Gd., Kwon, Hc. (2005). Mining Table Information on the Internet. In: Su, KY., Tsujii, J., Lee, JH., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2004. IJCNLP 2004. Lecture Notes in Computer Science, vol 3248. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30211-7_62

DOI: https://doi.org/10.1007/978-3-540-30211-7_62
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24475-2
Online ISBN: 978-3-540-30211-7
eBook Packages: Computer Science, Computer Science (R0)