Webpage Segments Classification with Incremental Knowledge Acquisition

Guo, Wei; Kim, Yang Sok; Kang, Byeong Ho

doi:10.1007/978-3-642-17644-9_9

Wei Guo, Yang Sok Kim & Byeong Ho Kang

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 124)

Abstract

This paper proposes an incremental information extraction method for social network analysis of web publications. To this end, we use an incremental knowledge acquisition method, MCRDR (Multiple Classification Ripple-Down Rules), to classify web page segments. Our experimental results show that the MCRDR-based web page segment classification system supports easy acquisition and maintenance of information extraction rules.
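
The general idea behind MCRDR can be illustrated with a minimal sketch. This is not the authors' system: the segment features (`link_ratio`, `word_count`), the labels, and the helper names `Rule`, `add_exception`, and `classify` are all hypothetical, chosen only to show how rules are added one case at a time and how a case can receive several classifications.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

Case = Dict[str, float]  # a page segment described by hypothetical numeric features


@dataclass
class Rule:
    condition: Callable[[Case], bool]
    conclusion: Optional[str]            # None acts as a "stop" rule (no label)
    cornerstone: Optional[Case] = None   # the case that prompted this rule
    children: List["Rule"] = field(default_factory=list)

    def add_exception(self, condition, conclusion, cornerstone=None) -> "Rule":
        """Attach a refinement that overrides this rule when it also fires."""
        rule = Rule(condition, conclusion, cornerstone)
        self.children.append(rule)
        return rule


def classify(rule: Rule, case: Case) -> Set[str]:
    """Collect the conclusion of the deepest satisfied rule on every path."""
    fired = [child for child in rule.children if child.condition(case)]
    if not fired:
        return {rule.conclusion} if rule.conclusion is not None else set()
    labels: Set[str] = set()
    for child in fired:
        labels |= classify(child, case)
    return labels


# The root always fires and carries no conclusion of its own.
root = Rule(lambda c: True, None)

# Rules are added incrementally, each prompted by a misclassified case.
nav = root.add_exception(lambda c: c["link_ratio"] > 0.8, "navigation")
root.add_exception(lambda c: c["word_count"] > 200, "main content")

# Later correction: a long, link-heavy segment is not navigation.
nav.add_exception(lambda c: c["word_count"] > 200, "reference list")

short_menu = {"link_ratio": 0.9, "word_count": 20}
long_links = {"link_ratio": 0.9, "word_count": 350}
print(classify(root, short_menu))  # one label
print(classify(root, long_links))  # two labels, one per satisfied path
```

In this sketch, existing rules are never edited. A correction is attached as an exception beneath the rule that gave the wrong answer, which is what lets maintenance proceed case by case.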

Author information

Authors and Affiliations

University of Tasmania, Sandy Bay, Tasmania, Australia
Wei Guo & Byeong Ho Kang
University of New South Wales, Sydney, New South Wales, Australia
Yang Sok Kim

Editor information

Editors and Affiliations

Hannam University, Daejeon, South Korea
Tai-hoon Kim
Hosei University, 3-7-2, Kajino-cho, Koganei-shi, 184-8584, Tokyo, Japan
Jianhua Ma
National Chiao Tung University, Hsinchu, Taiwan
Wai-chi Fang
Hannam University, 133 Ojeong-dong, 306-791, Daeduk-gu, Daejeon, Korea
Byungjoo Park
University of Tasmania, 7001, Hobart, Australia
Byeong-Ho Kang
University of Warsaw & Infobright Inc., Poland
Dominik Ślęzak

About this paper

Cite this paper

Guo, W., Kim, Y.S., Kang, B.H. (2010). Webpage Segments Classification with Incremental Knowledge Acquisition. In: Kim, Th., Ma, J., Fang, Wc., Park, B., Kang, BH., Ślęzak, D. (eds) U- and E-Service, Science and Technology. UNESST 2010. Communications in Computer and Information Science, vol 124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17644-9_9

DOI: https://doi.org/10.1007/978-3-642-17644-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17643-2
Online ISBN: 978-3-642-17644-9
eBook Packages: Computer Science, Computer Science (R0)
