Webpage Segments Classification with Incremental Knowledge Acquisition

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 124)

Abstract

This paper proposes an incremental information extraction method for social network analysis of web publications. To this end, we use MCRDR (Multiple Classification Ripple-Down Rules), an incremental knowledge acquisition method, to classify web page segments. Our experimental results show that the MCRDR-based web page segment classification system supports easy acquisition and maintenance of information extraction rules.
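
To make the idea of MCRDR-based segment classification concrete, the following is a minimal sketch of generic MCRDR inference, not the system evaluated in the paper. The segment features (`link_density`, `word_count`, `text`), the labels, and the thresholds are hypothetical and chosen only for illustration. Each rule may carry exception rules as children; a segment receives the conclusion of the deepest satisfied rule on every path through the tree, so one segment can receive several labels.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical representation: one web page segment as a feature dictionary.
Segment = Dict[str, Any]


@dataclass
class Rule:
    """One node of an MCRDR rule tree."""
    condition: Callable[[Segment], bool]
    conclusion: Optional[str]  # None means "no label" (acts as a stopping rule)
    children: List["Rule"] = field(default_factory=list)

    def add_exception(self, rule: "Rule") -> "Rule":
        """Attach a refinement that is only evaluated when this rule fires."""
        self.children.append(rule)
        return rule


def classify(rule: Rule, segment: Segment) -> List[str]:
    """Collect the conclusion of the deepest satisfied rule on each path."""
    if not rule.condition(segment):
        return []
    fired = [child for child in rule.children if child.condition(segment)]
    if not fired:
        # No exception applies, so this rule's own conclusion stands.
        return [rule.conclusion] if rule.conclusion is not None else []
    labels: List[str] = []
    for child in fired:
        for label in classify(child, segment):
            if label not in labels:
                labels.append(label)
    return labels


# The root rule is always true and carries no conclusion.
root = Rule(lambda s: True, None)

# Illustrative rules only; the thresholds are assumptions, not from the paper.
nav = root.add_exception(Rule(lambda s: s["link_density"] > 0.6, "navigation"))
# Exception added later, as an expert might after seeing a misclassified case.
nav.add_exception(Rule(lambda s: s["word_count"] > 200, "content"))
root.add_exception(Rule(lambda s: "copyright" in s["text"].lower(), "footer"))

menu = {"link_density": 0.8, "word_count": 12, "text": "Home | About | Contact"}
article = {"link_density": 0.7, "word_count": 350, "text": "A long linked article"}
footer_bar = {"link_density": 0.9, "word_count": 8, "text": "Copyright 2010 | Terms"}

print(classify(root, menu))        # ['navigation']
print(classify(root, article))     # ['content']
print(classify(root, footer_bar))  # ['navigation', 'footer']
```

The incremental aspect shows in `add_exception`: correcting a wrong classification adds a new rule beneath the rule that fired, leaving existing rules untouched.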

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Guo, W., Kim, Y.S., Kang, B.H. (2010). Webpage Segments Classification with Incremental Knowledge Acquisition. In: Kim, Th., Ma, J., Fang, Wc., Park, B., Kang, BH., Ślęzak, D. (eds) U- and E-Service, Science and Technology. UNESST 2010. Communications in Computer and Information Science, vol 124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17644-9_9

  • DOI: https://doi.org/10.1007/978-3-642-17644-9_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17643-2

  • Online ISBN: 978-3-642-17644-9

  • eBook Packages: Computer Science, Computer Science (R0)
