Abstract
This paper proposes an incremental information extraction method for social network analysis of web publications. To this end, we employ an incremental knowledge acquisition method, Multiple Classification Ripple-Down Rules (MCRDR), to classify web page segments. Our experimental results show that the MCRDR-based web page segment classification system supports easy acquisition and maintenance of information extraction rules.
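As an illustration only, the general MCRDR inference scheme can be sketched as below. This is a minimal sketch of the generic technique, not the system evaluated in the paper: the feature names (`link_ratio`, `word_count`, `text`), the segment labels, and the helper names (`Rule`, `add_refinement`, `classify`) are all hypothetical and chosen only for the example. Each rule may carry refinement (exception) rules; a case receives the conclusion of the deepest satisfied rule along every path of the rule tree, so a single segment can receive several labels.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Hypothetical representation: one page segment described by a feature dict.
Case = Dict[str, Any]


@dataclass
class Rule:
    condition: Callable[[Case], bool]
    conclusion: Optional[str]  # None marks the root or a stopping rule
    children: List["Rule"] = field(default_factory=list)

    def add_refinement(self, condition: Callable[[Case], bool],
                       conclusion: Optional[str]) -> "Rule":
        """Attach an exception rule beneath this rule and return it."""
        child = Rule(condition, conclusion)
        self.children.append(child)
        return child


def classify(rule: Rule, case: Case) -> List[str]:
    """Return the conclusions of the deepest satisfied rule on each path.

    The caller is expected to pass a rule whose own condition already holds
    (the root is always satisfied).
    """
    fired = [c for c in rule.children if c.condition(case)]
    if not fired:
        # No refinement applies, so this rule's own conclusion stands.
        return [rule.conclusion] if rule.conclusion is not None else []
    labels: List[str] = []
    for child in fired:
        for label in classify(child, case):
            if label not in labels:
                labels.append(label)
    return labels


# Rules are added one at a time, which is the incremental part: an expert
# adds a refinement only when an existing conclusion is wrong for a case.
root = Rule(lambda c: True, None)
nav = root.add_refinement(lambda c: c["link_ratio"] > 0.8, "navigation")
nav.add_refinement(lambda c: c["word_count"] > 200, "main_content")
root.add_refinement(lambda c: "copyright" in c["text"].lower(), "footer")

menu = {"link_ratio": 0.9, "word_count": 20, "text": "Home | About"}
linked_article = {"link_ratio": 0.9, "word_count": 300, "text": "Long text"}
linked_footer = {"link_ratio": 0.9, "word_count": 10, "text": "Copyright 2010"}

print(classify(root, menu))            # ['navigation']
print(classify(root, linked_article))  # ['main_content']
print(classify(root, linked_footer))   # ['navigation', 'footer']
```

In this sketch the second rule under `nav` acts as an exception that overrides the "navigation" conclusion for long, link-heavy segments, while the independent "footer" rule shows how one segment can receive more than one classification.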
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Guo, W., Kim, Y.S., Kang, B.H. (2010). Webpage Segments Classification with Incremental Knowledge Acquisition. In: Kim, Th., Ma, J., Fang, Wc., Park, B., Kang, BH., Ślęzak, D. (eds) U- and E-Service, Science and Technology. UNESST 2010. Communications in Computer and Information Science, vol 124. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17644-9_9
DOI: https://doi.org/10.1007/978-3-642-17644-9_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-17643-2
Online ISBN: 978-3-642-17644-9
eBook Packages: Computer Science, Computer Science (R0)