Abstract
The purpose of this study is to acquire the information on event pages of municipality websites in a machine-processable form. In this paper, we propose a dictionary-based method for extracting information from an HTML document. The method deletes the HTML tags from the document to convert it into plain text, and then extracts the target character strings by comparing the text with a collection of words prepared beforehand. The evaluation experiment was conducted on municipalities in Japan: the 23 wards of Tokyo and 56 municipalities in Chiba Prefecture. Overall, the proposed method was able to extract event information at a rate of 73%, compared with 52% for the LR-Wrapper, 55% for the Tree-Wrapper, and 32% for the PLR-Wrapper. These results confirm that the proposed method, which combines a simple algorithm with a collection of words, extracts event information at a higher rate than the existing methods.
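The two steps described in the abstract (deleting the HTML tags, then comparing the resulting text with a prepared collection of words) can be illustrated with a minimal sketch. This is not the authors' implementation: the field names, the dictionary entries, the sample page, and the helper names `html_to_lines` and `extract_by_dictionary` are all assumptions made for illustration, and the matching rule (plain substring search) is a simplification.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of script/style elements."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def html_to_lines(html):
    """Step 1: delete the tags and return the remaining text, one chunk per text node."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def extract_by_dictionary(lines, dictionary):
    """Step 2: return (field, line) pairs for lines containing a dictionary word."""
    hits = []
    for line in lines:
        for field, words in dictionary.items():
            if any(word in line for word in words):
                hits.append((field, line))
                break  # assign each line to at most one field
    return hits


# Hypothetical word collection; a real one would be prepared for the target sites.
EVENT_DICTIONARY = {
    "date": ["Date", "日時"],
    "place": ["Place", "Venue", "場所"],
}

# Hypothetical event page used only to exercise the sketch.
SAMPLE_HTML = """
<html><body>
<h1>Summer Festival</h1>
<table>
<tr><td>Date: 1 August 2010</td></tr>
<tr><td>Venue: City Hall Plaza</td></tr>
</table>
<script>var x = 1;</script>
</body></html>
"""

events = extract_by_dictionary(html_to_lines(SAMPLE_HTML), EVENT_DICTIONARY)
for field, line in events:
    print(field, "->", line)
```

Because the sketch ignores the tag structure entirely, it does not depend on a page-specific template, which is the property the abstract contrasts with wrapper-based methods.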
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ushioda, T., Fujita, S. (2010). An Extraction Method to Get a Municipality Event Information. In: Taniar, D., Gervasi, O., Murgante, B., Pardede, E., Apduhan, B.O. (eds) Computational Science and Its Applications – ICCSA 2010. ICCSA 2010. Lecture Notes in Computer Science, vol 6019. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12189-0_12
DOI: https://doi.org/10.1007/978-3-642-12189-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12188-3
Online ISBN: 978-3-642-12189-0
eBook Packages: Computer Science, Computer Science (R0)