Abstract
The work described here is part of an ongoing research on the application of general-purpose inductive logic programming, logic representation of wrappers (L-wrappers) and XML technologies (including the XSLT transformation language) to information extraction from the Web. The L-wrappers methodology is based on a sound theoretical approach and has already proved its efficacy on a smaller scale, in the area of collecting product information. This paper proposes the use of L-wrappers for tuple extraction from HTML in the domain of e-tourism. It also outlines a method for translating L-wrappers into XSLT and illustrates it with the example of a real-world travel agency Web site.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Bădică, C., Bădică, A.: Logic Wrappers and XSLT Transformations for Tuples Extraction from HTML. In: Bressan, S., Ceri, S., Hunt, E., Ives, Z.G., Bellahsène, Z., Rys, M., Unland, R. (eds.) XSym 2005. LNCS, vol. 3671, pp. 177–191. Springer, Heidelberg (2005)
Bădică, C., Bădică, A., Popescu, E.: Tuples Extraction from HTML Using Logic Wrappers and Inductive Logic Programming. In: Szczepaniak, P.S., Kacprzyk, J., Niewiadomski, A. (eds.) AWIC 2005. LNCS (LNAI), vol. 3528, pp. 44–50. Springer, Heidelberg (2005)
Bex, G.J., Maneth, S., Neven, F.: A formal model for an expressive fragment of XSLT. Information Systems (27), 21–39 (2002)
Clark, J.: XSLT Transformation (XSLT) Version 1.0, W3C Recommendation November 16 (1999), http://www.w3.org/TR/xslt2
Chidlovskii, B.: Information Extraction from Tree Documents by Learning Subtree Delimiters. In: Proc. IIWeb 2003, Acapulco, Mexico, pp. 3–8 (2003)
Freitag, D.: Information extraction from HTML: application of a general machine learning approach. In: Proc. AAAI 1998, pp. 517–523 (1998)
Ikeda, D., Yamada, Y., Hirokawa, S.: Expressive Power of Tree and String Based Wrappers. In: Proc. IIWeb 2003, Acapulco, Mexoco, pp. 16–21 (2003)
Knoblock, C.: Agents for Gathering, Integrating, and Monitoring Information for Travel Planning. In: Intelligent Systems for Tourism. IEEE Intelligent Systems. pp. 53–66, November/December (2002)
Kosala, R., Bussche, J., van den Bruynooghe, M., Blockeel, H.: Information Extraction in Structured Documents Using Tree Automata Induction. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) PKDD 2002. LNCS (LNAI), vol. 2431, pp. 299–310. Springer, Heidelberg (2002)
Kushmerick, N., Thomas, B.: Adaptive Information Extraction: Core Technologies for Information Agents. In: Klusch, M., Bergamaschi, S., Edwards, P., Petta, P. (eds.) Intelligent Information Agents. LNCS (LNAI), vol. 2586, pp. 79–103. Springer, Heidelberg (2003)
Laender, A.H.F., Ribeiro-Neto, B., Silva, A.S., Teixeira., J.S.: A Brief Survey of Web Data Extraction Tools. SIGMOD Record 31(2), 84–93 (2002)
Laudon, K.C., Traver, C.G.: E-commerce. business. technology. society, 2nd edn. Pearson Addison-Wesley, London (2004)
Li, Z., Ng, W.K.: WDEE: Web Data Extraction by Example. In: Zhou, L.-z., Ooi, B.-C., Meng, X. (eds.) DASFAA 2005. LNCS, vol. 3453, pp. 347–358. Springer, Heidelberg (2005)
Oxygen XML Editor, http://www.oxygenxml.com/
Quinlan, J.R., Cameron-Jones, R.M.: Induction of Logic Programs: FOIL and Related Systems. New Generation Computing 13, 287–312 (1995)
Sakamoto, H., Arimura, H., Arikawa, S.: Knowledge Discovery from Semistructured Texts. In: Arikawa, S., Shinohara, A. (eds.) Progress in Discovery Science. LNCS (LNAI), vol. 2281, pp. 586–599. Springer, Heidelberg (2002)
Travelocity Web site. http://www.w3.org/TR/xslt
Xiao, L., Wissmann, D., Brown, M., Jablonski, S.: Information Extraction from HTML: Combining XML and Standard Techniques fro IE from the Web. In: Monostori, L., Váncza, J., Ali, M. (eds.) IEA/AIE 2001. LNCS (LNAI), vol. 2070, pp. 165–174. Springer, Heidelberg (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Popescu, E., Bădică, A., Bădică, C. (2006). Mining Travel Resources on the Web Using L-Wrappers. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2006. ICAISC 2006. Lecture Notes in Computer Science(), vol 4029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11785231_125
Download citation
DOI: https://doi.org/10.1007/11785231_125
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35748-3
Online ISBN: 978-3-540-35750-6
eBook Packages: Computer ScienceComputer Science (R0)