Mining Travel Resources on the Web Using L-Wrappers

Popescu, Elvira; Bădică, Amelia; Bădică, Costin

doi:10.1007/11785231_125


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4029)

Abstract

The work described here is part of ongoing research on applying general-purpose inductive logic programming, logic representations of wrappers (L-wrappers), and XML technologies (including the XSLT transformation language) to information extraction from the Web. The L-wrappers methodology rests on a sound theoretical approach and has already proved its efficacy on a smaller scale, in the area of collecting product information. This paper proposes the use of L-wrappers for tuple extraction from HTML in the domain of e-tourism. It also outlines a method for translating L-wrappers into XSLT and illustrates it with the example of a real-world travel agency Web site.
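As a rough illustration of what tuple extraction from an HTML document tree can look like, the sketch below pairs sibling table cells into (destination, price) tuples. It is not the authors' L-wrapper formalism or their XSLT translation: the XHTML fragment, the field names, and the function extract_tuples are hypothetical, and the only assumption made is that an extraction pattern can be read as conditions on parent/child and next-sibling relations between nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; not taken from the paper or from any real site.
PAGE = """
<table>
  <tr><td>Paris</td><td>450 USD</td></tr>
  <tr><td>Rome</td><td>390 USD</td></tr>
</table>
"""


def extract_tuples(xhtml):
    """Return (destination, price) tuples from well-formed XHTML.

    The pattern used here is an assumption made for illustration:
    two <td> nodes form a tuple when they are children of the same
    <tr> and the second immediately follows the first.
    """
    root = ET.fromstring(xhtml.strip())
    tuples = []
    for row in root.iter("tr"):
        cells = list(row)  # children of the row, in document order
        # zip(cells, cells[1:]) enumerates every (node, next sibling) pair
        for left, right in zip(cells, cells[1:]):
            if left.tag == "td" and right.tag == "td":
                tuples.append(((left.text or "").strip(),
                               (right.text or "").strip()))
    return tuples


if __name__ == "__main__":
    for destination, price in extract_tuples(PAGE):
        print(destination, price)
```

The input must already be well-formed XML for this to parse; real-world HTML would typically need to be converted to XHTML first.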



Author information

Authors and Affiliations

Software Engineering Department, University of Craiova, Bvd. Decebal 107, Craiova, RO-200440, Romania
Elvira Popescu & Costin Bădică
Business Information Systems Department, University of Craiova, A.I. Cuza 13, Craiova, RO-200585, Romania
Amelia Bădică


Editor information

Editors and Affiliations

Department of Artificial Intelligence, Academy of Humanities and Economics, Poland
Leszek Rutkowski
Institute of Automatics, AGH University of Science and Technology, Al. Mickiewicza 30, PL-30-059, Kraków, Poland
Ryszard Tadeusiewicz
Department of Electrical Engineering and Computer Sciences, Berkeley Initiative in Soft Computing (BISC), University of California, 94720-1776, Berkeley, CA, USA
Lotfi A. Zadeh
Department of Electrical Engineering, University of Louisville, 40292, Louisville, KY, USA
Jacek M. Żurada


About this paper

Cite this paper

Popescu, E., Bădică, A., Bădică, C. (2006). Mining Travel Resources on the Web Using L-Wrappers. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2006. Lecture Notes in Computer Science, vol 4029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11785231_125


DOI: https://doi.org/10.1007/11785231_125
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-35748-3
Online ISBN: 978-3-540-35750-6
eBook Packages: Computer Science, Computer Science (R0)
