Mining Travel Resources on the Web Using L-Wrappers

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4029)

Abstract

The work described here is part of ongoing research on the application of general-purpose inductive logic programming, logic representation of wrappers (L-wrappers), and XML technologies (including the XSLT transformation language) to information extraction from the Web. The L-wrappers methodology is based on a sound theoretical approach and has already proved its efficacy on a smaller scale, in the area of collecting product information. This paper proposes the use of L-wrappers for tuple extraction from HTML in the domain of e-tourism. It also outlines a method for translating L-wrappers into XSLT and illustrates it with the example of a real-world travel agency Web site.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Popescu, E., Bădică, A., Bădică, C. (2006). Mining Travel Resources on the Web Using L-Wrappers. In: Rutkowski, L., Tadeusiewicz, R., Zadeh, L.A., Żurada, J.M. (eds) Artificial Intelligence and Soft Computing – ICAISC 2006. ICAISC 2006. Lecture Notes in Computer Science, vol 4029. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11785231_125

  • DOI: https://doi.org/10.1007/11785231_125

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35748-3

  • Online ISBN: 978-3-540-35750-6

  • eBook Packages: Computer Science, Computer Science (R0)
