High-Level Web Data Abstraction Using Language Integrated Query

  • Conference paper
Intelligent Distributed Computing IV

Part of the book series: Studies in Computational Intelligence ((SCI,volume 315))

Abstract

Web pages contain a huge amount of information but are designed for human readers, which makes their automatic processing difficult. Moreover, web pages are live: their content keeps changing, so a page that has been downloaded and processed may differ a few seconds later. Many scraping frameworks and extraction mechanisms have been proposed and implemented; their common task is to download and extract the required data. Nevertheless, developing such applications is enormously complex, since the nature of the data does not conform to common programming paradigms. In addition, the changing content of web pages often forces repeated extraction of the whole data set.

This paper describes the LinqToWeb framework for web data extraction. Its innovative design allows a strongly typed object model to be defined that transparently reflects data on the living web. This mechanism provides access to raw web data in a completely object-oriented way using Language Integrated Query (LINQ). With this framework, the development of web-based applications such as data semantization tools is more efficient and type-safe, and the resulting product is easily maintainable and extendable.
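The idea of a typed object model that is queried lazily over web data can be illustrated with a minimal sketch. This is not the LinqToWeb API, which targets .NET and is not shown in this abstract; it is a Python analogy under stated assumptions. The names `Link`, `links`, `fetch`, `PAGES`, and the `example.invalid` URL are all hypothetical, and an in-memory dictionary stands in for the live web so that the sketch runs without network access. A generator plays the role of a deferred LINQ query: the page is fetched only when the results are enumerated.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Iterator, List, Optional

# Hypothetical stand-in for the live web: URL -> current HTML content.
PAGES = {
    "http://example.invalid/news":
        '<a href="/a">Alpha</a><a href="/b">Beta</a><a href="/c">Gamma</a>',
}

fetch_log: List[str] = []  # records every simulated download


def fetch(url: str) -> str:
    """Simulated download (assumption: a real version would issue an HTTP request)."""
    fetch_log.append(url)
    return PAGES[url]


@dataclass(frozen=True)
class Link:
    """Typed view of one <a> element, instead of raw markup."""
    href: str
    text: str


class _LinkParser(HTMLParser):
    """Collects <a> elements as Link objects."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[Link] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(Link(self._href, "".join(self._text)))
            self._href = None


def links(url: str) -> Iterator[Link]:
    """Generator: the page is fetched only when iteration actually starts."""
    parser = _LinkParser()
    parser.feed(fetch(url))
    parser.close()
    yield from parser.links


# Building the query does not download anything yet (deferred execution).
query = (l.text for l in links("http://example.invalid/news")
         if l.text.startswith("B"))
fetched_before = list(fetch_log)  # snapshot taken before enumeration

# Enumerating the query triggers the fetch and the extraction.
results = list(query)
print(results)
```

Because the query is evaluated only on enumeration, building it again and enumerating later would read whatever the page contains at that moment, which is the sense in which such an object model can reflect changing content.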




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Misek, J., Zavoral, F. (2010). High-Level Web Data Abstraction Using Language Integrated Query. In: Essaaidi, M., Malgeri, M., Badica, C. (eds) Intelligent Distributed Computing IV. Studies in Computational Intelligence, vol 315. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15211-5_2

  • DOI: https://doi.org/10.1007/978-3-642-15211-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15210-8

  • Online ISBN: 978-3-642-15211-5

  • eBook Packages: Engineering, Engineering (R0)
