Managing Terabytes of Web Semantics Data

Semantic Web Information Management

Abstract

A large amount of semi-structured data is now available on the Web in the form of RDF, RDFa and Microformats. In this chapter, we discuss a general model for the Web of Data and, based on our experience with Sindice.com, show how this model is reflected in the architecture and components of a large-scale infrastructure. We touch on data collection, processing, indexing and ranking, and we give an extended example of an application built on top of this infrastructure.

The authors’ names are listed in alphabetical order for convenience. This was a fully collaborative effort.

Author information

Corresponding author

Correspondence to Michele Catasta.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Catasta, M., Delbru, R., Toupikov, N., Tummarello, G. (2010). Managing Terabytes of Web Semantics Data. In: de Virgilio, R., Giunchiglia, F., Tanca, L. (eds) Semantic Web Information Management. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04329-1_6

  • DOI: https://doi.org/10.1007/978-3-642-04329-1_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04328-4

  • Online ISBN: 978-3-642-04329-1

  • eBook Packages: Computer Science, Computer Science (R0)
