
Chapter 7: Dataspaces

Search Computing

Abstract

The vision of dataspaces is to provide many of the benefits of classical data integration, but with reduced up-front costs and with opportunities for incremental refinement, enabling a “pay as you go” approach. As such, dataspaces join a long line of research that aims to build tools that simplify integrated access to distributed data. Addressing dataspace challenges may require a range of techniques: data integration from multiple sources, machine learning approaches to resolving schema heterogeneity, integration of structured and unstructured data, management of uncertainty, and query processing and optimization. Results that seek to realize the different visions vary considerably in their contexts, priorities and techniques. This chapter presents a classification of the key concepts in the area, encouraging the use of consistent terminology and enabling a systematic comparison of proposals. It also seeks to identify common and complementary ideas in the dataspace and search computing literatures, thereby identifying opportunities for both areas and open issues for further research.
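The following minimal sketch illustrates how “pay as you go” incremental refinement and uncertainty management might combine. It is not taken from the chapter and rests on several assumptions:

- Each source has several automatically generated candidate mappings, each of uncertain quality.
- Users mark query results as correct or incorrect.
- Mappings are kept or dropped according to a smoothed precision estimate built from that feedback.

The names `CandidateMapping`, `record_feedback` and `select_mappings`, the prior of 0.5 and the selection threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CandidateMapping:
    """Hypothetical candidate mapping from a source to the integration schema."""
    name: str
    true_pos: int = 0   # results produced by this mapping that users marked correct
    false_pos: int = 0  # results produced by this mapping that users marked incorrect

    def precision(self, prior: float = 0.5, weight: int = 2) -> float:
        # Smoothed precision estimate: with no feedback it equals the prior,
        # and it moves toward the observed ratio as annotations accumulate.
        return (self.true_pos + prior * weight) / (self.true_pos + self.false_pos + weight)


def record_feedback(mapping: CandidateMapping, is_correct: bool) -> None:
    """Record one user annotation on a result produced by `mapping`."""
    if is_correct:
        mapping.true_pos += 1
    else:
        mapping.false_pos += 1


def select_mappings(mappings, threshold: float) -> list:
    """Return the names of mappings whose estimated precision meets the threshold."""
    return [m.name for m in mappings if m.precision() >= threshold]


if __name__ == "__main__":
    candidates = [CandidateMapping("m1"), CandidateMapping("m2")]
    # Before any feedback every candidate is used (low up-front cost).
    print(select_mappings(candidates, 0.5))
    # Feedback gradually refines which mappings are trusted.
    for _ in range(3):
        record_feedback(candidates[0], True)
    record_feedback(candidates[1], True)
    record_feedback(candidates[1], False)
    record_feedback(candidates[1], False)
    print(select_mappings(candidates, 0.5))
```

The smoothing keeps the estimate well defined before any feedback arrives, which mirrors the idea of starting with cheap, unrefined mappings. Each annotation then pays a little toward a more precise integration.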




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Hedeler, C., Belhajjame, K., Paton, N.W., Campi, A., Fernandes, A.A.A., Embury, S.M. (2010). Chapter 7: Dataspaces. In: Ceri, S., Brambilla, M. (eds) Search Computing. Lecture Notes in Computer Science, vol 5950. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12310-8_7


  • DOI: https://doi.org/10.1007/978-3-642-12310-8_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12309-2

  • Online ISBN: 978-3-642-12310-8

  • eBook Packages: Computer Science, Computer Science (R0)
