Abstract
The vision of dataspaces is to provide many of the benefits of classical data integration, but with reduced up-front costs. Combining this with opportunities for incremental refinement enables a ‘pay-as-you-go’ approach to data integration, resulting in simplified integrated access to distributed data. It has been speculated that model management could provide the basis for dataspace management; however, this has not been investigated until now.
Here, we present DSToolkit, the first dataspace management system that is based on model management. As a result, it benefits from the flexibility the approach provides for managing schemas represented in heterogeneous models; supports the complete dataspace lifecycle, which includes automatic initialisation, maintenance and improvement of a dataspace; and allows the user to provide feedback by annotating the result tuples returned by queries the user has posed. The feedback gathered is used for improvement by annotating, selecting and refining mappings. When a new data source is added, these techniques can also be applied, without additional feedback on that source, to determine its perceived quality with respect to the feedback already gathered and to identify the best mappings over all sources, including the new one.
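As a rough illustration of how tuple-level feedback might be turned into mapping annotations, the sketch below estimates a precision score for each mapping from the result tuples a user has marked as correct or incorrect, then selects the mappings above a threshold. It is not DSToolkit's API: the function names, the `(mapping_id, is_correct)` feedback format, the sample data and the threshold are all assumptions made for this example.

```python
from collections import defaultdict


def annotate_mappings(feedback):
    """Estimate a precision score per mapping from tuple-level feedback.

    `feedback` is an iterable of (mapping_id, is_correct) pairs, one per
    result tuple the user has annotated (hypothetical format).
    """
    counts = defaultdict(lambda: [0, 0])  # mapping_id -> [correct, total]
    for mapping_id, is_correct in feedback:
        counts[mapping_id][1] += 1
        if is_correct:
            counts[mapping_id][0] += 1
    # Precision estimate: fraction of annotated tuples marked correct.
    return {m: correct / total for m, (correct, total) in counts.items()}


def select_mappings(precision, threshold):
    """Keep the mappings whose estimated precision meets the threshold."""
    return sorted(m for m, p in precision.items() if p >= threshold)


# Illustrative feedback on tuples produced by three hypothetical mappings.
feedback = [
    ("m1", True), ("m1", True), ("m1", False),
    ("m2", False), ("m2", True),
    ("m3", True),
]
precision = annotate_mappings(feedback)
selected = select_mappings(precision, threshold=0.6)
print(precision)
print(selected)
```

A mapping with only a handful of annotated tuples gets a noisy estimate, so a real system would likely also weigh how much feedback supports each score.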
The work reported in this paper was supported by a grant from the EPSRC.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Hedeler, C. et al. (2012). DSToolkit: An Architecture for Flexible Dataspace Management. In: Hameurlain, A., Küng, J., Wagner, R. (eds) Transactions on Large-Scale Data- and Knowledge-Centered Systems V. Lecture Notes in Computer Science, vol 7100. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28148-8_6
DOI: https://doi.org/10.1007/978-3-642-28148-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28147-1
Online ISBN: 978-3-642-28148-8
eBook Packages: Computer Science, Computer Science (R0)