Abstract
Data analysis in rich spaces of heterogeneous data sources is an increasingly common activity. Examples include exploratory data analysis and personal information management. Mapping specification is one of the key issues in this data management setting, since it answers the need for unified search over the full spectrum of relevant knowledge. However, while users in data analytics are engaged in an open-ended interaction between data discovery and data orchestration, most of the mapping specification solutions available so far are intended for expert users.
This paper proposes a general framework for a novel paradigm of user-driven mapping discovery, in which mapping specification is interactively driven by the information-seeking activities of users and the exclusive role of mappings is to contribute to user satisfaction. The underlying key idea is that data semantics is in the eye of the consumer. Thus, we start from user queries, which we try to satisfy in the dataspace. In this process, we often need to discover new mappings and to expose the user to the data thereby discovered in order to collect their feedback, possibly continuing until the user is satisfied.
The framework is made up of (a) a theoretical foundation where we formally introduce the notion of candidate mapping sets for a user query, and (b) an interactive and incremental algorithm that, given a user query, finds a candidate mapping set that satisfies the user. The algorithm incrementally builds the candidate mapping set by searching the dataspace for data samples and deriving mapping lattices, which are explored to deliver mappings for user feedback. To fit the user's information need in a limited number of interactions, the algorithm provides a multi-criteria selection strategy for candidate mapping sets. Finally, the paper provides a proof of the correctness of the algorithm.
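The interactive loop described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the `Mapping` fields, the weighted-sum `score`, and the `accepts` and `is_satisfied` callbacks are all hypothetical stand-ins, and the derivation and exploration of mapping lattices is omitted entirely. The sketch only shows the general shape of proposing the best-ranked candidate, collecting user feedback, and stopping once the user is satisfied or an interaction budget runs out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Mapping:
    """Hypothetical candidate mapping with illustrative criteria."""
    name: str
    coverage: float  # assumed: share of the user query the mapping answers
    cost: float      # assumed: effort needed to evaluate the mapping


def score(m: Mapping, w_coverage: float = 0.7, w_cost: float = 0.3) -> float:
    """Weighted sum standing in for a multi-criteria selection strategy."""
    return w_coverage * m.coverage - w_cost * m.cost


def discover(
    candidates: Iterable[Mapping],
    accepts: Callable[[Mapping], bool],
    is_satisfied: Callable[[List[Mapping]], bool],
    max_rounds: int = 10,
) -> List[Mapping]:
    """Propose candidates in score order and keep those the user accepts."""
    remaining = set(candidates)
    accepted: List[Mapping] = []
    for _ in range(max_rounds):
        # Stop when nothing is left to propose or the user is satisfied.
        if not remaining or is_satisfied(accepted):
            break
        # Pick the best-ranked candidate; the name breaks ties deterministically.
        best = max(remaining, key=lambda m: (score(m), m.name))
        remaining.remove(best)
        # One interaction: the user accepts or rejects the proposed mapping.
        if accepts(best):
            accepted.append(best)
    return accepted


if __name__ == "__main__":
    pool = [
        Mapping("m1", coverage=0.9, cost=0.1),
        Mapping("m2", coverage=0.5, cost=0.5),
        Mapping("m3", coverage=0.8, cost=0.9),
    ]
    # Simulated user: rejects m3 and is satisfied with two accepted mappings.
    chosen = discover(pool, lambda m: m.name != "m3", lambda acc: len(acc) >= 2)
    print([m.name for m in chosen])
```

In this sketch the interaction budget is the `max_rounds` parameter, and user feedback is simulated by plain callbacks; in an interactive setting both callbacks would be answered by the user.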
This work is partially funded by the University of Modena and Reggio Emilia under the project “MapQS: mapping discovery and refinement driven by query samples in dataspace”.
Notes
- 1. h(B) means the point-wise application of h to the variables and constants in B.
- 2. For compactness reasons, in the following \(The\ name\ of\ the\ rose\) will be abbreviated in the text with \(nor\).
- 3. This is always possible since we assume that B is connected for all queries.
- 4. For compactness reasons, in the following \(Sliding\ doors\) will be abbreviated in the text with \(sd^4\).
Copyright information
© 2017 Springer International Publishing AG
Cite this paper
Mandreoli, F. (2017). A Framework for User-Driven Mapping Discovery in Rich Spaces of Heterogeneous Data. In: Panetto, H., et al. On the Move to Meaningful Internet Systems. OTM 2017 Conferences. OTM 2017. Lecture Notes in Computer Science, vol 10574. Springer, Cham. https://doi.org/10.1007/978-3-319-69459-7_27
DOI: https://doi.org/10.1007/978-3-319-69459-7_27
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69458-0
Online ISBN: 978-3-319-69459-7
eBook Packages: Computer Science, Computer Science (R0)