Abstract
To integrate data on the Internet, we often have to deal with uncertainties when matching data schemas from different sources. This paper proposes an approach called Mashroom+ that supports human-machine interactive data mashup and can better handle uncertainties during the semantic matching process. To improve the correctness of matching results, an interactive matching algorithm synthesizes the results of multiple automatic matchers based on user feedback. Meanwhile, to avoid placing too much burden on users, we use entropy from information theory to quantify the ambiguity of different matchers and to determine the best times for users to participate. An interactive integration environment with operator recommendation capability is developed on top of this approach to support on-demand data integration. Experiments with real data show that the Mashroom+ approach achieves a good balance between high correctness of matching results and low user burden.
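The entropy idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the score dictionaries, the normalization step, the function names, and the 1.0-bit threshold are all assumptions made for illustration. The sketch only shows how Shannon entropy could flag a matcher whose candidate scores are too evenly spread to trust without asking the user.

```python
import math


def normalize(scores):
    """Turn non-negative similarity scores into a probability distribution."""
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    return {name: value / total for name, value in scores.items()}


def entropy(scores):
    """Shannon entropy (in bits) of the normalized candidate scores."""
    probs = normalize(scores)
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def needs_user_feedback(scores, threshold=1.0):
    """Hypothetical rule: ask the user when ambiguity exceeds the threshold."""
    return entropy(scores) > threshold


# Hypothetical matcher outputs for one source attribute:
# candidate target attribute -> similarity score.
confident = {"price": 0.9, "cost": 0.05, "name": 0.05}
ambiguous = {"price": 0.4, "cost": 0.35, "name": 0.25}

for label, scores in (("confident", confident), ("ambiguous", ambiguous)):
    print(label, round(entropy(scores), 3), needs_user_feedback(scores))
```

A peaked score distribution yields low entropy, so the match could be accepted automatically, while a flat distribution yields entropy close to the maximum (log2 of the number of candidates) and is a natural point to involve the user.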
Cite this article
Liu, C., Wang, J. & Han, Y. Mashroom+: An Interactive Data Mashup Approach with Uncertainty Handling. J Grid Computing 12, 221–244 (2014). https://doi.org/10.1007/s10723-013-9280-5
DOI: https://doi.org/10.1007/s10723-013-9280-5