Mashroom+: An Interactive Data Mashup Approach with Uncertainty Handling

Journal of Grid Computing

Abstract

Integrating data on the Internet often means dealing with uncertainty when matching data schemas from different sources. The paper proposes an approach called Mashroom+ that supports interactive human-machine data mashup and handles uncertainty in the semantic matching process better. To improve the correctness of matching results, an interactive matching algorithm synthesizes the results of multiple automatic matchers based on user feedback. To avoid overburdening users, entropy from information theory is used to quantify the ambiguity of the different matchers and to calculate the best times for users to participate. An interactive integration environment with operator recommendation capability is developed on top of this approach to support on-demand data integration. Experiments on real data show that the Mashroom+ approach achieves a good balance between high correctness of matching results and low user burden.
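
The abstract does not give the formulas, so the following is only a minimal sketch of the general idea of using entropy to decide when to ask the user. It assumes that a matcher returns nonnegative similarity scores for one source attribute against several candidate target attributes, and that these scores can be normalized into a probability distribution. The function names `entropy` and `needs_user_feedback` and the threshold value are hypothetical and are not taken from the paper.

```python
import math


def entropy(scores):
    """Shannon entropy in bits of nonnegative scores, normalized to sum to 1."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    # Zero scores contribute nothing to the entropy, so they are skipped.
    probs = [s / total for s in scores if s > 0]
    return -sum(p * math.log2(p) for p in probs)


def needs_user_feedback(scores, threshold=1.0):
    """Return True when the matcher is ambiguous enough to ask the user.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return entropy(scores) > threshold


# One clear winner: low entropy, so no interruption is needed.
confident = [0.9, 0.05, 0.05]
# Four equally plausible candidates: high entropy, so the user is asked.
ambiguous = [0.25, 0.25, 0.25, 0.25]

print(needs_user_feedback(confident))
print(needs_user_feedback(ambiguous))
```

Under these assumptions, a matcher whose scores concentrate on one candidate yields low entropy and can be trusted without interruption, while a flat score distribution yields high entropy and is a natural point to request feedback.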



Author information

Corresponding author

Correspondence to Chen Liu.

About this article

Cite this article

Liu, C., Wang, J. & Han, Y. Mashroom+: An Interactive Data Mashup Approach with Uncertainty Handling. J Grid Computing 12, 221–244 (2014). https://doi.org/10.1007/s10723-013-9280-5

  • DOI: https://doi.org/10.1007/s10723-013-9280-5
