Abstract
Data transformation is a key task in mashup development (e.g., access to heterogeneous services and data flow), and it is widely considered a labour-intensive and error-prone process. Reusing previously specified mappings promises to significantly reduce manual, time-consuming transformation work; nevertheless, current approaches and systems have not fully realized this potential. In this chapter, we study the problem of reusing data transformation logic in mashup development platforms. We formulate the problem and propose a solution that features novel reuse abstractions and techniques, including spreadsheet templates, mapping generalization, and similarity join. Given a spreadsheet instance that is being mapped to a target schema, we recommend a list of mapping formulas that can potentially be reused for that instance. We implemented a prototype of the proposed solution and evaluated its performance on synthetic datasets.
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Hung, V., Benatallah, B., Lemos, A.L. (2014). Data Transformation Knowledge Reuse in Spreadsheet-Based Mashup Development Platform. In: Bouguettaya, A., Sheng, Q., Daniel, F. (eds) Web Services Foundations. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7518-7_25
DOI: https://doi.org/10.1007/978-1-4614-7518-7_25
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7517-0
Online ISBN: 978-1-4614-7518-7
eBook Packages: Computer Science, Computer Science (R0)