Data Transformation Knowledge Reuse in Spreadsheet-Based Mashup Development Platform

Hung, Vu; Benatallah, Boualem; Lemos, Angel Lagares

doi:10.1007/978-1-4614-7518-7_25

Abstract

Data transformation is a key task in mashup development (e.g., access to heterogeneous services, data flow). It is considered a labour-intensive and error-prone process. Reusing previously specified mappings promises a significant reduction in manual, time-consuming transformation work; nevertheless, this potential has not been fully realized in current approaches and systems. In this chapter, we study the problem of reusing data transformation logic in mashup development platforms. We formulate the problem and propose a solution that features novel reuse abstractions and techniques, including spreadsheet templates, mapping generalization, and similarity join. Given a spreadsheet instance that is being mapped to a target schema, we recommend a list of mapping formulas that can potentially be reused for the instance. We implemented a prototype of the proposed solution and evaluated its performance on synthetic datasets.
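To make the recommendation step concrete, the following is a minimal sketch of how a similarity join over column headers could rank stored mapping formulas for a new spreadsheet instance. It is an illustration under stated assumptions, not the chapter's algorithm: the use of Jaccard similarity, the function names `jaccard` and `recommend_mappings`, the threshold of 0.5, and the example repository with its formulas are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_mappings(
    instance_headers: Set[str],
    repository: Dict[str, Tuple[Set[str], str]],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Rank stored mapping formulas by header similarity to the instance.

    Each repository entry maps a template name to (header set, formula).
    Entries scoring below the (assumed) threshold are dropped.
    """
    scored = []
    for name, (template_headers, formula) in repository.items():
        score = jaccard(instance_headers, template_headers)
        if score >= threshold:
            scored.append((name, formula, score))
    # Highest similarity first.
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical repository of previously specified mappings.
repository = {
    "invoice": ({"customer", "item", "price", "quantity"}, "=price*quantity"),
    "contact": ({"name", "email", "phone"}, "=CONCATENATE(name, email)"),
}

# Headers of the spreadsheet instance currently being mapped (also hypothetical).
instance = {"customer", "item", "price", "qty"}
recommendations = recommend_mappings(instance, repository)
```

Here the instance shares three of five distinct headers with the `invoice` template, so that template's formula is recommended, while `contact` shares none and is filtered out. A production similarity join would typically avoid comparing every pair, for example through prefix filtering or indexing, which this sketch omits.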

Author information

Authors and Affiliations

University of New South Wales, Sydney, Australia
Vu Hung, Boualem Benatallah & Angel Lagares Lemos

Corresponding author

Correspondence to Vu Hung.

Editor information

Editors and Affiliations

School of Computer Science and Information Technology, RMIT University, Melbourne, Victoria, Australia
Athman Bouguettaya
School of Computer Science, University of Adelaide, Adelaide, South Australia, Australia
Quan Z. Sheng
Dipartimento di Ingegneria e Scienza dell'Informazione, Università di Trento, Povo, Trento, Italy
Florian Daniel

About this chapter

Cite this chapter

Hung, V., Benatallah, B., Lemos, A.L. (2014). Data Transformation Knowledge Reuse in Spreadsheet-Based Mashup Development Platform. In: Bouguettaya, A., Sheng, Q., Daniel, F. (eds) Web Services Foundations. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7518-7_25

DOI: https://doi.org/10.1007/978-1-4614-7518-7_25
Published: 04 September 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7517-0
Online ISBN: 978-1-4614-7518-7
eBook Packages: Computer Science, Computer Science (R0)
