A Reverse Engineering Process for Inferring Data Models from Spreadsheet-based Information Systems: An Automotive Industrial Experience

  • Conference paper
Data Management Technologies and Applications (DATA 2014)

Abstract

Spreadsheet-based Information Systems are widely used in industry to support different phases of production processes. Their intensive use is mainly due to their ease of use, which allows even inexperienced programmers to develop Information Systems. The development of such systems is further aided by integrated scripting languages (e.g., Visual Basic for Applications, LibreOffice Basic, JavaScript) that offer features for implementing Rapid Application Development processes. Although Spreadsheet-based Information Systems can be developed with a very short time to market, they are usually poorly documented or, in some cases, not documented at all. As a consequence, they are very difficult to comprehend, maintain, or migrate towards other architectures, such as Database-Oriented Information Systems or Web Applications. Abstracting a data model from the source spreadsheet files is a fundamental activity in the migration process towards different architectures. In this work we present a heuristic-based reverse engineering process for inferring a data model from an Excel-based information system. The process is fully automatic and consists of seven sequential steps. Both the applicability and the effectiveness of the proposed process were assessed in an experiment we conducted in an automotive industrial context. The process was successfully used to obtain the UML class diagrams representing the conceptual data models of three different Spreadsheet-based Information Systems. The paper presents the results of the experiment and the lessons we learned from it.

Notes

  1. http://msdn.microsoft.com/en-us/library/wss56bz7.aspx.

  2. https://www.talend.com/.

Acknowledgements

This work was carried out in the context of the research projects IESWECAN (Informatics for Embedded SoftWare Engineering of Construction and Agricultural machiNes - PON01-01516) and APPS4SAFETY (Active Preventive Passive Solutions for Safety - PON03PE_00159_3), both partially funded by the Italian Ministry for University and Research (MIUR).

Author information

Corresponding author

Correspondence to Porfirio Tramontana.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Amalfitano, D., Fasolino, A.R., Tramontana, P., De Simone, V., Di Mare, G., Scala, S. (2015). A Reverse Engineering Process for Inferring Data Models from Spreadsheet-based Information Systems: An Automotive Industrial Experience. In: Helfert, M., Holzinger, A., Belo, O., Francalanci, C. (eds) Data Management Technologies and Applications. DATA 2014. Communications in Computer and Information Science, vol 178. Springer, Cham. https://doi.org/10.1007/978-3-319-25936-9_9

  • DOI: https://doi.org/10.1007/978-3-319-25936-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25935-2

  • Online ISBN: 978-3-319-25936-9

  • eBook Packages: Computer Science, Computer Science (R0)
