Empirical validation of structural metrics for predicting understandability of conceptual schemas for data warehouse

  • Original Article
International Journal of System Assurance Engineering and Management

Abstract

Data warehouse (DW) quality depends on the quality of its data models (conceptual, logical, and physical). Multidimensional (MD) modeling is widely recognized as the backbone of data modeling for DWs. Some authors have recently proposed a set of structural metrics to assess the quality of MD conceptual models, and have found a significant relationship between these metrics and the understandability of DW conceptual schemas using correlation analysis techniques such as Spearman's and Pearson's correlation. However, advanced statistical and machine learning methods have not been used to predict the effect of each metric on understandability. In this paper, we focus on predicting the effect of structural metrics on the understandability of conceptual schemas using (i) a statistical method (logistic regression analysis, including univariate and multivariate analysis) and (ii) machine learning methods (Decision Trees and the Naive Bayesian Classifier), and (iii) we compare the performance of these statistical and machine learning methods. The results show that some of the metrics individually have a significant effect on the understandability of MD conceptual schemas. Further, a few of the metrics have a significant combined effect on the understandability of conceptual schemas. The results also show that the Naive Bayesian Classifier performs better than logistic regression analysis and Decision Trees.
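
To make the univariate analysis mentioned above concrete, the following is a minimal sketch of a univariate logistic regression, in which a single structural metric is used to predict a binary understandability outcome. It is not the authors' implementation: the function names, the metric values, and the labels (1 meaning a schema was judged hard to understand) are all hypothetical and chosen only for illustration, and the model is fitted with plain gradient ascent rather than a statistics package.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_univariate_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) for one predictor.

    Uses batch gradient ascent on the average log-likelihood.
    Returns the intercept b0 and the slope b1.
    """
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = 0.0  # gradient with respect to the intercept
        g1 = 0.0  # gradient with respect to the slope
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def predict_proba(b0, b1, x):
    """Predicted probability of the positive class for metric value x."""
    return sigmoid(b0 + b1 * x)


if __name__ == "__main__":
    # Hypothetical data: one structural metric value per schema, and a
    # made-up label (1 = judged hard to understand, 0 = otherwise).
    metric_values = [1, 2, 3, 4, 5, 6, 7, 8]
    hard_to_understand = [0, 0, 0, 1, 0, 1, 1, 1]

    b0, b1 = fit_univariate_logistic(metric_values, hard_to_understand)
    print("intercept:", b0)
    print("slope:", b1)
    # exp(b1) is the odds ratio for a one-unit increase in the metric.
    print("odds ratio:", math.exp(b1))
```

A positive slope here would indicate that larger metric values are associated with a higher probability of the positive class. A multivariate analysis extends the same model to several metrics at once, and a real study would also report significance tests for each coefficient, which this sketch omits.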

Author information

Corresponding author

Correspondence to Manoj Kumar.

About this article

Cite this article

Kumar, M., Gosain, A. & Singh, Y. Empirical validation of structural metrics for predicting understandability of conceptual schemas for data warehouse. Int J Syst Assur Eng Manag 5, 291–306 (2014). https://doi.org/10.1007/s13198-013-0159-4

  • DOI: https://doi.org/10.1007/s13198-013-0159-4
