Abstract
Data warehouses are powerful tools for making better and faster decisions in organizations where information is an asset of primary importance. Due to the complexity of data warehouses, metrics and procedures are required to continuously assure their quality. This article describes an empirical study and a replication aimed at investigating the use of structural metrics as indicators of the understandability and, by extension, the cognitive complexity of data warehouse schemas. More specifically, a four-step analysis is conducted: (1) check whether, individually and collectively, the considered metrics can be correlated with schema understandability using classical statistical techniques, (2) evaluate whether understandability can be predicted by case similarity using the case-based reasoning technique, (3) determine, for each level of understandability, the subsets of metrics that are important by means of a classification technique, and (4) assess, by means of a probabilistic technique, the degree of participation of each metric in the understandability prediction. The results obtained show that although a linear model is a good approximation of the relation between structure and understandability, the associated coefficients are not significant enough. Additionally, the classification analyses reveal, respectively, that prediction can be achieved by considering structure similarity, that extracted classification rules can be used to estimate the magnitude of understandability, and that some metrics, such as the number of fact tables, have more impact than others.
Acknowledgements
This research is part of the CALIPO project, supported by Dirección General de Investigación of the Ministerio de Ciencia y Tecnología (TIC2003-07804-C05-03). This research is also part of the ENIGMAS project, supported by Junta de Comunidades de Castilla-La Mancha, Consejería de Ciencia y Tecnología (PBI-05-058). This work was performed during the stay of Houari Sahraoui at the University of Castilla-La Mancha under the “Programa Nacional De Ayudas Para La Movilidad de Profesores en Régimen de año sabático” of the Spanish Ministerio de Educación y Ciencia, REF: 2004-0161. We would like to thank all of the volunteer subjects who participated in these experiments and whose inestimable assistance helped us reach the conclusions in this paper. We also want to thank the reviewers for their valuable comments.
Appendix: Collected time
Cite this article
Serrano, M.A., Calero, C., Sahraoui, H.A. et al. Empirical studies to assess the understandability of data warehouse schemas using structural metrics. Software Qual J 16, 79–106 (2008). https://doi.org/10.1007/s11219-007-9030-7
DOI: https://doi.org/10.1007/s11219-007-9030-7