A hybrid decision support framework for generating & selecting causal explanatory regression splines models for information systems research

Information Systems Frontiers

Abstract

Much Information Systems research falls within the behavioral science category and involves the analysis of quantitative data. Typically a confirmatory approach is taken, in which only unconditional hypotheses are specified based on existing theory and evaluated using techniques such as regression analysis. In this paper we present a multi-criteria framework, based on a knowledge discovery via data mining (KDDM) process model, for selecting the most appropriate causal explanatory model according to the researcher's subjective preferences, including accuracy, simplicity, the relative importance of variables in his/her tentative research model, and relative preferences for the inclusion of some causal relationships.

References

  • Aguaron, J., & Moreno-Jimenez, J. (2003). The geometric consistency index: approximated thresholds. European Journal of Operational Research, 147, 137–145.

  • Andoh-Baidoo, F. K., Osei-Bryson, K.-M., & Amoako-Gyampah, K. (2012). Effects of firm and IT characteristics on the value of E-commerce initiatives: an inductive theoretical framework. Information Systems Frontiers, 14(2), 237–259.

  • Balshi, M. S., McGuire, A. D., Duffy, P., Flannigan, M., Walsh, J., & Melillo, J. (2009). Assessing the response of area burned to changing climate in Western Boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach. Global Change Biology, 15(3), 578–600.

  • Behera, A. K., Verbert, J., Lauwers, B., & Duflou, J. R. (2012). Tool path compensation strategies for single point incremental sheet forming using multivariate adaptive regression splines. Computer-Aided Design.

  • Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Wadsworth International Group.

  • Briand, L., Freimut, B., & Vollei, F. (2004). Using multiple adaptive regression splines to understand trends in inspection data and identify optimal inspection rates. Journal of Systems and Software, 73(2), 2–23.

  • Bryson, N. (1995). A goal programming method for generating priority vectors. Journal of the Operational Research Society, 46, 641–648.

  • Bryson, N. K. M., & Joseph, A. (2000). Generating consensus priority interval vectors for group decision making in the AHP. Journal of Multi-Criteria Decision Analysis, 9(4), 127–137.

  • Choo, E., & Wedley, W. (2004). A common framework for deriving preference values from pairwise comparison matrices. Computers & Operations Research, 31, 893–908.

  • Cios, K., Teresinska, A., Konieczna, S., Potocka, J., & Sharma, S. (2000). Diagnosing myocardial perfusion from PECT Bull’s-eye maps—a knowledge discovery approach. IEEE Engineering in Medicine and Biology Magazine, 19(4), 17–25.

  • De Andrés, J., Lorca, P., de Cos Juez, F. J., & Sánchez-Lasheras, F. (2011). Bankruptcy forecasting: a hybrid approach using fuzzy c-means clustering and Multivariate Adaptive Regression Splines (MARS). Expert Systems with Applications, 38(3), 1866–1875.

  • De Jong, P. (1984). A statistical approach to Saaty’s scaling methods for priorities. Journal of Mathematical Psychology, 28, 467–478.

  • Deconinck, E., Coomans, D., & Vander Heyden, Y. (2007). Exploration of linear modelling techniques and their combination with multivariate adaptive regression splines to predict gastro-intestinal absorption of drugs. Journal of Pharmaceutical and Biomedical Analysis, 43(1), 119–130.

  • Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery: an overview. In Advances in Knowledge Discovery and Data Mining (pp. 1–34). AAAI Press.

  • Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–141.

  • Guo, W., Zhao, N., & Shao, H. (2010, March). IT investment efficiency analysis of equipment manufacturing industry based on two-stage nonparametric model. In Proceedings of IEEE 2010 International Conference on Challenges in Environmental Science and Computer Engineering, Vol. 2 (pp. 21–24).

  • Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. New York: Morgan Kaufmann.

  • Hastie, T., & Tibshirani, R. (1990). Generalized additive models. London: Chapman and Hall.

  • Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.

  • Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.

  • Hung, Y.-H., Chou, S.-C., & Tzeng, G.-H. (2011). Knowledge management adoption and assessment for SMEs by a novel MCDM approach. Decision Support Systems, 51, 270–291.

  • Ko, M., & Osei-Bryson, K. (2004). Using regression splines to assess the impact of information technology investments on productivity in the healthcare industry. Information Systems Journal, 14, 43–63.

  • Ko, M., Clark, J. G., & Ko, D. (2008). Revisiting the impact of information technology investments on productivity: an empirical investigation using multivariate adaptive regression splines. Information Resources Management Journal, 21(3), 1–23.

  • Kositanurit, B., Ngwenyama, O., & Osei-Bryson, K.-M. (2006). An exploration of factors that impact individual performance in an ERP environment: an analysis using multiple analytical techniques. European Journal of Information Systems, 15, 556–568.

  • Kositanurit, B., Osei-Bryson, K.-M., & Ngwenyama, O. (2011). An exploration of factors that impact individual performance in an ERP environment: an analysis using multiple analytical techniques. Expert Systems with Applications, 38(6), 7041–7050.

  • Kurgan, L., & Musilek, P. (2006). A survey of knowledge discovery and data mining process models. The Knowledge Engineering Review, 21(1), 1–24.

  • Leathwick, J. R., Rowe, D., Richardson, J., Elith, J., & Hastie, T. (2005). Using multivariate adaptive regression splines to predict the distributions of New Zealand’s freshwater diadromous fish. Freshwater Biology, 50(12), 2034–2052.

  • Lee, M., & Lee, J. (2012). The impact of information security failure on customer behaviors: a study on a large-scale hacking incident on the Internet. Information Systems Frontiers, 14(2), 375–393.

  • Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.

  • Mansingh, G., Rao, L., Osei-Bryson, K.-M., & Mills, A. (2013). Profiling Internet banking users: a knowledge discovery in data mining process model based approach. Information Systems Frontiers. doi:10.1007/s10796-012-9397-2.

  • Martin, A. (2011). A hybrid model for bankruptcy prediction using genetic algorithm, FUZZY C-MEANS and MARS. International Journal on Soft Computing, 2(1), 12–24.

  • Menon, N., Lee, B., & Eldenburg, L. (2000). Productivity of information systems in the healthcare industry. Information Systems Research, 11(1), 83–92.

  • Monti, S., & Carenini, G. (2000). Dealing with the expert inconsistency in probability elicitation. IEEE Transactions on Knowledge and Data Engineering, 12(4), 499–508.

  • Morawczynski, O., & Ngwenyama, O. (2007). Unraveling the impact of investments in ICT, education and health on development: an analysis of archival data of five West African countries using regression splines. Electronic Journal on Information Systems in Developing Countries, 29, 1–15.

  • Mukkamala, S., Sung, A. H., Abraham, A., & Ramos, V. (2006). Intrusion detection systems using adaptive regression splines. In Enterprise Information Systems VI (pp. 211–218). Netherlands: Springer.

  • Ngai, E. (2003). Selection of web sites for online advertising using the AHP. Information and Management, 40, 233–242.

  • Obata, T., Shiraishi, S., Daigo, M., & Nakajima, N. (1999). Assessment for an incomplete comparison matrix and improvement of an inconsistent comparison: Computational experiments. ISAHP 1999, Kobe, Japan, August 12–14.

  • Osei-Bryson, K.-M. (2004). Evaluation of decision trees: a multi-criteria approach. Computers & Operations Research, 31(11), 1933–1945.

  • Osei-Bryson, K.-M. (2006). An action learning approach for assessing the consistency of pairwise comparison data. European Journal of Operational Research, 174(1), 234–244.

  • Osei-Bryson, K.-M., Dong, L., & Ngwenyama, O. (2008). Exploring managerial factors affecting ERP implementation: an investigation of the Klein-Sorra model using regression splines. Information Systems Journal, 18(5), 499–527.

  • Oztekin, A., Kong, Z., & Delen, D. (2011). Development of a structural equation modeling-based decision tree methodology for the analysis of lung transplantations. Decision Support Systems, 51, 155–166.

  • Park, C.-S., & Han, I. (2002). A case-based reasoning with the feature weights derived by analytic hierarchy process for bankruptcy prediction. Expert Systems with Applications, 23(3), 255–264.

  • Ramakrishnan, T., Jones, M., & Sidorova, A. (2012). Factors influencing Business Intelligence (BI) data collection strategies: an empirical investigation. Decision Support Systems, 52, 486–496.

  • Saaty, T. (1980). The analytic hierarchy process: Planning, priority setting, resource allocation. New York: McGraw-Hill.

  • Salo, A., & Hämäläinen, R. (1997). On the measurement of preferences in the analytic hierarchy process. Journal of Multi-Criteria Decision Analysis, 6, 309–343.

  • Shafer, J., Agrawal, R., & Mehta, M. (1996). SPRINT: A scalable parallel classifier for data mining. Proceedings of the 22nd International Conference on Very Large Data Bases (pp. 544–555).

  • Sharma, S., & Osei-Bryson, K.-M. (2010). Towards an integrated knowledge discovery and data mining process model. Knowledge Engineering Review, 25(1), 49–67.

  • Shearer, C. (2000). The CRISP-DM methodology: the new blueprint for data mining. Journal of Data Warehousing, 5(4), 13–22.

  • Shin, Y. M., Lee, S. C., Shin, B., & Lee, H. G. (2010). Examining influencing factors of post-adoption usage of mobile internet: focus on the user perception of supplier-side attributes. Information Systems Frontiers, 12(5), 595–606.

  • Whetten, D. (1989). What constitutes a theoretical contribution? Academy of Management Review, 14(4), 490–495.

  • Zhou, Y., & Leung, H. (2007). Predicting object-oriented software maintainability using multivariate adaptive regression splines. Journal of Systems and Software, 80(8), 1349–1361.

Author information

Correspondence to Kweku-Muata Osei-Bryson.

Appendices

Appendix A: Overview of pairwise comparison based weight generation techniques

Pairwise comparison (PC) information may be used to elicit preference information from the decision maker and to indirectly produce a corresponding weight vector $w$. The preference information is represented numerically using a positive reciprocal PC matrix $A = \{a_{ij}\}$ with $a_{ij} = 1/a_{ji}$, where $a_{ij}$ is a rational number that is the numeric equivalent of the relative importance of object “i” compared to object “j”. The weight vector $w$ may then be obtained from the pairwise comparison matrix $A$ using a variety of techniques, including the right eigenvector method (EM), the logarithmic least squares method (e.g. De Jong 1984), and the logarithmic goal programming method (Bryson 1995). Choo and Wedley (2004) presented an overview of the major weight vector generation techniques.
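As a small illustration of the reciprocal structure described above, the following Python sketch fills a PC matrix from upper-triangle judgments only. The function name `reciprocal_matrix`, the dictionary input format, and the example judgments are illustrative assumptions and do not come from the article.

```python
from fractions import Fraction

def reciprocal_matrix(upper):
    """Build a positive reciprocal PC matrix from upper-triangle judgments.

    `upper` maps (i, j) with i < j to the judgment a_ij (hypothetical input
    format). The diagonal is 1 and each a_ji is set to 1 / a_ij. Pairs that
    are not supplied are left at 1, so a complete set of judgments is assumed.
    """
    n = max(j for _, j in upper) + 1
    A = [[Fraction(1)] * n for _ in range(n)]
    for (i, j), value in upper.items():
        a = Fraction(value)
        if a <= 0:
            raise ValueError("judgments must be positive")
        A[i][j] = a
        A[j][i] = 1 / a  # reciprocal property a_ji = 1 / a_ij
    return A

# Example: object 0 is judged twice as important as object 1, and so on.
A = reciprocal_matrix({(0, 1): 2, (0, 2): 4, (1, 2): 2})
```

Exact fractions are used here only because the text describes the entries as rational numbers; floating-point values would work equally well for the weight generation techniques.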

Right Eigenvector Method

$$A w = \lambda_{\max} w$$

where $\lambda_{\max}$ is the largest eigenvalue of $A$.
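One way to approximate the right eigenvector is power iteration, sketched below in plain Python. This is an illustrative choice rather than the procedure used in the article; the function name `eigenvector_weights`, the iteration limit, and the tolerance are assumptions.

```python
def eigenvector_weights(A, iterations=1000, tol=1e-12):
    """Approximate the principal right eigenvector of A by power iteration.

    Returns (w, lam), where w is normalized to sum to 1 and lam is the
    estimate of the largest eigenvalue.
    """
    n = len(A)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        Aw = [sum(float(A[i][j]) * w[j] for j in range(n)) for i in range(n)]
        lam = sum(Aw)  # because sum(w) == 1, sum(A w) estimates lambda_max
        new_w = [x / lam for x in Aw]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w, lam

# A perfectly consistent 3 x 3 example (hypothetical judgments).
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
w, lam = eigenvector_weights(A)
```

For a consistent matrix such as this one, the weights are proportional to any column of the matrix and the largest eigenvalue equals the matrix dimension.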

Logarithmic Least Squares Method

$$\min_{w} \sum_{i,j} \left( \ln a_{ij} - \ln (w_i / w_j) \right)^2$$

subject to $\sum_i w_i = 1$; $w_i > 0$.
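In its standard form, the logarithmic least squares method minimizes the squared differences between ln a_ij and ln(w_i/w_j). For a complete reciprocal matrix this problem has a well-known closed-form solution: the normalized geometric means of the rows. The sketch below assumes that setting; the function name `geometric_mean_weights` is illustrative.

```python
import math

def geometric_mean_weights(A):
    """Weights from normalized row geometric means.

    For a complete positive reciprocal matrix, these weights minimize the
    sum of squared log deviations between a_ij and w_i / w_j.
    """
    n = len(A)
    # Geometric mean of each row, computed through logarithms for stability.
    g = [math.exp(sum(math.log(float(a)) for a in row) / n) for row in A]
    total = sum(g)
    return [x / total for x in g]

# Hypothetical consistent judgments: the result is proportional to (4, 2, 1).
A = [[1.0, 2.0, 4.0],
     [0.5, 1.0, 2.0],
     [0.25, 0.5, 1.0]]
w = geometric_mean_weights(A)
```

When the matrix is consistent, this method and the right eigenvector method return the same weights; they can differ once the judgments are inconsistent.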

Logarithmic Goal Programming Method

$$\min_{w} \sum_{i,j} \left| \ln a_{ij} - \ln (w_i / w_j) \right|$$

subject to $\sum_i w_i = 1$; $w_i > 0$.

The pairwise comparison matrix $A$ is said to be consistent if, for each triple of objects (i, j, k), the equality $a_{ij} = a_{ik} a_{kj}$ holds; otherwise it is said to be inconsistent. As noted by Obata et al. (1999), “When a pairwise comparison matrix contains seriously inconsistent comparisons, the priority weights calculated from such a wrong matrix are not reliable”. Because the matrix $A$ is often inconsistent in practice, it is necessary to measure the level of inconsistency in order to determine whether the resulting weight vector $w$ will be meaningful.
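The consistency condition can be checked directly by enumerating triples, as in the following sketch. The function name `inconsistent_triples`, the relative tolerance, and the example matrices are assumptions made for illustration.

```python
def inconsistent_triples(A, tol=1e-9):
    """Return all (i, j, k) for which a_ij differs from a_ik * a_kj."""
    n = len(A)
    bad = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                expected = A[i][k] * A[k][j]
                # Relative tolerance guards against floating-point noise.
                if abs(A[i][j] - expected) > tol * max(1.0, abs(A[i][j])):
                    bad.append((i, j, k))
    return bad

consistent = [[1.0, 2.0, 4.0],
              [0.5, 1.0, 2.0],
              [0.25, 0.5, 1.0]]
# Same judgments, except that a_02 is 5 instead of 2 * 2 = 4.
perturbed = [[1.0, 2.0, 5.0],
             [0.5, 1.0, 2.0],
             [0.2, 0.5, 1.0]]

print(inconsistent_triples(consistent))          # no violations expected
print(len(inconsistent_triples(perturbed)) > 0)  # at least one violation
```

A check of this kind only reports whether the matrix is perfectly consistent; it does not say how serious any inconsistency is, which is the purpose of a graded indicator.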

Consistency indicators have been proposed by various researchers (e.g. Saaty 1980; Salo and Hämäläinen 1997; Aguaron and Moreno-Jimenez 2003; Osei-Bryson 2006). The most popular of these is Saaty’s Consistency Ratio (CR), defined as $CR = CI/RI$, where $CI = (\lambda_{\max} - N)/(N - 1)$, $\lambda_{\max}$ is the largest eigenvalue of $A$, $N$ is the dimension of $A$, and $RI$ is the corresponding index averaged over random matrices of the same dimension as $A$. Using Saaty’s ‘rule of thumb’, the pairwise comparison matrix is deemed to be inconsistent only if $CR > 0.10$. Osei-Bryson (2006) also proposed interpretable consistency indicators.
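A minimal sketch of the CR calculation follows, using Saaty's standard definition CI = (λmax − N)/(N − 1). The random-index values in `RANDOM_INDEX` are the commonly tabulated ones and are an assumption here, since the text does not list them; the function names and the example matrix are likewise illustrative.

```python
# Commonly tabulated random-index values by matrix dimension (assumed here).
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def largest_eigenvalue(A, iterations=1000, tol=1e-12):
    """Estimate the largest eigenvalue of A by power iteration."""
    n = len(A)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        Aw = [sum(float(A[i][j]) * w[j] for j in range(n)) for i in range(n)]
        lam = sum(Aw)  # valid estimate because w sums to 1
        new_w = [x / lam for x in Aw]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return lam

def consistency_ratio(A):
    """CR = CI / RI with CI = (lambda_max - N) / (N - 1)."""
    n = len(A)
    if n not in RANDOM_INDEX:
        raise ValueError("no random index tabulated for this dimension")
    ci = (largest_eigenvalue(A) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

# A deliberately circular set of judgments: 0 over 1, 1 over 2, 2 over 0.
circular = [[1.0, 2.0, 0.5],
            [0.5, 1.0, 2.0],
            [2.0, 0.5, 1.0]]
cr = consistency_ratio(circular)
print(cr > 0.10)  # exceeds the rule-of-thumb threshold
```

In this example every row sums to 3.5, so λmax is 3.5, CI is 0.25, and CR is well above 0.10; under the rule of thumb the judgments would be revisited before any weights are used.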

Appendix B: Description of variables of the illustrative example

Table 8 Description of variables

Cite this article

Osei-Bryson, KM. A hybrid decision support framework for generating & selecting causal explanatory regression splines models for information systems research. Inf Syst Front 17, 845–856 (2015). https://doi.org/10.1007/s10796-013-9469-y
