Abstract
Much Information Systems research falls in the behavioral science category and involves the analysis of quantitative data. Typically a confirmatory approach is taken, in which only unconditional hypotheses are specified based on existing theory and evaluated using techniques such as regression analysis. In this paper we present a multi-criteria framework, based on a knowledge discovery via data mining (KDDM) process model, for selecting the most appropriate causal explanatory model based on the researcher's subjective preferences, including accuracy, simplicity, the relative importance of variables in his/her tentative research model, and relative preferences for the inclusion of some causal relationships.
References
Aguaron, J., & Moreno-Jimenez, J. (2003). The geometric consistency index: approximated thresholds. European Journal of Operational Research, 147, 137–145.
Andoh-Baidoo, F. K., Osei-Bryson, K.-M., & Amoako-Gyampah, K. (2012). Effects of firm and IT characteristics on the value of E-commerce initiatives: an inductive theoretical framework. Information Systems Frontiers, 14(2), 237–259.
Balshi, M. S., McGuire, A. D., Duffy, P., Flannigan, M., Walsh, J., & Melillo, J. (2009). Assessing the response of area burned to changing climate in Western Boreal North America using a Multivariate Adaptive Regression Splines (MARS) approach. Global Change Biology, 15(3), 578–600.
Behera, A. K., Verbert, J., Lauwers, B., & Duflou, J. R. (2012). Tool path compensation strategies for single point incremental sheet forming using multivariate adaptive regression splines. Computer-Aided Design.
Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Wadsworth International Group.
Briand, L., Freimut, B., & Vollei, F. (2004). Using multiple adaptive regression splines to understand trends in inspection data and identify optimal inspection rates. Journal of Systems and Software, 73(2), 2–23.
Bryson, N. (1995). A goal programming method for generating priority vectors. Journal of the Operational Research Society, 46, 641–648.
Bryson, N. K. M., & Joseph, A. (2000). Generating consensus priority interval vectors for group decision making in the AHP. Journal of Multi-Criteria Decision Analysis, 9(4), 127–137.
Choo, E., & Wedley, W. (2004). A common framework for deriving preference values from pairwise comparison matrices. Computers & Operations Research, 31, 893–908.
Cios, K., Teresinska, A., Konieczna, S., Potocka, J., & Sharma, S. (2000). Diagnosing myocardial perfusion from PECT Bull’s-eye maps—a knowledge discovery approach. IEEE Engineering in Medicine and Biology Magazine, 19(4), 17–25.
De Andrés, J., Lorca, P., de Cos Juez, F. J., & Sánchez-Lasheras, F. (2011). Bankruptcy forecasting: a hybrid approach using fuzzy c-means clustering and Multivariate Adaptive Regression Splines (MARS). Expert Systems with Applications, 38(3), 1866–1875.
De Jong, P. (1984). A statistical approach to Saaty’s scaling methods for priorities. Journal of Mathematical Psychology, 28, 467–478.
Deconinck, E., Coomans, D., & Vander Heyden, Y. (2007). Exploration of linear modelling techniques and their combination with multivariate adaptive regression splines to predict gastro-intestinal absorption of drugs. Journal of Pharmaceutical and Biomedical Analysis, 43(1), 119–130.
Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). From data mining to knowledge discovery: an overview. Advances in Knowledge Discovery and Data Mining (pp. 1–34). AAAI Press.
Friedman, J. H. (1991). Multivariate adaptive regression splines. The Annals of Statistics, 19(1), 1–141.
Guo, W., Zhao, N., & Shao, H. (2010, March). IT investment efficiency analysis of equipment manufacturing industry based on two-stage nonparametric model. In Proceedings of IEEE 2010 International Conference on Challenges in Environmental Science and Computer Engineering, Vol. 2 (pp. 21–24).
Han, J., & Kamber, M. (2001). Data mining: Concepts and techniques. New York: Morgan Kaufmann.
Hastie, T., & Tibshirani, R. (1990). Generalized additive models. London: Chapman and Hall.
Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning: Data mining, inference, and prediction. New York: Springer.
Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.
Hung, Y.-H., Chou, S.-C., & Tzeng, G.-H. (2011). Knowledge management adoption and assessment for SMEs by a novel MCDM approach. Decision Support Systems, 51, 270–291.
Ko, M., & Osei-Bryson, K. (2004). Using regression splines to assess the impact of information technology investments on productivity in the healthcare industry. Information Systems Journal, 14, 43–63.
Ko, M., Clark, J. G., & Ko, D. (2008). Revisiting the impact of information technology investments on productivity: an empirical investigation using multivariate adaptive regression splines. Information Resources Management Journal, 21(3), 1–23.
Kositanurit, B., Ngwenyama, O., & Osei-Bryson, K.-M. (2006). An exploration of factors that impact individual performance in an ERP environment: an analysis using multiple analytical techniques. European Journal of Information Systems, 15, 556–568.
Kositanurit, B., Osei-Bryson, K.-M., & Ngwenyama, O. (2011). An exploration of factors that impact individual performance in an ERP environment: an analysis using multiple analytical techniques. Expert Systems with Applications, 38(6), 7041–7050.
Kurgan, L., & Musilek, P. (2006). A survey of knowledge discovery and data mining process models. The Knowledge Engineering Review, 21(1), 1–24.
Leathwick, J. R., Rowe, D., Richardson, J., Elith, J., & Hastie, T. (2005). Using multivariate adaptive regression splines to predict the distributions of New Zealand’s freshwater diadromous fish. Freshwater Biology, 50(12), 2034–2052.
Lee, M., & Lee, J. (2012). The impact of information security failure on customer behaviors: a study on a large-scale hacking incident on the Internet. Information Systems Frontiers, 14(2), 375–393.
Lim, T.-S., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 40, 203–228.
Mansingh, G., Rao, L., Osei-Bryson, K.-M., & Mills, A. (2013). Profiling Internet banking users: a knowledge discovery in data mining process model based approach. Information Systems Frontiers. doi:10.1007/s10796-012-9397-2.
Martin, A. (2011). A hybrid model for bankruptcy prediction using genetic algorithm, FUZZY C-MEANS and MARS. International Journal on Soft Computing, 2(1), 12–24.
Menon, N., Lee, B., & Eldenburg, L. (2000). Productivity of information systems in the healthcare industry. Information Systems Research, 11(1), 83–92.
Monti, S., & Carenini, G. (2000). Dealing with the expert inconsistency in probability elicitation. IEEE Transactions on Knowledge and Data Engineering, 12(4), 499–508.
Morawczynski, O., & Ngwenyama, O. (2007). Unraveling the impact of investments in ICT, education and health on development: an analysis of archival data of five West African countries using regression splines. Electronic Journal on Information Systems in Developing Countries, 29, 1–15.
Mukkamala, S., Sung, A. H., Abraham, A., & Ramos, V. (2006). Intrusion detection systems using adaptive regression splines. In Enterprise Information Systems VI (pp. 211–218). Netherlands: Springer.
Ngai, E. (2003). Selection of web sites for online advertising using the AHP. Information and Management, 40, 233–242.
Obata, T., Shiraishi, S., Daigo, M., & Nakajima, N. (1999). Assessment for an incomplete comparison matrix and improvement of an inconsistent comparison: Computational experiments. ISAHP 1999, Kobe, Japan, August 12–14.
Osei-Bryson, K.-M. (2004). Evaluation of decision trees: a multi-criteria approach. Computers & Operations Research, 31(11), 1933–1945.
Osei-Bryson, K.-M. (2006). An action learning approach for assessing the consistency of pairwise comparison data. European Journal of Operational Research, 174(1), 234–244.
Osei-Bryson, K.-M., Dong, L., & Ngwenyama, O. (2008). Exploring managerial factors affecting ERP implementation: an investigation of the Klein-Sorra model using regression splines. Information Systems Journal, 18(5), 499–527.
Oztekin, A., Kong, Z., & Delen, D. (2011). Development of a structural equation modeling-based decision tree methodology for the analysis of lung transplantations. Decision Support Systems, 51, 155–166.
Park, C.-S., & Han, I. (2002). A case-based reasoning with the feature weights derived by analytic hierarchy process for bankruptcy prediction. Expert Systems with Applications, 23(3), 255–264.
Ramakrishnan, T., Jones, M., & Sidorova, A. (2012). Factors influencing Business Intelligence (BI) data collection strategies: an empirical investigation. Decision Support Systems, 52, 486–496.
Saaty, T. (1980). The analytic hierarchy process: Planning: Priority setting, resource allocation. New York: McGraw-Hill.
Salo, A., & Hämäläinen, R. (1997). On the measurement of preferences in the analytic hierarchy process. Journal of Multi-Criteria Decision Analysis, 6, 309–343.
Shafer, J., Agrawal, R., & Mehta, M. (1996). SPRINT: A scalable parallel classifier for data mining. Proceedings of the 22nd International Conference on Very Large Data Bases (pp. 544–555).
Sharma, S., & Osei-Bryson, K.-M. (2010). Towards an integrated knowledge discovery and data mining process model. Knowledge Engineering Review, 25(1), 49–67.
Shearer, C. (2000). The CRISP-DM methodology: the new blueprint for data mining. Journal of Data Warehousing, 5(4), 13–22.
Shin, Y. M., Lee, S. C., Shin, B., & Lee, H. G. (2010). Examining influencing factors of post-adoption usage of mobile internet: focus on the user perception of supplier-side attributes. Information Systems Frontiers, 12(5), 595–606.
Whetten, D. (1989). What constitutes a theoretical contribution? Academy of Management Review, 14(4), 490–495.
Zhou, Y., & Leung, H. (2007). Predicting object-oriented software maintainability using multivariate adaptive regression splines. Journal of Systems and Software, 80(8), 1349–1361.
Appendices
Appendix A: Overview on pairwise comparisons based weight generation techniques
Pairwise comparison (PC) information may be used to elicit preference information from the decision maker and to indirectly produce a corresponding weight vector $w$. The preference information is represented numerically using a positive reciprocal PC matrix $A = \{a_{ij}\}$ with $a_{ij} = 1/a_{ji}$, where $a_{ij}$ is a rational number that is the numeric equivalent of the relative importance of object $i$ compared to object $j$. The weight vector $w$ may then be obtained from the pairwise comparison matrix $A$ using a variety of techniques, including the right eigenvector method (EM), the logarithmic least squares method (e.g. De Jong 1984), and the logarithmic goal programming method (Bryson 1995). Choo and Wedley (2004) presented an overview of the major weight vector generation techniques.
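Technique | Formulation |
Right Eigenvector Method | $A w = \lambda_{\max} w$, where $\lambda_{\max}$ is the largest eigenvalue of $A$ |
Logarithmic Least Squares Method | $\min \sum_i \sum_j \left( \ln a_{ij} - \ln (w_i / w_j) \right)^2$ subject to $\sum_i w_i = 1$; $w_i \ge 0$ |
Logarithmic Goal Programming Method | $\min \sum_i \sum_j \left| \ln a_{ij} - \ln (w_i / w_j) \right|$ subject to $\sum_i w_i = 1$; $w_i \ge 0$ |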
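As an illustration of the first two techniques in the table above, the following is a minimal sketch rather than the procedure used in the paper. It assumes a complete positive reciprocal matrix, approximates the principal right eigenvector by power iteration, and uses the standard result that the logarithmic least squares solution for such a matrix is the vector of normalized row geometric means. The function names and the example matrix are hypothetical.

```python
import math

def eigenvector_weights(A, iterations=200, tol=1e-12):
    """Right eigenvector method via power iteration (illustrative sketch).

    Returns (w, lambda_max) with w normalized so that sum(w) == 1.
    """
    n = len(A)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        lam = sum(Aw)  # since sum(w) == 1, this tends to lambda_max
        new_w = [x / lam for x in Aw]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w, lam

def llsm_weights(A):
    """Logarithmic least squares weights as normalized row geometric means."""
    n = len(A)
    gm = [math.exp(sum(math.log(a) for a in row) / n) for row in A]
    total = sum(gm)
    return [g / total for g in gm]

# Hypothetical consistent matrix built from the weights (0.6, 0.3, 0.1).
A = [[1.0, 2.0, 6.0],
     [1 / 2, 1.0, 3.0],
     [1 / 6, 1 / 3, 1.0]]

w_em, lambda_max = eigenvector_weights(A)
w_llsm = llsm_weights(A)
```

For a consistent matrix such as this one, both techniques should recover the same weight vector; they generally differ once the comparisons are inconsistent.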
The pairwise comparison matrix $A$ is said to be consistent if the equality $a_{ij} = a_{ik} a_{kj}$ holds for each triple of objects $(i, j, k)$; otherwise it is said to be inconsistent. As noted by Obata et al. (1999), “When a pairwise comparison matrix contains seriously inconsistent comparisons, the priority weights calculated from such a wrong matrix are not reliable”. Because the matrix $A$ is often inconsistent in practice, it is necessary to measure the level of inconsistency in order to determine whether the resulting weight vector $w$ will be meaningful.
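The consistency condition can be checked directly by testing every triple of objects. The sketch below is illustrative only: the function name, the relative tolerance (needed because the entries are stored as floating-point numbers), and the two example matrices are assumptions, not part of the original framework.

```python
import math
from itertools import product

def is_consistent(A, rel_tol=1e-9):
    """Return True if a_ij == a_ik * a_kj (within rel_tol) for every triple."""
    n = len(A)
    return all(
        math.isclose(A[i][j], A[i][k] * A[k][j], rel_tol=rel_tol)
        for i, j, k in product(range(n), repeat=3)
    )

# Hypothetical consistent matrix: every entry equals w_i / w_j
# for the weights (0.6, 0.3, 0.1).
consistent = [[1.0, 2.0, 6.0],
              [1 / 2, 1.0, 3.0],
              [1 / 6, 1 / 3, 1.0]]

# Same matrix with one judgment changed: a_13 = 4, although a_12 * a_23 = 6.
inconsistent = [[1.0, 2.0, 4.0],
                [1 / 2, 1.0, 3.0],
                [1 / 4, 1 / 3, 1.0]]

check_consistent = is_consistent(consistent)
check_inconsistent = is_consistent(inconsistent)
```

A check of this kind only gives a yes/no answer; since elicited judgments are rarely perfectly consistent, a graded measure of inconsistency is usually more useful in practice.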
Consistency indicators have been proposed by various researchers (e.g. Saaty 1980; Salo and Hämäläinen 1997; Aguaron and Moreno-Jimenez 2003; Osei-Bryson 2006). The most popular of these is Saaty’s Consistency Ratio (CR), which is defined as $CR = CI/RI$, where $CI = (\lambda_{\max} - N)/(N - 1)$, $N$ is the dimension of $A$, $\lambda_{\max}$ is the largest eigenvalue of $A$, and $RI$ is the average value of $CI$ over randomly generated matrices with the same dimension as $A$. Using Saaty’s ‘rule of thumb’, the pairwise comparison matrix is deemed to be inconsistent only if $CR > 0.10$. Osei-Bryson (2006) also proposed interpretable consistency indicators.
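A minimal sketch of the CR calculation follows, using Saaty’s usual definition $CI = (\lambda_{\max} - N)/(N - 1)$. Several elements are assumptions made for illustration: the function names, the power-iteration estimate of $\lambda_{\max}$, the example matrix, and the RI values, which are the commonly quoted ones and vary slightly between sources.

```python
# Commonly quoted random index (RI) values by matrix dimension N.
# Assumption: published tables differ slightly depending on the simulation used.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
                7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def largest_eigenvalue(A, iterations=500, tol=1e-12):
    """Estimate lambda_max of a positive matrix by power iteration."""
    n = len(A)
    w = [1.0 / n] * n
    lam = float(n)
    for _ in range(iterations):
        Aw = [sum(A[i][j] * w[j] for j in range(n)) for i in range(n)]
        new_lam = sum(Aw)  # w sums to 1, so this tends to lambda_max
        w = [x / new_lam for x in Aw]
        if abs(new_lam - lam) < tol:
            lam = new_lam
            break
        lam = new_lam
    return lam

def consistency_ratio(A):
    """Saaty's CR = CI / RI with CI = (lambda_max - N) / (N - 1)."""
    n = len(A)
    if n < 3:
        return 0.0  # 1x1 and 2x2 reciprocal matrices are always consistent
    ci = (largest_eigenvalue(A) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]

# Hypothetical matrix with one mildly inconsistent judgment (a_13 = 4, a_12 * a_23 = 6).
A = [[1.0, 2.0, 4.0],
     [1 / 2, 1.0, 3.0],
     [1 / 4, 1 / 3, 1.0]]

cr = consistency_ratio(A)
acceptable = cr <= 0.10  # Saaty's rule of thumb
```

Under the rule of thumb, a matrix like this one, with a single mild inconsistency, would still be accepted, whereas a larger CR would suggest revisiting the judgments before using the resulting weights.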
Appendix B: Description of variables of the illustrative example
Cite this article
Osei-Bryson, KM. A hybrid decision support framework for generating & selecting causal explanatory regression splines models for information systems research. Inf Syst Front 17, 845–856 (2015). https://doi.org/10.1007/s10796-013-9469-y