
A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge

Information Systems Frontiers

Abstract

Knowledge discovery from data, that is, finding useful patterns in data, presents businesses with both greater opportunities and greater challenges in the big data era. Improving the quality of the discovered knowledge is also important for making correct decisions in an unpredictable environment. Various models have been developed in the past; however, few have used both data quality and prior knowledge to control the quality of the discovery process and its results. In this paper, a multi-objective model of knowledge discovery in databases is developed, which aids the discovery process by utilizing prior process knowledge and different measures of data quality. To illustrate the model, association rule mining is formulated as a multi-objective problem, rather than a single-objective one, that takes into account data quality measures and prior process knowledge. Measures such as confidence, support, comprehensibility and interestingness are used. A Pareto-based integrated multi-objective Artificial Bee Colony (IMOABC) algorithm is developed to solve the problem. Using well-known and publicly available databases, experiments are carried out to compare the performance of IMOABC with that of the NSGA-II, MOPSO and Apriori algorithms. The computational results show that IMOABC outperforms NSGA-II, MOPSO and Apriori on different measures, and that it can easily be customized to user requirements while still generating high-quality association rules.
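As a rough illustration of the multi-objective view of association rule mining described in the abstract, the sketch below scores a few candidate rules on two of the named measures, support and confidence, using their standard textbook definitions, and keeps only the Pareto non-dominated rules. It is not the IMOABC algorithm and does not reproduce the paper's data quality or prior knowledge terms; the toy transactions, the rule labels, and all function names are assumptions made for this example.

```python
# Minimal sketch: two standard rule measures plus a Pareto filter.
# Toy data and all names here are hypothetical, not taken from the paper.

def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    s_ante = support(transactions, antecedent)
    if s_ante == 0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / s_ante

def dominates(a, b):
    """True if score vector `a` is at least as good as `b` on every
    objective and strictly better on at least one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scored):
    """Keep the (label, scores) pairs that no other pair dominates."""
    return [(label, s) for label, s in scored
            if not any(dominates(other, s) for _, other in scored)]

# Toy market-basket transactions (assumed for illustration only).
T = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "butter"},
)]

# Candidate rules as (label, antecedent, consequent).
rules = [
    ("bread -> milk", {"bread"}, {"milk"}),
    ("butter -> bread", {"butter"}, {"bread"}),
    ("butter -> milk", {"butter"}, {"milk"}),
]

scored = [(label, (support(T, a | c), confidence(T, a, c)))
          for label, a, c in rules]
front = pareto_front(scored)

for label, (s, c) in front:
    print(f"{label}: support={s:.2f}, confidence={c:.2f}")
```

In this toy setting "butter -> milk" is dominated on both objectives, while the other two rules trade support against confidence and therefore both remain on the front. A Pareto-based method returns such a set of trade-off rules instead of collapsing the objectives into a single weighted score.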




Acknowledgments

The research presented in this paper is supported by the National Natural Science Foundation Project of China (71390333 & 71572145), the National Social Science Foundation Project of China (12&ZD070), the Program for New Century Excellent Talents in University (NCET-13-0460), and the Fundamental Research Funds for the Central Universities.

Author information

Corresponding author

Correspondence to Giri Kumar Tayi.

About this article

Cite this article

Liu, Q., Feng, G., Wang, N. et al. A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge. Inf Syst Front 20, 401–416 (2018). https://doi.org/10.1007/s10796-016-9690-6

  • DOI: https://doi.org/10.1007/s10796-016-9690-6
