A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge

Liu, Qi; Feng, Gengzhong; Wang, Nengmin; Tayi, Giri Kumar

doi:10.1007/s10796-016-9690-6

Published: 18 August 2016

Volume 20, pages 401–416, (2018)

Information Systems Frontiers

Qi Liu^1,2,
Gengzhong Feng^1,2,
Nengmin Wang^1,2 &
…
Giri Kumar Tayi ORCID: orcid.org/0000-0001-5505-0986³


Abstract

Discovering knowledge from data means finding useful patterns in data, a process that presents growing opportunities and challenges for businesses in the big data era. Meanwhile, improving the quality of the discovered knowledge is important for making correct decisions in an unpredictable environment. Various models have been developed in the past; however, few have used both data quality and prior knowledge to control the quality of the discovery process and its results. In this paper, a multi-objective model of knowledge discovery in databases is developed, which aids the discovery process by utilizing prior process knowledge and different measures of data quality. To illustrate the model, association rule mining is formulated as a multi-objective problem, rather than a single-objective one, that takes into account data quality measures and prior process knowledge. Measures such as confidence, support, comprehensibility and interestingness are used. A Pareto-based integrated multi-objective Artificial Bee Colony (IMOABC) algorithm is developed to solve the problem. Using well-known and publicly available databases, experiments are carried out to compare the performance of IMOABC with that of the NSGA-II, MOPSO and Apriori algorithms. The computational results show that IMOABC outperforms NSGA-II, MOPSO and Apriori on different measures, and that it can be easily tailored to user requirements while still generating high-quality association rules.
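
To make the multi-objective framing concrete, the following is a minimal sketch, not the IMOABC algorithm itself. It scores a few candidate association rules on two of the measures named above, support and confidence, using their standard textbook definitions, and then keeps only the Pareto non-dominated rules. The toy transactions, the candidate rules and the function names (`support`, `confidence`, `dominates`, `pareto_front`) are illustrative assumptions. The paper's definitions of comprehensibility and interestingness, and its data quality and prior knowledge terms, are not given in this abstract and are therefore left out.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Standard confidence: support(A and C) / support(A)."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / sup_a


def dominates(a, b):
    """True if score vector `a` Pareto-dominates `b` (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scored):
    """Keep the (rule, scores) pairs whose scores no other pair dominates."""
    return [
        (rule, s)
        for rule, s in scored
        if not any(dominates(other, s) for _, other in scored)
    ]


# Hypothetical toy data, invented for illustration only.
transactions = [
    frozenset(t)
    for t in (
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
        {"bread", "milk"},
    )
]

# Candidate rules written as (antecedent, consequent).
rules = [
    (("bread",), ("milk",)),
    (("milk",), ("bread",)),
    (("butter",), ("bread",)),
    (("bread",), ("butter",)),
]

# Each rule gets a two-objective score: (support of the whole rule, confidence).
scored = [
    (rule, (support(transactions, rule[0] + rule[1]),
            confidence(transactions, rule[0], rule[1])))
    for rule in rules
]

front = pareto_front(scored)
for (antecedent, consequent), (sup, conf) in front:
    print(f"{antecedent} -> {consequent}: support={sup:.2f}, confidence={conf:.2f}")
```

In this toy data the rule bread -> butter is dominated by butter -> bread, which has the same support and a higher confidence, so it is dropped. The remaining rules trade support against confidence and all stay on the front. A Pareto-based search such as the one described in the abstract returns a set of such trade-off rules instead of a single best rule under one objective.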


Acknowledgments

The research presented in this paper is supported by the National Natural Science Foundation Project of China (71390333 & 71572145), the National Social Science Foundation Project of China (12&ZD070), the Program for New Century Excellent Talents in University (NCET-13-0460), and the Fundamental Research Funds for the Central Universities.

Author information

Authors and Affiliations

School of Management, Xi’an Jiaotong University, No. 28 Xianning Road, Xi’an, Shaanxi, 710049, China
Qi Liu, Gengzhong Feng & Nengmin Wang
The Key Lab of the Ministry of Education for Process Control and Efficiency Engineering, No. 28 Xianning Road, Xi’an, Shaanxi, 710049, China
Qi Liu, Gengzhong Feng & Nengmin Wang
School of Business, SUNY at Albany, Albany, NY, 12222, USA
Giri Kumar Tayi


Corresponding author

Correspondence to Giri Kumar Tayi.


About this article

Cite this article

Liu, Q., Feng, G., Wang, N. et al. A multi-objective model for discovering high-quality knowledge based on data quality and prior knowledge. Inf Syst Front 20, 401–416 (2018). https://doi.org/10.1007/s10796-016-9690-6


Published: 18 August 2016
Issue Date: April 2018
DOI: https://doi.org/10.1007/s10796-016-9690-6
