Abstract
Data mining techniques can be used to discover interesting patterns in complicated manufacturing processes, and these patterns can then be used to improve manufacturing quality. Classical representations of quality data mining problems usually refer to the operation settings and not to their sequence. This paper examines the effect of the operation sequence on the quality of the product using data mining techniques. For this purpose, a novel decision tree framework for extracting sequence patterns is developed. The proposed method can mine sequence patterns of any length, including patterns whose operations are not necessarily immediate precedents of one another. The core induction framework consists of four main steps. In the first step, all manufacturing sequences are represented as strings of tokens. In the second step, a large set of regular expression-based patterns is induced from these sequences. In the third step, feature selection methods are used to filter the initial set and leave only the most useful patterns. In the last step, the quality problem is transformed into a classification problem and a decision tree induction algorithm is employed. A comparative study performed on benchmark databases illustrates the capabilities of the proposed framework.
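The four steps can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data, the operation names, and every function name are hypothetical. Pattern induction is replaced here by simply enumerating ordered pairs of operations ("a occurs somewhere before b"), feature selection by ranking on information gain, and the decision tree by a one-level tree (a stump) built on the best pattern.

```python
import math
import re
from collections import Counter
from itertools import permutations

# Hypothetical toy data: an ordered list of operations and a quality label.
RUNS = [
    (["cut", "drill", "polish", "coat"], "good"),
    (["cut", "polish", "drill", "coat"], "bad"),
    (["drill", "cut", "polish", "coat"], "good"),
    (["polish", "cut", "drill", "coat"], "bad"),
    (["cut", "drill", "coat", "polish"], "good"),
    (["polish", "drill", "cut", "coat"], "bad"),
]

def to_string(ops):
    # Step 1: a sequence becomes a string of space-delimited tokens.
    return " " + " ".join(ops) + " "

def induce_patterns(runs):
    # Step 2 (simplified assumption): one regex per ordered pair of operations,
    # allowing any number of other tokens in between.
    ops = sorted({op for seq, _ in runs for op in seq})
    return [rf" {re.escape(a)} (?:\S+ )*{re.escape(b)} "
            for a, b in permutations(ops, 2)]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(pattern, strings, labels):
    hit = [y for s, y in zip(strings, labels) if re.search(pattern, s)]
    miss = [y for s, y in zip(strings, labels) if not re.search(pattern, s)]
    n = len(labels)
    return (entropy(labels)
            - len(hit) / n * entropy(hit)
            - len(miss) / n * entropy(miss))

def select_patterns(patterns, strings, labels, k=3):
    # Step 3 (assumption): keep the k patterns with the highest information
    # gain; ties are broken alphabetically so the result is deterministic.
    return sorted(patterns, key=lambda p: (-info_gain(p, strings, labels), p))[:k]

def fit_stump(pattern, strings, labels):
    # Step 4 (stand-in for full tree induction): a one-level tree that
    # predicts the majority label on each side of a single pattern test.
    default = Counter(labels).most_common(1)[0][0]
    branches = {}
    for flag in (True, False):
        ys = [y for s, y in zip(strings, labels)
              if bool(re.search(pattern, s)) == flag]
        branches[flag] = Counter(ys).most_common(1)[0][0] if ys else default
    return lambda ops: branches[bool(re.search(pattern, to_string(ops)))]

strings = [to_string(seq) for seq, _ in RUNS]
labels = [label for _, label in RUNS]
selected = select_patterns(induce_patterns(RUNS), strings, labels)
predict = fit_stump(selected[0], strings, labels)

print(selected[0])
print(predict(["drill", "coat", "cut", "polish"]))
```

In this toy data the label depends only on whether drilling precedes polishing, with or without other operations in between, so a pattern over non-adjacent operations is what separates the classes. A full decision tree learner would take the selected patterns as binary attributes in place of the stump.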
Cite this article
Rokach, L., Romano, R. & Maimon, O. Mining manufacturing databases to discover the effect of operation sequence on the product quality. J Intell Manuf 19, 313–325 (2008). https://doi.org/10.1007/s10845-008-0084-6
DOI: https://doi.org/10.1007/s10845-008-0084-6