Abstract
Power consistency is one of the most important quality indicators of a diesel engine, and controlling it effectively is decisive for product competitiveness. The widespread use of sensors and other data acquisition equipment has made “data-driven quality control” possible. However, identifying the parameters most strongly related to engine power from massive manufacturing data, and effectively discriminating between direct and indirect dependencies among these variables, remain challenging. This paper proposes a feature selection algorithm named NMI-ND, which uses network deconvolution (ND) to infer causal correlations among diesel engine manufacturing parameters from observed correlations measured by normalized mutual information (NMI). The algorithm is evaluated experimentally against other representative feature selection algorithms. The comparison shows that NMI-ND performs better in both effectiveness and efficiency.
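To make the deconvolution step concrete, the following is a minimal sketch of the closed-form network deconvolution commonly used for symmetric dependency matrices. It is not the authors' implementation. It assumes the observed matrix is the sum of all powers of the direct matrix, so that G_dir = G_obs (I + G_obs)^-1, and it applies this through an eigendecomposition. The function name `network_deconvolution`, the optional `beta` rescaling parameter, and the three-variable toy matrix are illustrative assumptions; estimating the NMI matrix from manufacturing data is omitted.

```python
import numpy as np


def network_deconvolution(g_obs, beta=None):
    """Estimate direct dependencies from a symmetric observed matrix.

    Assumes g_obs = g_dir + g_dir^2 + g_dir^3 + ..., which gives
    g_dir = g_obs (I + g_obs)^-1. If beta is given (0 < beta < 1),
    g_obs is first rescaled so that every eigenvalue of the result
    has magnitude at most beta (an assumed, commonly used safeguard).
    Without rescaling, an eigenvalue of exactly -1 would divide by zero.
    """
    g = np.asarray(g_obs, dtype=float)
    g = (g + g.T) / 2.0                   # enforce symmetry
    lam, vec = np.linalg.eigh(g)          # g = vec @ diag(lam) @ vec.T

    if beta is not None:
        lam_pos = max(lam.max(), 0.0)     # largest positive eigenvalue
        lam_neg = max(-lam.min(), 0.0)    # magnitude of most negative one
        bounds = []
        if lam_pos > 0:
            bounds.append(beta / ((1.0 - beta) * lam_pos))
        if lam_neg > 0:
            bounds.append(beta / ((1.0 + beta) * lam_neg))
        if bounds:
            lam = lam * min(bounds)       # scale factor applied to g_obs

    lam_dir = lam / (1.0 + lam)           # eigenvalue-wise deconvolution
    return (vec * lam_dir) @ vec.T


# Toy chain A - B - C: A and C are linked only through B (hypothetical weights).
g_dir = np.array([[0.0, 0.4, 0.0],
                  [0.4, 0.0, 0.3],
                  [0.0, 0.3, 0.0]])
# Observed dependencies include indirect paths: g_dir (I - g_dir)^-1.
g_obs = g_dir @ np.linalg.inv(np.eye(3) - g_dir)
recovered = network_deconvolution(g_obs)
```

In this toy case the observed matrix contains a spurious A to C dependency created by the path through B, and the deconvolved matrix removes it. With real NMI estimates the input is noisy and the series assumption holds only approximately, so the output is better read as a ranking of candidate direct dependencies than as exact weights.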
Acknowledgements
This work was supported by the National Science Foundation of China (Nos. 51435009, 51775348), the National Technology Support Program of China (No. 2015BAF12B02), and the Shanghai Aerospace Science and Technology Innovation Fund (No. SAST2016048).
Cite this article
Qin, W., Zha, D. & Zhang, J. An effective approach for causal variables analysis in diesel engine production by using mutual information and network deconvolution. J Intell Manuf 31, 1661–1671 (2020). https://doi.org/10.1007/s10845-018-1397-8
DOI: https://doi.org/10.1007/s10845-018-1397-8