Abstract
A layer-wise neural network architecture is proposed for classification and regression of time series data in which multiple instances share a single output. This data format is encountered in the manufacturing industry, where the short production cycle means parts are produced in batches and labelled as a whole for defects. The end-to-end neural network approach is benchmarked against a previously proposed feature engineering method based on mean shift clustering and k-nearest neighbours with dynamic time warping, and against a naive approach of flattening the instances and training a support vector machine. An ablation study is performed on a layer-wise 1D-convolutional neural network (CNN) to understand which architectural design choices are critical for prediction performance. On a transfer moulding production dataset, the layer-wise 1D-CNN and multilayer perceptron (MLP) achieve the best performance across most of the common classification and regression metrics, with the layer-wise MLP incurring a lower computational cost. Finally, it is shown that the proposed parameter sharing in the dense layers of both networks is key to reducing the number of parameters and improving prediction performance.
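The parameter-sharing idea highlighted in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the ReLU activation, and the mean pooling over instances are illustrative assumptions. The point it demonstrates is that applying one shared dense layer to every instance keeps the parameter count independent of the batch size, whereas flattening the instances (as in the naive baseline) multiplies the weight count by the number of instances.

```python
import numpy as np

def shared_dense_forward(X, W, b):
    """Hypothetical shared dense layer: the SAME (W, b) is applied to
    every instance, so parameters do not grow with the instance count."""
    H = np.maximum(X @ W + b, 0.0)   # per-instance ReLU features, shape (n_instances, n_hidden)
    return H.mean(axis=0)            # pool over instances -> one vector for the single output

rng = np.random.default_rng(0)
n_instances, n_features, n_hidden = 8, 16, 4
X = rng.normal(size=(n_instances, n_features))   # one batch: 8 instances, 16 features each
W = rng.normal(size=(n_features, n_hidden))
b = np.zeros(n_hidden)

pooled = shared_dense_forward(X, W, b)           # shape: (n_hidden,)

# Parameter count with sharing vs. a dense layer on the flattened batch:
shared_params = W.size + b.size                              # 16*4 + 4 = 68
flat_params = n_instances * n_features * n_hidden + n_hidden  # 8*16*4 + 4 = 516
```

Under these assumptions the shared layer uses 68 parameters against 516 for the flattened equivalent, and the gap widens linearly with the number of instances per batch.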





Notes
The purpose of the threshold is to ensure that the cluster is not an anomalous result but rather a distinct pattern associated with zero defects.
The two clusters enable separation of sequences associated with zero and non-zero defects.
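The cluster-size threshold described in the first note can be sketched as follows. This is an illustrative NumPy fragment, not the paper's code: the function name and the threshold value are hypothetical, and the cluster labels stand in for the output of the mean shift step. It shows the mechanics only: clusters with too few members are discarded as anomalous rather than treated as a distinct zero-defect pattern.

```python
import numpy as np

def keep_clusters_above_threshold(labels, min_size):
    # Count members per cluster label and keep only labels whose cluster
    # is large enough to be a recurring pattern rather than an anomaly.
    unique, counts = np.unique(labels, return_counts=True)
    return {int(l) for l, c in zip(unique, counts) if c >= min_size}

labels = np.array([0, 0, 0, 0, 1, 1, 2])   # cluster 2 has a single member
kept = keep_clusters_above_threshold(labels, min_size=2)
```

With a minimum size of 2, the singleton cluster is rejected and only the two populated clusters survive, matching the separation of zero- and non-zero-defect sequences described in the second note.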
Acknowledgements
This work was supported by the RIE2020 Advanced Manufacturing and Engineering (AME) IAF-PP (A19C1a0018).
About this article
Cite this article
Yapp, E.K.Y., Gupta, A. & Li, X. A layer-wise neural network for multi-item single-output quality estimation. J Intell Manuf 34, 3131–3141 (2023). https://doi.org/10.1007/s10845-022-01995-0