Abstract
Sparsity and noisy labels are inherent in real-world data. Traditionally, domain experts have relied on strong assumptions, drawn from their own experience and expertise, to select the parameters of their models. A similar approach has been adopted in machine learning for setting hyper-parameters. However, these assumptions are often subjective and do not necessarily yield the optimal choice. To address this problem, we propose a data-driven approach that automates model parameter learning via a Bayesian nonparametric formulation. Specifically, we propose a hierarchical Dirichlet process mixture model (HDPMM) as a multi-task learning framework, which learns the parameters common to different datasets from the same industry. In our experiments, we verified the capability of HDPMM for multi-task learning in infrastructure failure prediction by combining it with a hierarchical beta process, which serves as our failure prediction model. In particular, multi-task learning was used to draw additional knowledge from the failure records of water supply networks managed by other utility companies, improving the prediction accuracy of our model. Notably, our model achieved higher accuracy on sparse predictions than previous state-of-the-art models. Moreover, we demonstrated that the proposed model can support preventive maintenance of critical infrastructure.
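To illustrate how a hierarchical Dirichlet process mixture can share parameters across datasets, the following is a minimal generative sketch and not the model used in the paper. It assumes a truncated stick-breaking approximation, a Gaussian base measure and a Gaussian likelihood; the function names `stick_breaking` and `sample_hdp_mixture`, the concentration parameters `gamma` and `alpha`, and the truncation level are all hypothetical choices made for illustration. The hierarchical beta process used for failure prediction is not shown.

```python
import numpy as np


def stick_breaking(concentration, truncation, rng):
    """Return truncated stick-breaking weights that sum to 1."""
    fractions = rng.beta(1.0, concentration, size=truncation)
    fractions[-1] = 1.0  # assign all remaining mass to the last component
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - fractions[:-1])))
    return fractions * remaining


def sample_hdp_mixture(n_groups, n_obs, gamma, alpha, truncation, rng):
    """Draw toy data for several groups that share one set of components."""
    # Top level: global weights and component means shared by all groups.
    global_weights = stick_breaking(gamma, truncation, rng)
    component_means = rng.normal(0.0, 5.0, size=truncation)  # assumed base measure

    # Group level: each group re-weights the same shared components.
    # For a finite set of atoms, a DP draw has Dirichlet(alpha * weights) weights.
    # The small floor only guards against numerically zero parameters.
    params = np.maximum(alpha * global_weights, 1e-8)
    group_weights = rng.dirichlet(params, size=n_groups)

    data = []
    for weights in group_weights:
        labels = rng.choice(truncation, size=n_obs, p=weights)
        # Assumed Gaussian likelihood with unit variance.
        data.append(rng.normal(component_means[labels], 1.0))
    return global_weights, group_weights, component_means, np.array(data)


rng = np.random.default_rng(0)
global_w, group_w, means, data = sample_hdp_mixture(
    n_groups=3, n_obs=50, gamma=2.0, alpha=5.0, truncation=10, rng=rng
)
print(group_w.round(2))  # one row of mixture weights per group
```

In this sketch each group (for example, one utility's dataset) has its own mixture weights, but all groups draw from the same component means, which is the sense in which a hierarchical Dirichlet process lets related tasks share statistical strength.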
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
About this article
Cite this article
Luo, S., Chu, V.W., Li, Z. et al. Multi-task learning by hierarchical Dirichlet mixture model for sparse failure prediction. Int J Data Sci Anal 12, 15–29 (2021). https://doi.org/10.1007/s41060-020-00219-z
DOI: https://doi.org/10.1007/s41060-020-00219-z