
A parameter-level parallel optimization algorithm for large-scale spatio-temporal data mining

Distributed and Parallel Databases

Abstract

Spatio-temporal data mining aims to discover previously unknown but useful patterns from spatial and temporal data. However, the explosive growth of spatio-temporal data calls for novel, computationally efficient methods for large-scale data mining applications. Since many spatio-temporal data mining problems can be formulated as optimization problems, in this paper we propose an efficient parameter-level parallel optimization algorithm for large-scale spatio-temporal data mining. Most previous optimization methods are based on gradient descent, which iteratively updates the model and provides model-level convergence control: all parameters are treated equally and every parameter keeps being updated until all of them have converged. However, we find that during the iterative process the convergence rates of the model parameters differ from one another, which can cause redundant computation and degrade performance. To address this problem, we propose parameter-level parallel stochastic gradient descent (plpSGD), in which the convergence of each parameter is considered independently and only unconverged parameters are updated in each iteration. Moreover, plpSGD parallelizes the updating of the model parameters to further improve the performance of SGD. We have conducted extensive experiments to evaluate plpSGD. The results show that, compared to previous SGD methods, plpSGD significantly accelerates the convergence of SGD and achieves excellent scalability with little sacrifice of solution accuracy.
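To make the parameter-level idea concrete, below is a minimal sketch in Python of per-parameter convergence control for a least-squares model. Everything here is an illustrative assumption rather than the paper's implementation: the function name `plp_sgd_sketch`, the objective, and the freezing rule (retire a parameter once its latest update magnitude falls below `eps`) are ours, and the sketch is sequential, whereas plpSGD additionally parallelizes the surviving updates.

```python
import numpy as np

def plp_sgd_sketch(X, y, lr=0.1, eps=1e-4, max_iter=10000):
    """Illustrative parameter-level SGD for the loss 0.5 * (x_i . w - y_i)^2.

    Unlike model-level SGD, each parameter carries its own convergence
    flag: once a coordinate's update magnitude drops below `eps`, it is
    frozen and skipped in all later iterations.
    """
    n, d = X.shape
    w = np.zeros(d)
    active = np.ones(d, dtype=bool)   # per-parameter convergence flags
    rng = np.random.default_rng(0)
    for _ in range(max_iter):
        if not active.any():
            break                     # every parameter has converged
        i = rng.integers(n)           # draw one sample (stochastic step)
        grad = (X[i] @ w - y[i]) * X[i]
        step = lr * grad[active]
        w[active] -= step             # update only unconverged parameters
        idx = np.where(active)[0]
        active[idx[np.abs(step) < eps]] = False  # freeze tiny updates
    return w

# Toy usage: recover planted weights from noiseless linear observations.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
w_true = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
w_hat = plp_sgd_sketch(X, X @ w_true)
```

Because frozen coordinates drop out of every later step, the per-iteration work shrinks as parameters converge; in the paper's parallel setting, this per-parameter bookkeeping is what lets workers skip already-converged coordinates instead of updating the full model.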


Notes

  1. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#real-sim.

  2. https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#rcv1.binary.


Acknowledgements

This work is supported by the National Key Research and Development Plan (No. 2017YFC0803700) and the NSFC (Grant Nos. 61772218 and 61832006).

Author information

Corresponding author: Xuanhua Shi.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Cite this article

Liu, Z., Shi, X., He, L. et al. A parameter-level parallel optimization algorithm for large-scale spatio-temporal data mining. Distrib Parallel Databases 38, 739–765 (2020). https://doi.org/10.1007/s10619-020-07287-x
