ABSTRACT
Consider a set of black-box models – each of them independently trained on a different dataset – answering the same predictive spatio-temporal query. Being built in isolation, each model traverses its own life-cycle until it is deployed to production, learning data patterns from different datasets and facing independent hyper-parameter tuning. In order to answer the query, the set of black-box predictors has to be ensembled and allocated to the spatio-temporal query region. However, computing an optimal ensemble is a complex task that involves selecting the appropriate models and defining an effective allocation strategy that maps the models to the query region. In this paper we present DJEnsemble, a cost-based strategy for the automatic selection and allocation of a disjoint ensemble of black-box predictors to answer predictive spatio-temporal queries. We conduct a set of extensive experiments that evaluate DJEnsemble and highlight its efficiency, selecting model ensembles that are almost as efficient as the optimal solution. When compared against the traditional ensemble approach, DJEnsemble achieves up to 4X improvement in execution time and almost 9X improvement in prediction accuracy.