Abstract
Edge analytics is attracting ever-increasing interest, since processing streaming data close to where they are produced, rather than transferring them to the cloud, lowers latency and also addresses data privacy issues. In this work, we deal with the placement of analytic tasks on heterogeneous geo-distributed edge devices while targeting three objectives, namely latency, quality of results, and resource utilization. In addition, we investigate this multi-objective problem in a multi-query setting, where we jointly optimize multiple analytic jobs while dynamically adjusting task placement decisions. We explore multiple solutions; interestingly, in a multi-query setting, our proposals can improve all three objectives simultaneously in many cases. Furthermore, we develop a proof-of-concept prototype using Apache Storm. Our solutions are thoroughly evaluated and shown to yield improvements of more than 50% compared to advanced baselines targeting only latency. Moreover, our software prototype achieves speedups of up to 6\(\times\) over the Resource Aware Apache Storm scheduler, with an average speedup of 2.76\(\times\), when deployed over a small-scale infrastructure.
















Data availability
The scripts to generate the datasets analysed during the current study are publicly available in the following Github repository: https://github.com/annavalentina/Multi-Query-Optimization.
Notes
In the remainder, we use the term edge computing to cover both edge and fog devices, although in general fog computing may be deemed distinct from edge computing [3].
Our notation and solution extend trivially to task types, with the requirements then referring to task types rather than individual tasks; for simplicity and to reduce notation, we do not consider task types.
Again, for simplicity, we assume that tuples in all queries are of the same length; it is straightforward to account for variable-length tuples across different task outputs.
https://storm.apache.org/
https://flink.apache.org/
https://github.com/apache/incubator-heron
https://github.com/thombashi/tcconfig
References
Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L.: Edge computing: vision and challenges. IEEE Internet Things J. 3(5), 637–646 (2016). https://doi.org/10.1109/JIOT.2016.2579198
Asghari, P., Rahmani, A.M., Javadi, H.H.S.: Internet of things applications: a systematic review. Comput. Netw. 148, 241–261 (2019). https://doi.org/10.1016/j.comnet.2018.12.008
Iorga, M., Feldman, L., Barton, R., Martin, M.J., Goren, N.S., Mahmoudi, C., et al.: Fog computing conceptual model. Special Publication (NIST SP)-500-325 (2018)
Zhao, G., Xu, H., Zhao, Y., Qiao, C., Huang, L.: Offloading dependent tasks in mobile edge computing with service caching. In: 39th IEEE Conference on Computer Communications, INFOCOM 2020, Toronto, ON, Canada, July 6–9, 2020, pp. 1997–2006 (2020). https://doi.org/10.1109/INFOCOM41043.2020.9155396
Nardelli, M., Cardellini, V., Grassi, V., Presti, F.L.: Efficient operator placement for distributed data stream processing applications. IEEE Trans. Parallel Distrib. Syst. 30(8), 1753–1767 (2019). https://doi.org/10.1109/TPDS.2019.2896115
Michailidou, A., Gounaris, A., Symeonides, M., Trihinas, D.: EQUALITY: quality-aware intensive analytics on the edge. Inf. Syst. 105, 101953 (2022). https://doi.org/10.1016/j.is.2021.101953
Khan, W.Z., Ahmed, E., Hakak, S., Yaqoob, I., Ahmed, A.: Edge computing: a survey. Future Gen. Comput. Syst. 97, 219–235 (2019). https://doi.org/10.1016/j.future.2019.02.050
Ahmed, E., Rehmani, M.H.: Mobile edge computing: opportunities, solutions, and challenges. Future Gen. Comput. Syst. 70, 59–63 (2017). https://doi.org/10.1016/j.future.2016.09.015
Mao, Y., Zhang, J., Song, S., Letaief, K.B.: Power-delay tradeoff in multi-user mobile-edge computing systems. In: 2016 IEEE Global Communications Conference, GLOBECOM 2016, Washington, DC, USA, December 4–8, 2016, pp. 1–6 (2016). https://doi.org/10.1109/GLOCOM.2016.7842160
Motaghare, O., Pillai, A.S., Ramachandran, K.I.: Predictive maintenance architecture. In: 2018 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), pp. 1–4 (2018). https://doi.org/10.1109/ICCIC.2018.8782406
Amruthnath, N., Gupta, T.: A research study on unsupervised machine learning algorithms for early fault detection in predictive maintenance. In: 2018 5th International Conference on Industrial Engineering and Applications (ICIEA), pp. 355–361 (2018). https://doi.org/10.1109/IEA.2018.8387124
Zhao, P., Kurihara, M., Tanaka, J., Noda, T., Chikuma, S., Suzuki, T.: Advanced correlation-based anomaly detection method for predictive maintenance. In: 2017 IEEE International Conference on Prognostics and Health Management, ICPHM 2017, Dallas, TX, USA, June 19–21, 2017, pp. 78–83 (2017). https://doi.org/10.1109/ICPHM.2017.7998309
Albers, T., Lazovik, E., Yousefi, M.H.N., Lazovik, A.: Adaptive on-the-fly changes in distributed processing pipelines. Front. Big Data 4, 666174 (2021). https://doi.org/10.3389/fdata.2021.666174
Hiessl, T., Karagiannis, V., Hochreiner, C., Schulte, S., Nardelli, M.: Optimal placement of stream processing operators in the fog. In: 3rd IEEE International Conference on Fog and Edge Computing, ICFEC 2019, Larnaca, Cyprus, May 14–17, 2019, pp. 1–10 (2019). https://doi.org/10.1109/CFEC.2019.8733147
Skarlat, O., Nardelli, M., Schulte, S., Dustdar, S.: Towards qos-aware fog service placement. In: 1st IEEE International Conference on Fog and Edge Computing, ICFEC 2017, Madrid, Spain, May 14–15, 2017, pp. 89–96 (2017). https://doi.org/10.1109/ICFEC.2017.12
Xu, X., Cao, H., Geng, Q., Liu, X., Dai, F., Wang, C.: Dynamic resource provisioning for workflow scheduling under uncertainty in edge computing environment. Concurr. Comput. Pract. Exp. 34(14) (2022). https://doi.org/10.1002/cpe.5674
Renart, E.G., Veith, A.D.S., Balouek-Thomert, D., de Assunção, M.D., Lefèvre, L., Parashar, M.: Distributed operator placement for iot data analytics across edge and cloud resources. In: 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2019, Larnaca, Cyprus, May 14–17, 2019, pp. 459–468 (2019). https://doi.org/10.1109/CCGRID.2019.00060
Trihinas, D., Pallis, G., Dikaiakos, M.D.: Adam: an adaptive monitoring framework for sampling and filtering on iot devices. In: 2015 IEEE International Conference on Big Data (IEEE BigData 2015), Santa Clara, CA, USA, October 29–November 1, 2015, pp. 717–726 (2015). https://doi.org/10.1109/BigData.2015.7363816
Wen, Z., Quoc, D.L., Bhatotia, P., Chen, R., Lee, M.: Approxiot: approximate analytics for edge computing. In: 38th IEEE International Conference on Distributed Computing Systems, ICDCS 2018, Vienna, Austria, July 2–6, 2018, pp. 411–421 (2018). https://doi.org/10.1109/ICDCS.2018.00048
Cardellini, V., Grassi, V., Presti, F.L., Nardelli, M.: Optimal operator placement for distributed stream processing applications. In: Proceedings of the 10th ACM International Conference on Distributed and Event-Based Systems, DEBS ’16, Irvine, CA, USA, June 20–24, 2016, pp. 69–80 (2016). https://doi.org/10.1145/2933267.2933312
Gurobi Optimization, LLC: Gurobi Optimizer Reference Manual (2022). https://www.gurobi.com
Li, J., Deshpande, A., Khuller, S.: Minimizing communication cost in distributed multi-query processing. In: Ioannidis, Y.E., Lee, D.L., Ng, R.T. (eds.) Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009–April 2 2009, Shanghai, China, pp. 772–783 (2009). https://doi.org/10.1109/ICDE.2009.85
Kougka, G., Gounaris, A., Tsichlas, K.: Practical algorithms for execution engine selection in data flows. Future Gen. Comput. Syst. 45, 133–148 (2015). https://doi.org/10.1016/j.future.2014.11.011
Peng, B., Hosseini, M., Hong, Z., Farivar, R., Campbell, R.H.: R-storm: resource-aware scheduling in storm. In: Lea, R., Gopalakrishnan, S., Tilevich, E., Murphy, A.L., Blackstock, M. (eds.) Proceedings of the 16th Annual Middleware Conference, Vancouver, BC, Canada, December 07–11, 2015, pp. 149–161 (2015). https://doi.org/10.1145/2814576.2814808
Bordin, M.V., Griebler, D., Mencagli, G., Geyer, C.F.R., Fernandes, L.G.L.: Dspbench: a suite of benchmark applications for distributed data stream processing systems. IEEE Access 8, 222900–222917 (2020). https://doi.org/10.1109/ACCESS.2020.3043948
Aït-Salaht, F., Desprez, F., Lebre, A.: An overview of service placement problem in fog and edge computing. ACM Comput. Surv. 53(3), 65:1–65:35 (2020). https://doi.org/10.1145/3391196
Sonkoly, B., Czentye, J., Szalay, M., Németh, B., Toka, L.: Survey on placement methods in the edge and beyond. IEEE Commun. Surv. Tutor. 23(4), 2590–2629 (2021). https://doi.org/10.1109/COMST.2021.3101460
Zhu, C., Pastor, G., Xiao, Y., Li, Y., Ylä-Jääski, A.: Fog following me: latency and quality balanced task allocation in vehicular fog computing. In: 15th Annual IEEE International Conference on Sensing, Communication, and Networking, SECON 2018, Hong Kong, China, June 11–13, 2018, pp. 298–306 (2018). https://doi.org/10.1109/SAHCN.2018.8397129
Benamer, A.R., Teyeb, H., Hadj-Alouane, N.B.: Latency-aware placement heuristic in fog computing environment. In: Panetto, H., Debruyne, C., Proper, H.A., Ardagna, C.A., Roman, D., Meersman, R. (eds.) On the Move to Meaningful Internet Systems. OTM 2018 Conferences - Confederated International Conferences: CoopIS, C&TC, and ODBASE 2018, Valletta, Malta, October 22–26, 2018, Proceedings, Part II. Lecture Notes in Computer Science, vol. 11230, pp. 241–257 (2018). https://doi.org/10.1007/978-3-030-02671-4_14
Xia, Y., Etchevers, X., Letondeur, L., Coupaye, T., Desprez, F.: Combining hardware nodes and software components ordering-based heuristics for optimizing the placement of distributed iot applications in the fog. In: Haddad, H.M., Wainwright, R.L., Chbeir, R. (eds.) Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, April 09–13, 2018, pp. 751–760 (2018). https://doi.org/10.1145/3167132.3167215
Wan, Z., Deng, X., Cao, Z., Zhang, H.: Mobile resource aware scheduling for mobile edge environment. In: 2018 IEEE International Conference on Communications, ICC 2018, Kansas City, MO, USA, May 20–24, 2018, pp. 1–6 (2018). https://doi.org/10.1109/ICC.2018.8422631
Sajjad, H.P., Danniswara, K., Al-Shishtawy, A., Vlassov, V.: Spanedge: towards unifying stream processing over central and near-the-edge data centers. In: IEEE/ACM Symposium on Edge Computing, SEC 2016, Washington, DC, USA, October 27–28, 2016, pp. 168–178 (2016). https://doi.org/10.1109/SEC.2016.17
Filatov, M., Kantere, V.: Multi-workflow optimization in PAW. In: Proceedings of the 20th International Conference on Extending Database Technology, EDBT 2017, Venice, Italy, March 21–24, 2017, pp. 566–569 (2017). https://doi.org/10.5441/002/edbt.2017.71
Jonathan, A., Chandra, A., Weissman, J.B.: Multi-query optimization in wide-area streaming analytics. In: Proceedings of the ACM Symposium on Cloud Computing, SoCC 2018, Carlsbad, CA, USA, October 11–13, 2018, pp. 412–425 (2018). https://doi.org/10.1145/3267809.3267842
Dökeroglu, T., Bayir, M.A., Cosar, A.: Robust heuristic algorithms for exploiting the common tasks of relational cloud database queries. Appl. Soft Comput. 30, 72–82 (2015). https://doi.org/10.1016/j.asoc.2015.01.026
Michiardi, P., Carra, D., Migliorini, S.: In-memory caching for multi-query optimization of data-intensive scalable computing workloads. In: Proceedings of the Workshops of the EDBT/ICDT 2019 Joint Conference, EDBT/ICDT 2019, Lisbon, Portugal, March 26, 2019. CEUR Workshop Proceedings, vol. 2322 (2019). http://ceur-ws.org/Vol-2322/DARLIAP_2.pdf
Zhao, H., Sakellariou, R.: Scheduling multiple dags onto heterogeneous systems. In: 20th International Parallel and Distributed Processing Symposium (IPDPS 2006), Proceedings, 25–29 April 2006, Rhodes Island, Greece (2006). https://doi.org/10.1109/IPDPS.2006.1639387
Liu, B., Xu, X., Qi, L., Ni, Q., Dou, W.: Task scheduling with precedence and placement constraints for resource utilization improvement in multi-user MEC environment. J. Syst. Archit. 114, 101970 (2021). https://doi.org/10.1016/j.sysarc.2020.101970
Hülsmann, J., Traub, J., Markl, V.: Demand-based sensor data gathering with multi-query optimization. Proc. VLDB Endow. 13(12), 2801–2804 (2020). https://doi.org/10.14778/3415478.3415479
Georgiou, Z., Symeonides, M., Trihinas, D., Pallis, G., Dikaiakos, M.D.: Streamsight: a query-driven framework for streaming analytics in edge computing. In: Sill, A., Spillner, J. (eds.) 11th IEEE/ACM International Conference on Utility and Cloud Computing, UCC 2018, Zurich, Switzerland, December 17–20, 2018, pp. 143–152 (2018). https://doi.org/10.1109/UCC.2018.00023
Rabkin, A., Arye, M., Sen, S., Pai, V.S., Freedman, M.J.: Aggregation and degradation in jetstream: streaming analytics in the wide area. In: Mahajan, R., Stoica, I. (eds.) Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2014, Seattle, WA, USA, April 2–4, 2014, pp. 275–288 (2014)
Li, Y., Chen, Y., Lan, T., Venkataramani, G.: Mobiqor: Pushing the envelope of mobile edge computing via quality-of-result optimization. In: 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 1261–1270 (2017). https://doi.org/10.1109/ICDCS.2017.54
Aniello, L., Baldoni, R., Querzoni, L.: Adaptive online scheduling in storm. In: Chakravarthy, S., Urban, S.D., Pietzuch, P.R., Rundensteiner, E.A. (eds.) The 7th ACM International Conference on Distributed Event-Based Systems, DEBS ’13, Arlington, TX, USA, June 29–July 03, 2013, pp. 207–218 (2013). https://doi.org/10.1145/2488222.2488267
Xu, J., Chen, Z., Tang, J., Su, S.: T-storm: traffic-aware online scheduling in storm. In: IEEE 34th International Conference on Distributed Computing Systems, ICDCS 2014, Madrid, Spain, June 30–July 3, 2014, pp. 535–544 (2014). https://doi.org/10.1109/ICDCS.2014.61
Liu, X., Buyya, R.: D-storm: dynamic resource-efficient scheduling of stream processing applications. In: 23rd IEEE International Conference on Parallel and Distributed Systems, ICPADS 2017, Shenzhen, China, December 15–17, 2017, pp. 485–492 (2017). https://doi.org/10.1109/ICPADS.2017.00070
Eskandari, L., Mair, J., Huang, Z., Eyers, D.M.: T3-scheduler: a topology and traffic aware two-level scheduler for stream processing systems in a heterogeneous cluster. Future Gen. Comput. Syst. 89, 617–632 (2018). https://doi.org/10.1016/j.future.2018.07.011
Nasiri, H., Nasehi, S., Divband, A., Goudarzi, M.: A scheduling algorithm to maximize storm throughput in heterogeneous cluster. J. Big Data 10, 103 (2023). https://doi.org/10.1186/s40537-023-00771-y
Hadian, H., Farrokh, M., Sharifi, M., Jafari, A.: An elastic and traffic-aware scheduler for distributed data stream processing in heterogeneous clusters. J. Supercomput. 79, 461–498 (2023). https://doi.org/10.1007/s11227-022-04669-z
Hadian, H., Sharifi, M.: GT-scheduler: a hybrid graph-partitioning and tabu-search based task scheduler for distributed data stream processing systems. Clust. Comput. (2024). https://doi.org/10.1007/s10586-023-04260-y
Acknowledgements
The research work was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: 1052, Project Name: DataflowOpt).
Funding
The research work was supported by the Hellenic Foundation for Research and Innovation (H.F.R.I.) under the “First Call for H.F.R.I. Research Projects to support Faculty members and Researchers and the procurement of high-cost research equipment grant” (Project Number: 1052, Project Name: DataflowOpt).
Author information
Contributions
A-V.M. conducted the main research, which was supervised by A.G. The system implementation was performed by A-V.M. and C.B. A-V.M. and A.G. both contributed to the writing. All authors reviewed the manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Michailidou, AV., Bellas, C. & Gounaris, A. Optimizing task allocation in multi-query edge analytics. Cluster Comput 27, 8289–8306 (2024). https://doi.org/10.1007/s10586-024-04427-1
DOI: https://doi.org/10.1007/s10586-024-04427-1