
Guaranteeing the response deadline for general aggregation trees

  • Research Article
  • Frontiers of Computer Science

Abstract

It is essential to respond to queries within their deadlines, even if the responses are not exact or complete. To reduce query latency, systems usually partition a large-scale data computation into a series of tasks spread over many processes and combine the task outputs through an aggregation tree. An obstacle is that the processes involved in a query usually differ in speed, so not all of them can complete their tasks in time. This directly degrades the response quality (the number of outputs received by the root of the aggregation tree). In this paper, we propose a general aggregation tree model, Tarot, that maximizes the response quality by systematically addressing the following challenging issues: (1) fine-grained partitioning of the query deadline along the multi-level aggregation tree; (2) learning the distribution of durations at each level of the aggregation tree to optimize the wait durations at aggregators; (3) adaptively reassigning tasks among processes according to their status; (4) periodically aggregating the outputs received from the lower level to avoid missing the deadline. The prior model does not consider these four aspects simultaneously. Extensive evaluations indicate that Tarot adapts to multi-level trees and considerably improves the response quality compared to prior work while guaranteeing the query deadline.
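To make the trade-off behind the wait duration at an aggregator concrete, the following is a minimal Python sketch for a two-level tree. It is not Tarot's algorithm, which the abstract does not specify. It assumes a toy quality model in which the expected quality for a wait of w is P(worker finishes within w) multiplied by P(the upstream transfer finishes within deadline - w), with both distributions estimated from observed samples. The names `empirical_cdf`, `best_wait`, `worker_durations`, `upstream_durations`, and `step` are hypothetical.

```python
from bisect import bisect_right


def empirical_cdf(samples):
    """Return a function t -> fraction of samples that are <= t."""
    ordered = sorted(samples)
    n = len(ordered)

    def cdf(t):
        return bisect_right(ordered, t) / n

    return cdf


def best_wait(worker_durations, upstream_durations, deadline, step=1.0):
    """Grid-search the aggregator wait that maximizes expected quality.

    Assumed toy model (not taken from the paper): for a wait w,
    quality(w) = P(worker done within w)
                 * P(upstream transfer done within deadline - w).
    """
    worker_cdf = empirical_cdf(worker_durations)
    upstream_cdf = empirical_cdf(upstream_durations)

    best_w, best_q = 0, 0.0
    for i in range(int(deadline / step) + 1):
        w = i * step
        q = worker_cdf(w) * upstream_cdf(deadline - w)
        if q > best_q:  # strict comparison keeps the shortest wait on ties
            best_w, best_q = w, q
    return best_w, best_q


if __name__ == "__main__":
    # Hypothetical observed durations, in milliseconds.
    wait, quality = best_wait([10, 20, 30, 40], [5, 10, 15, 20], deadline=50, step=5)
    print(wait, quality)
```

Waiting longer collects more worker outputs but leaves less time for the aggregate to reach the root, so under this model the best wait lies strictly between zero and the deadline. A deeper tree would repeat the same split at each level.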



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (Grant No. 61772544), the National Basic Research Program (973 Program) (2014CB347800), the Hunan Provincial Natural Science Fund for Distinguished Young Scholars (2016JJ1002), and the Guangxi Cooperative Innovation Center of Cloud Computing and Big Data (YD16507 and YD17X11).

Author information


Corresponding author

Correspondence to Deke Guo.

Additional information

Jiangfan Li received the BS degree in command information system engineering from National University of Defense Technology, China in 2018. He is currently working towards the MS degree at the College of Systems Engineering, National University of Defense Technology, China. His main research interests include cloud computing and edge computing.

Chendie Yao received the BS degree in command information system engineering from National University of Defense Technology, China in 2018. She is currently working towards the MS degree at the College of Systems Engineering, National University of Defense Technology, China. Her main research interests include cloud computing and edge computing.

Junxu Xia received the BS degree in management science and engineering from National University of Defense Technology, China in 2018. He is currently working towards the MS degree at the College of Systems Engineering, National University of Defense Technology, China. His main research interests include data centers, cloud computing, and distributed systems.

Deke Guo received the BS degree in industrial engineering from the Beijing University of Aeronautics and Astronautics, China in 2001, and the PhD degree in management science and engineering from the National University of Defense Technology, China in 2008. He is currently a professor with the College of Systems Engineering, National University of Defense Technology, China. His research interests include distributed systems, software-defined networking, data center networking, wireless and mobile systems, and interconnection networks. He is a member of the ACM.


About this article


Cite this article

Li, J., Yao, C., Xia, J. et al. Guaranteeing the response deadline for general aggregation trees. Front. Comput. Sci. 14, 146504 (2020). https://doi.org/10.1007/s11704-019-8437-1


  • DOI: https://doi.org/10.1007/s11704-019-8437-1
