Abstract
Microservers (MSs, ARM-based mobile devices) with built-in sensors and network connectivity have become increasingly pervasive, and their computational capabilities continue to improve. Many studies show that heterogeneous clusters, consisting of low-power MSs and high-performance servers (HSs, x86-based servers), can provide competitive performance and energy efficiency. However, these studies make only simple modifications to existing distributed computing systems to adapt them, an approach that has been shown not to fully exploit the various heterogeneous resources. In this paper, we argue that these heterogeneous clusters also call for flexible and efficient sharing and scheduling of computational resources. We then present Aries, a platform that abstracts, shares and schedules cluster resources, ranging from embedded devices to high-performance servers, among multiple distributed computing frameworks (Hadoop, Spark, etc.). In Aries, we propose a two-layer scheduling mechanism to improve the resource utilization of these heterogeneous clusters. Specifically, the resource abstraction layer coordinates resources across the cluster and provides computation and energy management. Within this layer, a hybrid resource abstraction approach manages HS and MS resources at fine and coarse granularity, respectively, to support efficient resource offers based on “resource slots”. The task scheduling layer supports the various sophisticated schedulers of existing distributed frameworks and decides how many resources to offer to each computing framework. Furthermore, Aries adopts a novel strategy that switches intelligently among three system models to save energy. We evaluate Aries with a variety of typical data center workloads and datasets, and the results show that Aries achieves more efficient resource utilization when the heterogeneous cluster is shared among diverse frameworks.
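The hybrid abstraction and slot-based offers described above can be illustrated with a minimal sketch. It is not the Aries implementation: the names `Node`, `Slot`, `abstract_slots` and `offer_slots`, the per-slot sizes, and the round-robin offer policy are all hypothetical and chosen only for illustration. The sketch assumes that an HS is split into several small slots (fine granularity), while each MS is exposed as a single whole-node slot (coarse granularity).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    name: str
    kind: str    # "HS" (x86 server) or "MS" (ARM microserver)
    cores: int
    mem_gb: int


@dataclass(frozen=True)
class Slot:
    node: str
    cores: int
    mem_gb: int


def abstract_slots(node: Node, hs_slot_cores: int = 1,
                   hs_slot_mem_gb: int = 2) -> List[Slot]:
    """Resource abstraction layer (illustrative assumption, not Aries code).

    An MS becomes one coarse slot covering the whole node; an HS is cut
    into as many fixed-size fine slots as its cores and memory allow.
    """
    if node.kind == "MS":
        return [Slot(node.name, node.cores, node.mem_gb)]
    count = min(node.cores // hs_slot_cores, node.mem_gb // hs_slot_mem_gb)
    return [Slot(node.name, hs_slot_cores, hs_slot_mem_gb)
            for _ in range(count)]


def offer_slots(slots: List[Slot],
                demands: Dict[str, int]) -> Dict[str, List[Slot]]:
    """Task scheduling layer (illustrative assumption, not Aries code).

    Decides how many slots each framework is offered by handing out
    slots round-robin until demands are met or slots run out.
    """
    offers: Dict[str, List[Slot]] = {fw: [] for fw in demands}
    pending = list(slots)
    progress = True
    while pending and progress:
        progress = False
        for fw, wanted in demands.items():
            if pending and len(offers[fw]) < wanted:
                offers[fw].append(pending.pop(0))
                progress = True
    return offers


# Hypothetical cluster: one x86 server and two ARM microservers.
cluster = [
    Node("hs1", "HS", cores=8, mem_gb=16),
    Node("ms1", "MS", cores=4, mem_gb=1),
    Node("ms2", "MS", cores=4, mem_gb=1),
]
all_slots = [slot for node in cluster for slot in abstract_slots(node)]
offers = offer_slots(all_slots, {"spark": 6, "hadoop": 6})
for framework, granted in offers.items():
    print(framework, len(granted))
```

Each framework's own scheduler would then place its tasks on the slots it was offered, which keeps framework-specific scheduling logic out of the shared resource layer.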
Notes
Throughout this paper, we use the term “microserver” (MS) in a broad sense to refer to ARM-based mobile devices, and the term “high-performance server” (HS) to refer to x86-based servers.
Acknowledgements
This work was financially supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (No. XDA06010600), as part of the DataOS Project. The authors would like to thank all researchers in the DataOS Project for useful discussions and suggestions. The authors also thank the anonymous reviewers for their feedback.
About this article
Cite this article
Zhang, H., Hao, C., Wu, Y. et al. Towards a scalable and energy-efficient resource manager for coupling cluster computing with distributed embedded computing. Cluster Comput 20, 3707–3720 (2017). https://doi.org/10.1007/s10586-017-0936-y
DOI: https://doi.org/10.1007/s10586-017-0936-y