DOI: 10.1145/2405688.2405692
research-article

Resource management and fault tolerance principles for supporting distributed real-time and embedded systems in the cloud

Published: 03 December 2012

Abstract

Cloud computing is an attractive platform for hosting enterprise applications because of its cost-effectiveness and its ability to adjust seamlessly to changing application workloads while providing the desired performance assurances through elastic, dynamic resource management. These benefits, however, do not yet readily carry over to distributed, real-time and embedded (DRE) systems, a class of systems that requires stringent assurances on multiple quality of service (QoS) properties, including timeliness, reliability, and security, all at once. This doctoral research investigates the limitations that make it hard to host DRE systems in the cloud and develops solutions to overcome them. This paper makes three contributions in this regard. First, it outlines the key challenges that must be resolved to support DRE systems in the cloud and surveys related literature. Second, it presents ongoing work that addresses one key challenge: the need for real-time and scalable resource monitoring in the cloud. Third, it outlines our proposed ideas for resolving the remaining challenges.

Cited By

  • (2023) Security in Internet of Things, in Protecting User Privacy in Web Search Utilization, pp. 215-233. DOI: 10.4018/978-1-6684-6914-9.ch011. Online publication date: 3-Mar-2023
  • (2013) Federated Cloud Security Architecture for Secure and Agile Clouds, in High Performance Cloud Auditing and Applications, pp. 169-188. DOI: 10.1007/978-1-4614-3296-8_7. Online publication date: 1-Aug-2013

    Published In

    MIDDLEWARE '12: Proceedings of the 9th Middleware Doctoral Symposium of the 13th ACM/IFIP/USENIX International Middleware Conference
    December 2012
    52 pages
    ISBN:9781450316118
    DOI:10.1145/2405688

    Sponsors

    • Professional
    • USENIX Assoc
    • IFIP

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DRE systems
    2. cloud
    3. monitoring
    4. resource management

    Qualifiers

    • Research-article

    Conference

    Middleware '12
    Sponsor:
    • USENIX Assoc
    Middleware '12: 13th International Middleware Conference
    December 3, 2012
    Montreal, Quebec, Canada

    Acceptance Rates

    Overall Acceptance Rate 203 of 948 submissions, 21%
