Asymmetric virtual machine replication for low latency and high available service

Chen, Rong; Chen, Haibo

doi:10.1007/s11432-017-9292-9

Research Paper
Published: 20 June 2018

Volume 61, article number 092110, (2018)

Science China Information Sciences

Rong Chen¹ & Haibo Chen¹

Abstract

Providing fault tolerance for client-server applications is critical in data center and cloud computing environments. Virtualization offers a direct way to achieve high availability: the protected application is encapsulated in a virtual machine (VM), and the entire VM state is periodically checkpointed to a backup replica. However, existing VM replication solutions suffer from either excessive checkpointing overhead and network latency or unnecessary CPU resource consumption on the backup replica. In this study, we exploit the composition of output packets and observe that the replication system maintains external consistency if pre-released packets originate from already synchronized state. Furthermore, we transform the active-active combination of primary and slave VMs into an active-semiactive one by reducing the number of active virtual CPUs (vCPUs) in the slave VM. The former optimization improves performance for read-mostly client-server networked applications, whereas the latter relieves the double scheduling problem on the slave host. We therefore propose COLO++, a system built on COLO, which is a non-stop service solution with coarse-grained lock-stepping VMs for client-server systems. The two plus signs stand for the two optimizations. Experimental results with COLO++ implemented on KVM and Linux show that it achieves nearly native VM performance under read-mostly workloads, as well as lower scheduling overhead on the backup replica.
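The packet pre-release condition described in the abstract can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the COLO++ implementation: the names `Packet`, `OutputGate`, `source_pages`, and `mark_dirty` are hypothetical, and the model assumes the system can tell which pieces of state an outgoing packet was derived from. A packet is released immediately when none of that state has changed since the last checkpoint; otherwise it is held until the next checkpoint completes.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    payload: bytes
    # Hypothetical: identifiers of the state (e.g. pages) this packet was built from.
    source_pages: frozenset


@dataclass
class OutputGate:
    """Toy output gate: releases a packet early only if it depends on synchronized state."""
    dirty_pages: set = field(default_factory=set)
    buffered: list = field(default_factory=list)

    def mark_dirty(self, page):
        # State modified since the last checkpoint is not yet on the backup.
        self.dirty_pages.add(page)

    def send(self, packet):
        # Returns the list of packets that may leave the host right now.
        if packet.source_pages.isdisjoint(self.dirty_pages):
            return [packet]           # derived from synchronized state only
        self.buffered.append(packet)  # must wait for the next checkpoint
        return []

    def checkpoint(self):
        # After synchronization, no state is dirty and held packets can go out.
        self.dirty_pages.clear()
        released, self.buffered = self.buffered, []
        return released
```

In this model a read-only reply never touches dirty state and so bypasses the buffer, which matches the abstract's claim that read-mostly workloads benefit most. The sketch ignores per-connection packet ordering, which a real system would have to preserve.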

Author information

Authors and Affiliations

Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University, Shanghai, 200240, China
Rong Chen & Haibo Chen

Corresponding author

Correspondence to Haibo Chen.

About this article

Cite this article

Chen, R., Chen, H. Asymmetric virtual machine replication for low latency and high available service. Sci. China Inf. Sci. 61, 092110 (2018). https://doi.org/10.1007/s11432-017-9292-9

Received: 16 July 2017
Revised: 25 September 2017
Accepted: 14 November 2017
Published: 20 June 2018
DOI: https://doi.org/10.1007/s11432-017-9292-9
