Survey: Live Migration and Disaster Recovery over Long-Distance Networks

Published: 19 July 2016

Abstract

We study virtual machine live migration (LM) and disaster recovery (DR) from a networking perspective, focusing on long-distance networks such as those interconnecting data centers. These networks are usually constrained by limited available bandwidth, increased latency and congestion, or a high cost of use when dedicated network resources are employed, and their exact characteristics cannot be controlled. LM and DR are challenging because of the large amounts of data that must be transferred over long-distance networks, and these amounts grow with the number of migrated or protected resources. In this context, our work presents how LM and DR are currently performed and how they operate in long-distance networking environments, discusses related issues and bottlenecks, and surveys other works. We also present how networks are evolving today and the new technologies and protocols (e.g., software-defined networking, or SDN, and flexible optical networks) that can be used to improve the efficiency of LM and DR over long distances. Traffic redirection in a long-distance environment is also an important part of the whole equation, since it directly affects the transparency of LM and DR. Related works and solutions from both academia and industry are presented.
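The abstract ties the difficulty of LM to data volume and link bandwidth. To make that dependence concrete, the following Python sketch models the classic iterative pre-copy scheme for migrating a VM's memory. The sketch is not taken from the survey and rests on several assumptions:
- Link bandwidth and the page-dirtying rate are both constant.
- Once the remaining dirty memory falls below a fixed threshold, it is sent during a final stop-and-copy pause.
- Latency, TCP behavior and disk state are ignored, although all of them matter on long-distance links.

The function name, parameters and numbers are illustrative only.

```python
def precopy_migration(mem_bytes: float, bw_bytes_s: float, dirty_bytes_s: float,
                      stop_bytes: float, max_rounds: int = 30):
    """Idealized iterative pre-copy live migration (hypothetical model).

    Returns (total_time_s, downtime_s, rounds). Assumes constant bandwidth and
    a constant page-dirtying rate; ignores RTT, TCP ramp-up and disk state.
    """
    if bw_bytes_s <= 0:
        raise ValueError("bandwidth must be positive")
    to_send = mem_bytes  # round 1 copies the whole memory image
    elapsed = 0.0
    rounds = 0
    # Keep copying while the dirty set is too large for a short pause.
    while rounds < max_rounds and to_send > stop_bytes:
        t = to_send / bw_bytes_s
        elapsed += t
        rounds += 1
        # Memory dirtied while this round was on the wire must be resent,
        # but never more than the whole image.
        to_send = min(mem_bytes, dirty_bytes_s * t)
    downtime = to_send / bw_bytes_s  # final stop-and-copy: the VM is paused
    return elapsed + downtime, downtime, rounds


if __name__ == "__main__":
    GIB = 2**30
    for gbps in (1, 10):
        bw = gbps * 125e6  # 1 Gbit/s = 125e6 bytes/s
        total, down, n = precopy_migration(4 * GIB, bw, 20e6, 50e6)
        print(f"{gbps:>2} Gbit/s: total {total:.1f} s, "
              f"downtime {down * 1000:.1f} ms, pre-copy rounds {n}")
```

Take an assumed 4 GiB VM that dirties 20 MB/s. In this model, raising the link from 1 to 10 Gbit/s cuts the total migration time from roughly 41 s to about 3.5 s. If the dirty rate approaches the available bandwidth, the loop never converges and stops only at `max_rounds`. This illustrates why constrained long-distance links are problematic for LM.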

Cited By

  • (2024) Strengthening Information Relay by Using a Robust IEP Between Cloud Service Suppliers. 2024 4th International Conference on Technological Advancements in Computational Sciences (ICTACS), 1882-1887. DOI: 10.1109/ICTACS62700.2024.10841195. Online publication date: 13-Nov-2024.
  • (2023) A Taxonomy of Live Migration Management in Cloud Computing. ACM Computing Surveys 56, 3, 1-33. DOI: 10.1145/3615353. Online publication date: 5-Oct-2023.
  • (2022) Distributed Cost-Aware Fault-Tolerant Load Balancing in Geo-Distributed Data Centers. IEEE Transactions on Green Communications and Networking 6, 1, 472-483. DOI: 10.1109/TGCN.2021.3107915. Online publication date: Mar-2022.

Reviews

Naga R Narayanaswamy

    This paper treats live migration (LM) as a one-time task or function: moving a virtual machine from one physical machine to another, located in the same or a different data center, without interrupting its operation. The paper also introduces readers to disaster recovery (DR) as a set of practices and activities that ensure the continuity of operation of an organization's physical and virtual information technology assets. With more and more companies moving to cloud services and deploying redundant data centers so that their businesses can operate 24/7 without failures, the paper is relevant for defining the expectations and needs of data center solutions. The paper defines terms such as RPO (recovery point objective) and RTO (recovery time objective), which are benchmarks for measuring the effectiveness of LM and DR in systems. Because the paper is intended to be a very detailed survey of the landscape, comparing the offerings of companies such as VMware, Cisco, and NetApp, it also surveys techniques that reduce transfer time, such as deduplication and compression. Several research papers are compared as well, and networking concepts such as BGP multihoming are surveyed extensively. The paper also examines many combinations of industry-leading solutions to see how they offer DR options; one example is how Silver Peak solutions can be combined with NetApp's SnapMirror to provide effective DR. The paper is targeted at three audiences: industry executives and product managers who want to see the landscape and improve their existing products; CIOs and IT professionals interested in which solution best addresses their companies' LM and DR problems; and academics who want to study the existing solutions and devise new paradigms or solutions that offer significant improvements.
    The paper achieves these objectives by explaining the topic in a very rigorous and detailed manner.

    Online Computing Reviews Service

    Published In

    ACM Computing Surveys  Volume 49, Issue 2
    June 2017
    747 pages
    ISSN:0360-0300
    EISSN:1557-7341
    DOI:10.1145/2966278
    Editor: Sartaj Sahni
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 July 2016
    Accepted: 01 April 2016
    Revised: 01 February 2016
    Received: 01 April 2015
    Published in CSUR Volume 49, Issue 2

    Author Tags

    1. Live migration
    2. clouds
    3. disaster recovery
    4. long-distance networks

    Qualifiers

    • Survey
    • Research
    • Refereed

    Cited By

    • (2024) Strengthening Information Relay by Using a Robust IEP Between Cloud Service Suppliers. 2024 4th International Conference on Technological Advancements in Computational Sciences (ICTACS), 1882-1887. DOI: 10.1109/ICTACS62700.2024.10841195. Online publication date: 13-Nov-2024.
    • (2023) A Taxonomy of Live Migration Management in Cloud Computing. ACM Computing Surveys 56(3), 1-33. DOI: 10.1145/3615353. Online publication date: 5-Oct-2023.
    • (2022) Distributed Cost-Aware Fault-Tolerant Load Balancing in Geo-Distributed Data Centers. IEEE Transactions on Green Communications and Networking 6(1), 472-483. DOI: 10.1109/TGCN.2021.3107915. Online publication date: Mar-2022.
    • (2022) Understanding the Security Implication of Aborting Virtual Machine Live Migration. IEEE Transactions on Cloud Computing 10(2), 1275-1286. DOI: 10.1109/TCC.2020.2982900. Online publication date: 1-Apr-2022.
    • (2022) On Information Technology Disaster Recovery and Its Relevance to Business Continuity. Proceedings of 2nd International Conference on Smart Computing and Cyber Security, 90-99. DOI: 10.1007/978-981-16-9480-6_10. Online publication date: 27-May-2022.
    • (2022) Resource Allocation Challenges in the Cloud and Edge Continuum. Advances in Computing, Informatics, Networking and Cybersecurity, 443-464. DOI: 10.1007/978-3-030-87049-2_15. Online publication date: 3-Mar-2022.
    • (2021) Disaster resilience of optical networks. Optical Switching and Networking 42(C). DOI: 10.1016/j.osn.2021.100619. Online publication date: 1-Nov-2021.
    • (2020) Disaster Recovery Layer for Distributed OpenStack Deployments. IEEE Transactions on Cloud Computing 8(1), 112-123. DOI: 10.1109/TCC.2017.2745560. Online publication date: 1-Jan-2020.
    • (2020) An Effective Remote Data Disaster Recovery Plan for the Space TT&C System. Machine Learning for Cyber Security, 31-41. DOI: 10.1007/978-3-030-62460-6_4. Online publication date: 8-Oct-2020.
    • (2019) Pattern-Driven Resource Allocation in Optical Networks. IEEE Transactions on Network and Service Management 16(2), 489-504. DOI: 10.1109/TNSM.2019.2910321. Online publication date: Jun-2019.