Is cloud storage ready? Performance comparison of representative IP-based storage systems

https://doi.org/10.1016/j.jss.2018.01.015

Highlights

  • We conduct a study of three storage systems under realistic conditions.

  • We make several interesting observations through a set of experiments.

  • We make several optimization recommendations to practitioners.

Abstract

Network-based storage systems have traditionally been dominated by Network Attached Storage (NAS) and Storage Area Network (SAN). Cloud-based storage systems, including object storage, have gained growing popularity among both private and enterprise users in recent years, and certain enterprises have even considered replacing traditional storage systems with cloud-based ones. Nevertheless, a systematic comparative study of the performance of these systems, which could assist such a transition, is still lacking. To fill this gap, we conduct a comprehensive study of the three major network storage systems under realistic network conditions and application behaviours. Specifically, we select one representative from each category for comparison: Network File System (NFS) from NAS, Internet Small Computer System Interface (iSCSI) from SAN, and OpenStack Swift from cloud storage. As the first study of its kind, we focus mainly on the client side and take performance as the perspective for comparison. We build a testbed and a suite of micro-benchmarks to study the impact of network complexities and access behaviours on performance. In addition, we employ two widely used macro-benchmarks, PostMark and FileBench, to test the three systems under realistic workloads. Through a set of comprehensive experiments and thorough analysis, we make several key observations. (1) iSCSI excels under good network conditions, e.g., in local area networks (LANs); when network complexities such as network delay and packet loss are present, its performance degrades significantly, especially for data-intensive operations. (2) In Internet-like environments, NFS performs poorly, while Swift is considerably more resilient. (3) Overall, Swift is a viable replacement for NFS in all network scenarios, but it is not yet ready to replace iSCSI in performance-critical environments.
(4) System configuration on the client side has a significant impact on storage performance and deserves adequate attention. Based on our experimental study, we also make several recommendations to practitioners and pinpoint aspects in which system designers can further improve each storage system.

Introduction

Enterprise storage systems have traditionally been dominated by two major technologies: Network Attached Storage (NAS) (Gibson and Van Meter, 2000) and Storage Area Network (SAN) (Thornburgh and Schoenborn, 2000). Both technologies have been widely deployed in enterprise environments over the past decade (Aiken et al., 2003; Radkov et al., 2004) and have proven their performance and reliability over time.

In recent years, the cloud computing paradigm has gained significant popularity and has started to replace traditional computing models. As a critical component, emerging cloud storage (e.g., Amazon S3) offers a highly promising way to transition from dedicated storage to more platform-independent, IP-based storage. By combining a Representational State Transfer (RESTful) (Fielding, 2000) API over the Hypertext Transfer Protocol (HTTP) (Fielding et al., 1999) with auto scaling and a pay-as-you-go model, cloud storage delivers a set of characteristics that are attractive in enterprise environments, including elasticity, scalability, cross-platform accessibility, and web integrability. With widely available wireless networks, users can conveniently access the same set of data on cloud storage anytime and anywhere. For this reason, many enterprise IT departments are considering replacing traditional network storage services with private or public cloud-based storage services (Systems; Crump, 2015). The IETF has also recently formed a working group on Internet-based cloud storage services (Cui et al., 2015b). Nevertheless, without a thorough understanding of this new storage model in comparison with conventional ones, it is challenging to carry out the technology transition efficiently and to optimize it further. Two major factors contribute to this challenge.
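To make the REST interaction concrete, here is a minimal sketch, assuming the public OpenStack Swift object API: each object is addressed by a URL of the form <storage URL>/<container>/<object>, written with an HTTP PUT and read with a GET, and every request carries an X-Auth-Token header obtained during authentication. The function name, storage URL, token, container, and object names are hypothetical, and the request is only built, not sent.

```python
from typing import Optional
from urllib.parse import quote
from urllib.request import Request


def swift_object_request(storage_url: str, token: str, container: str,
                         obj: str, data: Optional[bytes] = None) -> Request:
    """Build (but do not send) an HTTP request for one Swift object.

    With data, the request is a PUT that uploads the object body;
    without data, it is a GET that downloads the object.
    """
    # Objects are addressed as <storage URL>/<container>/<object>;
    # quote() percent-encodes spaces and other unsafe characters.
    url = f"{storage_url.rstrip('/')}/{quote(container)}/{quote(obj)}"
    method = "PUT" if data is not None else "GET"
    # Swift authenticates each request with a token header.
    return Request(url, data=data, method=method,
                   headers={"X-Auth-Token": token})


if __name__ == "__main__":
    req = swift_object_request("https://swift.example.com/v1/AUTH_demo",
                               "hypothetical-token", "docs", "report 1.txt",
                               b"hello")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen against a real deployment would perform the upload. Because every operation is an ordinary HTTP exchange, any platform with an HTTP client can issue it, which is where the cross-platform accessibility comes from.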

First, in today’s enterprise environments, end users usually rely on wireless networks for mobility and flexibility. Unfortunately, this practice makes it particularly difficult to ensure the quality of storage services: signal strength varies across locations, and connections can drop intermittently. Worse, mobile users often need to access storage services over the unpredictable Internet (e.g., when working from home or traveling). Together, these issues inevitably introduce significant network complexity and thus pose a serious challenge to user-perceived storage performance. Second, although NAS, SAN, and cloud storage are all important in practice, a systematic comparative study of these three drastically different storage systems is still lacking. Without a thorough understanding of their benefits, limitations, and implications, it is difficult to make an informed decision. Because the three systems are designed with different goals in mind, they are likely to exhibit distinct characteristics in different system environments, and we believe that none of them is universally ideal for all demands. It is thus important to understand their behaviours and their relative strengths and weaknesses.

To address these challenges, we conduct an experimental study that strives to understand the intrinsic characteristics of NAS, SAN, and cloud storage and to investigate their implications in different scenarios. Many perspectives deserve investigation, including performance, scalability, and reliability. As the first attempt at such a study, however, we focus primarily on performance in this paper and leave the other perspectives, e.g., scalability and reliability, for future work. We compare the three systems from the client side; that is, we evaluate the performance perceived by end users and explore potential approaches to improving it.

We choose Network File System (NFS) (Shepler et al., 2003), Internet Small Computer System Interface (iSCSI) (Chadalapaka et al., 2004), and OpenStack Swift (OpenStack Foundation, 2015) as the representatives of NAS, SAN, and cloud storage, respectively. To ensure a fair comparison, we run the experiments on the same hardware setup, use the same Ext4 file system for each system, and access all of them through standard POSIX APIs. To provide a controlled wireless environment and make the experiments reproducible, we use the Wide Area Network emulator (WANem) (Nambiar et al., 2014) to emulate various network scenarios. We design a set of fine-grained experiments covering different aspects, including micro-benchmarks for file access and metadata operations. To compare the three systems under realistic workloads, we also use two widely used macro-benchmark tools, PostMark and FileBench.
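The paper does not list its client-side setup commands, so the following is only a sketch, assuming a Linux client with the standard NFS utilities and open-iscsi installed; it prints the commands rather than running them. It illustrates where the shared Ext4 file system sits in each case: with NFS, Ext4 lives on the server and the client mounts an exported directory, whereas with iSCSI the client logs in to a block-level target and creates and mounts Ext4 itself. The NFS version, addresses, target IQN, device name, and function names are hypothetical. Reaching Swift through POSIX calls would additionally require a FUSE gateway, such as CloudFuse from the reference list; its invocation is omitted because the tool's options are not given here.

```python
import shlex


def nfs_client_setup(server: str, export: str, mountpoint: str) -> list[list[str]]:
    """NAS: mount a directory exported by the server (Ext4 lives server-side)."""
    return [["mount", "-t", "nfs", "-o", "vers=4", f"{server}:{export}", mountpoint]]


def iscsi_client_setup(portal: str, target_iqn: str, device: str,
                       mountpoint: str) -> list[list[str]]:
    """SAN: attach a block-level target, then create and mount Ext4 client-side."""
    return [
        # Ask the portal which targets it offers.
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        # Log in; the kernel then exposes the target as a local block device.
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"],
        # The device name varies per host, so it is passed in; mkfs erases it.
        ["mkfs.ext4", device],
        ["mount", device, mountpoint],
    ]


if __name__ == "__main__":
    commands = nfs_client_setup("192.0.2.10", "/export", "/mnt/nfs")
    commands += iscsi_client_setup("192.0.2.10", "iqn.2015-01.org.example:lun1",
                                   "/dev/sdb", "/mnt/iscsi")
    for cmd in commands:
        print(shlex.join(cmd))
```

Keeping each command as an argument list (rather than a shell string) means it could be handed to subprocess.run unchanged once the hypothetical values are replaced.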

Through these efforts, we strive to answer the following questions:

  • Is cloud storage a viable replacement in scenarios that are traditionally dominated by NAS and SAN?

  • Is cloud storage universally better than NAS and SAN? If not, in what scenarios is cloud storage better?

  • How much do network conditions and application behaviours affect the performance of each technology?

Through systematic analysis, we make several important observations that help answer these questions:

  1. We find that under ideal network conditions, SAN is several orders of magnitude better than both NAS and cloud storage, while NAS slightly outperforms cloud storage. Under Internet-like network conditions, the performance of both NAS and SAN declines rapidly, while cloud storage remains almost unaffected. Specifically, for data-intensive operations, e.g., large file access, SAN performs as poorly as NAS; for lightweight operations such as metadata and small file access, however, SAN's internal bundling scheme lets it retain its advantage over both NAS and cloud storage.

  2. We discover that the performance difference between NAS and cloud storage can largely be attributed to implementation details, while their differences from SAN arise from a combination of protocol design and implementation details. Specifically, SAN's intrinsic bundling mechanism (i.e., protocol design) makes it superior in small file and metadata operations, while the ability to use multiple TCP connections (i.e., implementation details) affects performance significantly, especially under realistic network conditions involving nontrivial network delay and packet loss.

  3. Moreover, we notice that access behaviours (e.g., Direct vs. Sync I/O) have a remarkable impact on storage performance and should be chosen carefully based on the tradeoff between performance and consistency. We also find that the host system setup, e.g., the I/O scheduler and local file system, plays a significant role in the performance of SAN.

  4. From the performance perspective, we conclude that object-based cloud storage is not suitable for replacing SAN in performance-critical environments, especially for manipulating small files; nevertheless, it is a potential replacement for NAS in nearly all network conditions, especially in Internet-like environments.
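The two mechanisms behind the first two observations can be illustrated with a back-of-the-envelope model. This is our own simplification, not the authors' analysis: small operations that each wait for a reply cost one round-trip time (RTT) apiece unless several are bundled into one exchange, and bulk transfers over lossy paths are approximated with the well-known Mathis et al. TCP throughput estimate, in which each connection achieves roughly (MSS/RTT)·sqrt(3/2)/sqrt(p) for loss rate p, so parallel connections add up until the link is saturated. The function names and parameter values are illustrative assumptions.

```python
import math


def round_trip_cost_s(n_ops: int, rtt_s: float, ops_per_round_trip: int = 1) -> float:
    """Network waiting time for n small requests that each need a reply.

    ops_per_round_trip > 1 models a client that bundles several requests
    into one exchange (a simplification; it ignores server processing time).
    """
    return math.ceil(n_ops / ops_per_round_trip) * rtt_s


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float,
                          connections: int = 1,
                          link_bps: float = math.inf) -> float:
    """Approximate steady-state throughput of parallel TCP flows.

    Uses the Mathis et al. estimate per connection and caps the sum at the
    link capacity. The model only applies to paths with some packet loss.
    """
    if loss <= 0:
        raise ValueError("the Mathis model needs a loss rate above zero")
    per_connection = (mss_bytes * 8 / rtt_s) * math.sqrt(1.5) / math.sqrt(loss)
    return min(connections * per_connection, link_bps)


if __name__ == "__main__":
    rtt = 0.05  # 50 ms, an Internet-like delay (illustrative)
    print(round_trip_cost_s(1000, rtt))      # one exchange per operation
    print(round_trip_cost_s(1000, rtt, 32))  # 32 operations per exchange
    for k in (1, 4):
        print(k, mathis_throughput_bps(1460, rtt, 0.01, connections=k) / 1e6)
```

Under these assumptions, 1,000 unbundled requests at 50 ms RTT spend 50 s waiting on the network versus 1.6 s when 32 are bundled per exchange, and a single 1460-byte-MSS connection at 1% loss is estimated at roughly 2.9 Mbit/s versus roughly 11.4 Mbit/s for four connections. The model deliberately ignores slow start, timeouts, and server load, so it indicates direction rather than magnitude.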

Based on these observations, we discuss important system implications for application designers and practitioners, and present our own ideas on how to further enhance each storage system design. As the first comparative study of its kind, we hope this study sheds light on the intrinsic characteristics and system implications of each solution from the client perspective, and helps identify the right position of each technology in today’s storage systems.

Section snippets

Network Attached Storage

Network Attached Storage (NAS) provides access to file systems deployed on a remote storage server via a file-based interface. The server handles the physical organization of data and coordinates concurrent access. The client mounts a volume and integrates the shared namespace into its local file system.

Network File System (NFS) (Shepler et al., 2003) is a representative NAS protocol. It exposes a portion of the server file system to the clients, which access the exported namespace through Remote

Methodology and environment

In this section, we describe the methodology and environment setup for our experiments. The experiments are conducted on a custom testbed, illustrated in Fig. 1, which consists of three components: the client machine, the server machine, and the network emulator and switch. Each is described in detail in the following subsections.
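The testbed places WANem between client and server to inject delay and packet loss. WANem's own interface is not reproduced here; as a hedged alternative, a Linux host in the emulator position could apply comparable impairments with the kernel's netem queueing discipline through tc. The sketch below only builds the command (applying it requires root privileges); the function name, interface name, and the delay, jitter, and loss values are hypothetical, not the settings used in the experiments.

```python
import shlex


def netem_command(dev: str, delay_ms: float = 0, jitter_ms: float = 0,
                  loss_pct: float = 0) -> list[str]:
    """Build a tc command that sets delay, jitter, and random loss on dev."""
    # "replace" installs the qdisc or updates an existing one, so reruns work.
    cmd = ["tc", "qdisc", "replace", "dev", dev, "root", "netem"]
    if delay_ms > 0:
        cmd += ["delay", f"{delay_ms:g}ms"]
        if jitter_ms > 0:
            cmd.append(f"{jitter_ms:g}ms")  # jitter follows the base delay
    if loss_pct > 0:
        cmd += ["loss", f"{loss_pct:g}%"]
    return cmd


if __name__ == "__main__":
    # Hypothetical Internet-like profile: 50 ms +/- 5 ms delay, 1% loss.
    print(shlex.join(netem_command("eth1", delay_ms=50, jitter_ms=5, loss_pct=1)))
```

Run as a script, this prints tc qdisc replace dev eth1 root netem delay 50ms 5ms loss 1%. Because netem shapes outgoing traffic on the chosen interface, impairing both directions of a client-server path would need the rule on the emulator's interfaces facing each side.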

Microbenchmark analysis

In this section, we analyze the performance of the three storage systems at fine granularity through a series of micro-benchmarks. We first analyze how access unit size and form of access affect performance and, based on this analysis, choose a subset of access forms for further investigation. We then study the performance of single file access, which closely imitates scenarios where a small number of files are accessed. Thereafter, we analyze batch operations, which

Metadata operations

Metadata accesses are heavily involved in file system operations. In this section, we study how metadata-intensive workloads perform on the three storage systems. We first describe the design of the metadata benchmark and then analyze the performance results.
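The benchmark design itself is described in the full text and is not reproduced here. As a hypothetical illustration only, a minimal metadata micro-benchmark could time the creation, stat, and deletion of many empty files and report operations per second for each phase; because it uses only POSIX calls, the same code could be pointed at an NFS, iSCSI, or FUSE-mounted Swift directory. The function names and file counts are assumptions, and client-side attribute caching can make the stat phase look faster than the server really is.

```python
import os
import tempfile
import time


def _rate(n_ops: int, start: float) -> float:
    """Operations per second since start (guarded against a zero interval)."""
    return n_ops / max(time.perf_counter() - start, 1e-9)


def metadata_benchmark(root: str, n_files: int = 1000) -> dict[str, float]:
    """Time create, stat, and unlink of n_files empty files under root."""
    paths = [os.path.join(root, f"f{i:06d}") for i in range(n_files)]
    rates = {}

    start = time.perf_counter()
    for p in paths:
        # O_EXCL makes each call create a new file rather than reopen one.
        fd = os.open(p, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        os.close(fd)
    rates["create"] = _rate(n_files, start)

    start = time.perf_counter()
    for p in paths:
        os.stat(p)  # may be served from the client's attribute cache
    rates["stat"] = _rate(n_files, start)

    start = time.perf_counter()
    for p in paths:
        os.unlink(p)
    rates["unlink"] = _rate(n_files, start)
    return rates


if __name__ == "__main__":
    # A local temporary directory stands in for an NFS/iSCSI/Swift mount.
    with tempfile.TemporaryDirectory() as d:
        for op, rate in metadata_benchmark(d, 500).items():
            print(f"{op:>6}: {rate:,.0f} ops/s")
```

Reporting each phase separately matters because, over a high-latency link, every create, stat, and unlink that is not bundled or cached costs at least one network round trip.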

Macrobenchmark analysis

The preceding micro-benchmark experiments are fine-grained, each focusing on a specific aspect of the systems; in this section, we analyze realistic workloads from a coarser-grained, whole-system perspective. For this purpose, we use two widely adopted macro-benchmarks, PostMark (Köthe, 2014) and FileBench (Filebench, 2014). Both can simulate a wide range of application workloads that combine file access and metadata operations. The metrics used are completion time and Input/Output Operations Per

Apples-to-apples comparison

As stated previously, the three storage systems are drastically different in nature. They are designed with distinct aspects in mind, and accordingly, are suitable for different usage scenarios. Strictly speaking, it is not possible to provide an apples-to-apples comparison among them. Thus, the purpose of this paper is not to provide such a comparison, but rather to analyze the three systems under the same environment to understand their behaviours, characteristics, and relative pros and cons.

Related work

IP-based storage systems, e.g., NAS and SAN, have existed for a long time. More recently, cloud storage has emerged as an alternative storage solution and demonstrated great potential. Nevertheless, most existing studies have focused on analysing a single family of storage systems rather than comparing different families. For example, Aiken et al. (2003) analysed the performance of the iSCSI protocol under different configurations; similarly, Xinidis et al. (2005) evaluated the performance of

Conclusions and future work

Cloud storage has gained significant popularity in recent years, and certain enterprises have considered replacing conventional network storage systems with cloud storage solutions. Nevertheless, the lack of a thorough study of these systems makes it difficult to make an informed decision. In this paper, we presented a systematic study of three different storage systems. Through a comprehensive set of experiments, we made several interesting observations regarding the suitable

Acknowledgements

This paper was supported by “the Fundamental Research Funds for the Central Universities” and National Natural Science Foundation of China (Grant No. 61702046).

Zhonghong Ou has been an associate professor at the Department of Computer Science, Beijing University of Posts and Telecommunications, China, since January 2016. He obtained his Ph.D. degree from the University of Oulu, Finland. From 2010 to 2015, he was a postdoctoral researcher at Aalto University, Finland. Zhonghong has a wide spectrum of research interests, including virtualization and cloud computing platforms, cloud storage, and energy optimization for mobile platforms.

References (30)

  • S. Aiken et al. (2003). A performance analysis of the iSCSI protocol. IEEE/NASA (MSST ’03).
  • J. Axboe (2004). Linux block IO: present and future. Linux Symposium.
  • M. Barton (2015). CloudFuse...
  • A. Carpen-Amarie et al. (2012). Evaluating cloud storage services for tightly-coupled applications. Euro-Par ’12 Workshops.
  • M. Chadalapaka et al. (2004). Internet small computer systems interface (iSCSI). IETF RFC 3720.
  • G. Crump (2015). Analyst blog: should enterprises replace NAS with Object storage?...
  • Y. Cui et al. (2015a). A first look at mobile cloud storage services: architecture, experimentation and challenge. IEEE Netw.
  • Y. Cui et al. (2015b). Internet storage sync: problem statement. IETF Internet.
  • Y. Cui et al. (2017). Quicksync: improving synchronization efficiency for mobile cloud storage services. IEEE Trans. Mob. Comput.
  • I. Drago et al. (2013). Benchmarking personal cloud storage. ACM IMC ’13.
  • I. Drago et al. (2012). Inside Dropbox: understanding personal cloud storage services. ACM IMC ’12.
  • R.T. Fielding (2000). Architectural Styles and the Design of Network-based Software Architectures.
  • R.T. Fielding et al. (1999). Hypertext Transfer Protocol – HTTP/1.1. IETF RFC 2616.
  • Filebench (2014). Filebench – file system benchmark. http://sourceforge.net/projects/filebench/. [Online; accessed...
  • G.A. Gibson and R. Van Meter (2000). Network attached storage architecture. Commun. ACM.


    Meina Song is a full professor at School of Computer Science, Beijing University of Posts and Telecommunications (BUPT), China. She obtained her Ph.D. degree from BUPT. Meina has a wide spectrum of research interests, including cloud storage systems, cloud computing platforms, big data platforms, and big data analytics.

    Zhen-Huan Hwang received his M.Sc. in Mobile Computing - Services and Security from Aalto University, Finland in 2014 and his B.Sc. in Computer Science from National Tsing Hua University, Taiwan, Republic of China in 2009. He was previously a system administrator at National Taiwan University and is currently a software development engineer at Amadeus S.A.S., France. His research interests include distributed systems, algorithms, and security.

    Antti Ylä-Jääski received his Ph.D. from ETH Zurich in 1993. He worked at Nokia from 1994 to 2009, focusing on the future Internet, mobile networks, and service architectures. He has been a professor in the Department of Computer Science, Aalto University since 2004. Antti has supervised over 200 master’s theses and 16 doctoral dissertations. He currently has four ongoing research projects in the areas of Green ICT, mobile computing, services, and service architectures.

    Ren Wang received her Ph.D. degree in Computer Science from UCLA in 2004, where her research areas included network analysis and modeling, and high-performance TCP protocol design and evaluation. She is currently a senior research scientist at Intel Labs, working on improving performance and user experience and reducing power consumption for processors, platforms, networks, and clouds.

    Yong Cui is a full professor at Tsinghua University, China. Cui has a Ph.D. in computer science from Tsinghua University. He has published 7 IETF RFCs on IPv6 transition technologies, and he co-chairs the IETF Softwire Working Group for IPv6 transition technologies. His research interests include computer network architecture and mobile computing. He is an associate editor for IEEE Transactions on Cloud Computing and IEEE Transactions on Parallel and Distributed Systems.

    Pan Hui received his Ph.D. degree from the Computer Laboratory, University of Cambridge. He is currently a faculty member of the Department of Computer Science and Engineering at the Hong Kong University of Science and Technology, where he directs the HKUST-DT System and Media Lab. He has founded and chaired several IEEE/ACM conferences and workshops, and has served on the technical program committees of numerous international conferences, including IEEE Infocom, SECON, MASS, Globecom, WCNC, and WWW. He is an associate editor for IEEE Transactions on Cloud Computing and IEEE Transactions on Mobile Computing.

    An earlier version of this article was published in Proceedings of the 8th IEEE/ACM International Conference on Utility and Cloud Computing (UCC ’2015), Limassol, Cyprus, December 7-10, 2015.
