Core-Edge design of storage area networks—A Single-edge formulation with problem-specific cuts

https://doi.org/10.1016/j.cor.2009.07.005

Abstract

Optimization and heuristic methods for the design of medium to large storage area networks (SANs) are in the early stages of development, but are required if large clustered storage systems are to become a viable alternative to expensive monolithic storage. We present here a new mixed-integer formulation for optimal design of a storage area network. Our formulation models the Single-edge Core-Edge topology. Using a testbed of medium to large problems, we compare the solution times for our new formulation to the current benchmark in the literature—our formulation solves in significantly less time with an off-the-shelf optimization software package. We also generate problem-specific cuts to further reduce the solution time for our formulation. An algorithm, which includes an integer programming subproblem, is described for generating some of these cuts. For all test problems, the cuts yield a further reduction in the solution time.

Introduction

It is standard practice in large organizations to centralize data storage. If managed carefully, this results in lower congestion on the local area network and improved data security. Data flow between the storage devices and the client servers (hosts) travels exclusively across a dedicated storage area network (SAN). The importance of this network's layout is well understood: “The implementation of a storage area network allows system and storage administrators to ensure the consistent storage and retrieval of data on a network. The multipath nature of an SAN, with its characteristic one-to-many relationships between host and storage devices, provides unmatched configuration flexibility and availability, as well as the load-balancing and increased connectivity essential to the creation of a scalable network.” [1].

The design of SANs is still performed manually by consultants. For small problems (fewer than 10 hosts and 10 devices) this approach may result in acceptable designs, but for medium to large networks the complexity of the problem makes the manual generation of good designs very difficult. Typically storage architects simplify this task by utilizing expensive high-capacity storage devices. The advent of clustered file systems makes the deployment of large networks of low-cost storage devices an attractive alternative to these monolithic systems, with a major barrier to general adoption of this technology being the complexity of the network design problem. Our aim is to provide a solution to this problem using optimization techniques. The Core-Edge topology has been shown to perform better (in terms of cost) than other popular reference topologies such as Multi-Tree and Mesh-Rooted Tree for networks with 50–200 hosts and 50–200 devices [3]. We expect the scale of most clustered storage systems in industry would be of this order, and so we focus on the Core-Edge topology here. In smaller networks the Core-Edge topology is still competitive, but the Mesh-Rooted Tree topology tends to be the least expensive [3] and is the subject of future optimization research. Note that Molero et al. [2] provide comprehensive testing of the price/performance ratio for a number of the most commonly used topologies, and it is from these results that the topologies were chosen for testing in [3].

The problem of designing an SAN fits into the broader research area of network design. When applied to computer networks the network design problem typically involves the solution of a facility location problem, and the solution of a second subproblem to determine a backbone network to support a set of a priori flow demands [4]. For surveys of network design in the areas of telecommunications, transportation, and computer systems see [5], [6], [7], [8].

Determining an optimal SAN design for a priori flow demands is known to be NP-hard, but good heuristics can produce designs in a matter of minutes that are significantly better than manual designs, at least in terms of cost [9]. However, the performance of the irregular topologies produced by these heuristics is not understood, and so is a barrier to their implementation. Furthermore, it is generally not possible to obtain the a priori flow data necessary to use this approach, so a design with full host–device connectivity is often the goal. We refer, here, to this problem as the SAN design problem (SANDP). Progress has been made to generate solutions to the SANDP based on the Core-Edge topology, using heuristic [3], [10] and optimization [11] techniques. This problem is referred to in [11] as the Core-Edge SAN design problem (CESANDP). Our focus here is a reduction in the solution time for obtaining the optimal solution to the CESANDP by presenting a new formulation and developing problem-specific cuts. In the longer term, this will enable the optimization technique to be embedded within an interactive storage management application.

The Core-Edge topology consists of two or more disjoint cores of high-bandwidth switches connected to one or two layers of lower-bandwidth edge switches. The edge switches connect to the hosts and devices in a Single-edge (one layer) or Double-edge (one layer for hosts, another for devices) design. Under the Core-Edge topology, no host–device pair is separated by more than three intermediate nodes (hops). Determining a design with ζ(>1) node-disjoint paths between every host–device pair is simplified by the disjoint nature of the cores. Furthermore, with the addition of switches to the edges or the core, an existing design can be scaled to accommodate additional hosts or devices. A Single-edge Core-Edge design is shown in Fig. 1, with the hosts and devices on the left, a layer of edge switches in the middle, and two core switches on the right.
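The structural properties described above can be checked on a small example. The following is a minimal sketch, not the paper's formulation: it assumes a toy layout in which every edge switch belongs to exactly one core and every host and device attaches to one edge switch per core. The function names and the attachment pattern are illustrative assumptions.

```python
from collections import deque
from itertools import product


def build_core_edge(n_hosts, n_devices, n_cores, edges_per_core):
    """Build a toy Single-edge Core-Edge graph as an adjacency dict.

    Assumption (illustrative only): each edge switch is linked to exactly
    one core, and each host/device is linked to one edge switch per core.
    """
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for k in range(n_cores):
        for j in range(edges_per_core):
            link(("edge", k, j), ("core", k))
        for i in range(n_hosts):
            link(("host", i), ("edge", k, i % edges_per_core))
        for i in range(n_devices):
            link(("dev", i), ("edge", k, (i + 1) % edges_per_core))
    return adj


def intermediate_nodes(adj, src, dst, allowed_core=None):
    """Return the number of intermediate nodes on a shortest src-dst path.

    Hosts and devices never relay traffic. If allowed_core is given, only
    switches belonging to that core may be used. Returns None if dst is
    unreachable under these restrictions.
    """
    def usable(node):
        if node[0] in ("host", "dev"):
            return node == dst
        return allowed_core is None or node[1] == allowed_core

    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return dist[u] - 1
        for v in adj[u]:
            if v not in dist and usable(v):
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


N_CORES = 2
adj = build_core_edge(n_hosts=4, n_devices=4, n_cores=N_CORES, edges_per_core=2)
pairs = [(("host", h), ("dev", d)) for h, d in product(range(4), range(4))]

# Largest number of intermediate switches over all host-device pairs.
max_hops = max(intermediate_nodes(adj, h, d) for h, d in pairs)

# Paths confined to different cores share no switch, so counting the cores
# that can serve a pair gives a lower bound on its node-disjoint paths.
min_disjoint = min(
    sum(intermediate_nodes(adj, h, d, k) is not None for k in range(N_CORES))
    for h, d in pairs
)

print(max_hops, min_disjoint)
```

In this toy layout the disjointness check reduces to one reachability test per core, which illustrates why disjoint cores simplify the search for multiple node-disjoint paths. A general graph would need a max-flow computation instead.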

In this paper we present a new formulation for the Single-edge CESANDP, which is more general than anything previously published, accommodating non-homogeneous hosts and devices, with the ability to ensure more than two node-disjoint paths for each host–device pair. We outline problem-specific cuts to reduce the solution time of our formulation, and test the performance of our formulation for medium to large designs using off-the-shelf optimization software. We compare the solution times for our optimization techniques to those presented in [11]. The tests we present utilize Fibre Channel components, but our optimization techniques are independent of technology—in our research we have built a working test SAN from commodity components connected in a Core-Edge topology that uses iSCSI over ethernet. Of course, if one wishes to mix technologies it would be necessary to include additional constraints to ban illegal connections, but this is straightforward to do.

The rest of this paper is set out as follows. In Section 2 we introduce notation and present the problem description. Our new formulation is explained in Section 3. Cuts for our new formulation are outlined in Section 4. In Section 5 we detail the testing of our formulation, and summarize the results of these tests. Finally, in Section 6 we give some concluding remarks.


The Core-Edge SAN design problem description

In this section we introduce notation and present the problem description. The problem description is a restriction to the Single-edge CESANDP described in [11], although it can easily be extended to the Double-Edge case.

The CESANDP formulation

In this section we formulate the (Single-edge) CESANDP.

Problem specific cuts

The Core-Edge topology has a very rigid structure. By this we mean that once a few key decisions are made (such as the edge and core switch cardinalities and types, the core link type, and the number of links between each edge and core switch) a significant proportion of the design is determined. It is possible to take advantage of this structure in the solution process by defining problem specific cuts, which cut off large sub-optimal regions of the solution space. In this section we present

Results

In this section we present the results of testing our new Core-Edge design formulation. All tests were run on the same dual-core 2.16 GHz laptop with 2 GB of RAM, using AMPL and CPLEX 9.0.

Conclusions

We have presented a new formulation for the CESANDP, which models the Single-edge version of the Core-Edge topology. Our formulation is more general than what is currently available in the literature, allowing non-homogeneous hosts and devices and more than two disjoint paths between every host–device pair.

We have also outlined the generation of problem-specific cuts. With these cuts our new formulation solves significantly faster than the current benchmark in the literature. Although these

