A mixed-integer approach to Core-Edge design of storage area networks

https://doi.org/10.1016/j.cor.2005.11.009

Abstract

In this paper we address the problem of optimal network design for a storage area network. We consider the Core-Edge reference topology and present two formulations for the Core-Edge storage area network design problem. One formulation excludes explicit host/device connections to the edge (as is common in currently available heuristics); the other includes these connections, which allows multiple disjoint paths between hosts and devices to be modeled. Both formulations use generic component types to reduce the number of constraints and variables; the properties of these components are determined as part of the solution process. The size of the formulations is further reduced by a preprocessing method that removes suboptimal switches and links from consideration.

We test our formulations on a set of randomly generated problems, all of a size consistent with those encountered in industry. Both formulations produce solutions for all test problems in reasonable computation time.

Finally, we apply a relaxation of one of our formulations to reconfigure the Cecil back-end network, which is currently in use across the University of Auckland. We present two designs for the reconfigured network that significantly increase its reliability and scalability.

Introduction

The centralization of data storage is now standard practice in most large organizations. Apart from the economies of scale inherent in managing all of an organization's data in a central location, separating the data storage devices from the local area network reduces congestion on that network and facilitates data security. The mechanism that connects the data storage devices to the local area network is the storage area network (SAN). The design of the SAN determines how readily the network can be reprovisioned for future growth, and it directly affects the speed and reliability of data retrieval in the system. SAN design is therefore of major importance.

The SAN design problem (SANDP) involves the selection and configuration of links, hubs and switches (the standard components in computer networks) to allow data to flow between hosts (servers) and storage devices. The designer's objective is to construct the minimum-cost SAN that meets the user's performance criteria (usually adequate speed and reliability). Ideally, a solution to this problem would include not only a network configuration but also the flow paths along which the data would travel. Finding such a solution requires the origin, destination and bandwidth of all network traffic to be known a priori. In general these flow requirements are not known, so a common design goal is a network with full host–device connectivity that can easily be scaled to support increases in flow bandwidth. To avoid network failure caused by the failure of a single component, it is also desirable to include at least two disjoint paths between every host–device pair in the design. This feature has the secondary advantage of allowing more even load balancing.
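The two-disjoint-paths requirement can be checked mechanically for a candidate design. The following Python sketch is an illustration under our own assumptions, not the authors' method: it represents a design as an undirected list of links between hypothetically named nodes, and counts internally node-disjoint host–device paths as a unit-capacity max-flow after splitting every node (Menger's theorem). A result of at least 2 means that no single intermediate switch or link failure can disconnect the pair, since node-disjoint paths are also link-disjoint.

```python
from collections import defaultdict, deque


def max_node_disjoint_paths(links, source, sink):
    """Count internally node-disjoint source-sink paths (Menger's theorem).

    Each node v is split into (v, "in") -> (v, "out") with capacity 1, so no
    intermediate switch can carry two paths, and each undirected link becomes
    two unit-capacity arcs. The count is the max-flow value, found with
    breadth-first augmenting paths (Edmonds-Karp).
    """
    cap = defaultdict(int)  # residual capacity of arc (tail, head)
    adj = defaultdict(set)  # neighbours in the residual graph

    def add_arc(u, v):
        cap[(u, v)] += 1
        adj[u].add(v)
        adj[v].add(u)  # reverse arc, initially with zero capacity

    for v in {node for link in links for node in link}:
        add_arc((v, "in"), (v, "out"))
    for u, v in links:
        add_arc((u, "out"), (v, "in"))
        add_arc((v, "out"), (u, "in"))

    # Start at the source's "out" copy and stop at the sink's "in" copy, so
    # the endpoints themselves are not capacity-limited.
    s, t = (source, "out"), (sink, "in")
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for w in adj[u]:
                if w not in parent and cap[(u, w)] > 0:
                    parent[w] = u
                    queue.append(w)
        if t not in parent:
            return flow
        w = t
        while parent[w] is not None:  # push one unit of flow along the path
            u = parent[w]
            cap[(u, w)] -= 1
            cap[(w, u)] += 1
            w = u
        flow += 1


# Hypothetical design: H is dual-homed to edge switches E1 and E2, which reach
# disjoint cores C1 and C2; D is dual-homed to E3 and E4.
dual_homed = [("H", "E1"), ("H", "E2"), ("E1", "C1"), ("E2", "C2"),
              ("E3", "C1"), ("E4", "C2"), ("D", "E3"), ("D", "E4")]
# Same cores, but H hangs off the single edge switch E1.
single_homed = [("H", "E1"), ("E1", "C1"), ("E1", "C2"),
                ("E3", "C1"), ("E4", "C2"), ("D", "E3"), ("D", "E4")]
print(max_node_disjoint_paths(dual_homed, "H", "D"))    # 2
print(max_node_disjoint_paths(single_homed, "H", "D"))  # 1
```

In the second design E1 is a single point of failure, even though the cores themselves are disjoint.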

A typical SAN in industry could connect 50 hosts to 100 devices. The complexity of a design for such a SAN makes it extremely difficult for technicians to determine which of two solutions is better. For this reason, technicians are extremely resistant to deviations from a set of reference topologies. One of the most popular reference topologies is the Core-Edge topology, because of its inherent reliability and scalability. The Core-Edge topology consists of two disjoint cores (each comprising one or more high-bandwidth core switches) connected to one or two layers of lower-bandwidth edge switches. The edge switches connect to the hosts and devices in a Single-edge or Double-edge design, as shown in Fig. 1 (note that the Core-Edge topology includes no hubs). Data are routed from a host through an edge switch to one of the core switches and on to the appropriate device via a second edge switch, unless the host and device are connected to a common edge switch, in which case the data need not pass through the core. This means that no host–device pair is separated by more than three intermediate switches (an edge switch, a core switch and a second edge switch). Determining a design with two disjoint paths between every host–device pair is simplified by the disjoint nature of the cores in the Core-Edge topology. Furthermore, by adding switches to the edges or the core, an existing design can be scaled to accommodate additional hosts or devices (often without taking the existing network off-line).
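To make the routing argument concrete, the sketch below builds a small Single-edge design and counts the switches on a shortest host–device route. The topology, the node names and the assumption that every edge switch uplinks to a switch in each core are hypothetical, and shortest-path (breadth-first) routing is our simplification, since no routing rule is specified here.

```python
from collections import deque


def build_adjacency(links):
    """Undirected adjacency lists from a list of (u, v) links."""
    adjacency = {}
    for u, v in links:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def intermediate_switches(adjacency, host, device):
    """Switches strictly between host and device on a BFS shortest route.

    Returns None if the device is unreachable from the host.
    """
    parent = {host: None}
    queue = deque([host])
    while queue:
        u = queue.popleft()
        if u == device:
            break
        for w in adjacency.get(u, []):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    if device not in parent:
        return None
    hops = 0  # number of links on the route
    node = device
    while parent[node] is not None:
        node = parent[node]
        hops += 1
    return hops - 1  # nodes strictly between the endpoints


# Hypothetical Single-edge design with one switch in each of two cores.
links = [("E1", "CoreA"), ("E1", "CoreB"), ("E2", "CoreA"), ("E2", "CoreB"),
         ("H1", "E1"), ("D1", "E2"), ("D2", "E1")]
adjacency = build_adjacency(links)
print(intermediate_switches(adjacency, "H1", "D1"))  # 3: E1, a core, E2
print(intermediate_switches(adjacency, "H1", "D2"))  # 1: shared edge switch E1
```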

The SANDP fits into the broader area of network design. If we restrict our discussion to computer networks, the network design problem (NDP) typically reduces to two subproblems. The first of these is a facility location problem, referred to as the node location problem (see, for example, the survey papers by ReVelle and Eiselt [1] and Drezner and Hamacher [2]). This subproblem involves the design of an access network that connects a set of access nodes to a subset of transit node locations. In the second subproblem, the transit nodes adjacent to access nodes in the access network are themselves connected via a backbone network. A set of flow demands between access nodes is assumed to be known a priori, and a path for each commodity must be assigned through the connected network [3]. A number of papers [4], [5], [6], [7] present surveys of network design across the areas of telecommunication, facility location, network design, computer systems and transportation.
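The node location subproblem can be illustrated with a tiny exhaustive solver. This is a hedged sketch of the textbook uncapacitated facility location model (transit-node opening costs plus access-link costs), not a method taken from the papers cited above; the function name and data are hypothetical, and enumerating every subset of transit nodes is only practical for a handful of candidates.

```python
from itertools import combinations


def solve_node_location(open_cost, connect_cost):
    """Exhaustive uncapacitated node (facility) location.

    open_cost[j]: cost of opening transit node j.
    connect_cost[i][j]: cost of linking access node i to transit node j.
    Each access node is assigned to its cheapest open transit node.
    Returns (total cost, opened transit nodes, assignment per access node).
    """
    transit = list(open_cost)
    best = None
    for k in range(1, len(transit) + 1):
        for opened in combinations(transit, k):
            assignment = [min(opened, key=row.__getitem__) for row in connect_cost]
            total = sum(open_cost[j] for j in opened) + sum(
                row[j] for row, j in zip(connect_cost, assignment))
            if best is None or total < best[0]:
                best = (total, opened, assignment)
    return best


open_cost = {"T1": 10, "T2": 10}
connect_cost = [{"T1": 1, "T2": 5}, {"T1": 5, "T2": 1}, {"T1": 2, "T2": 6}]
print(solve_node_location(open_cost, connect_cost))
# (18, ('T1',), ['T1', 'T1', 'T1'])
```

Here opening both transit nodes would cut the link cost to 4 but add a second opening cost, so the single-node solution wins.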

Klincewicz [6] classifies network design problems by topology and surveys the literature in each category. Networks with star/star [8], [9], [10], [11] and fully interconnected/star [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22] topologies appear to be the best understood (and the most commonly used). Networks with star/star topology require that each access node be connected to a central site through a set number of transit nodes in a tree structure, as shown in Fig. 2. Because of this central site, the star/star network design problem can be modeled as two node location problems: one creates the access network, and the second connects the appropriate transit nodes to the central site [23], [24].

Networks with fully interconnected/star topology require the access nodes to be connected via a fully interconnected core network of transit nodes, as shown in Fig. 3. A star topology is again created between the access nodes and transit nodes. A few examples with a non-fixed, partial-mesh topology can also be found, such as Gavish [25].

The SANDP is known to be NP-hard: if flow splitting is prohibited (so that each flow demand must follow a single path), it generalizes the non-bifurcated network loading problem [25], [26]; otherwise, it is equivalent to multicommodity network design [27]. It combines elements of degree-constrained network design with node costs and node capacities. Literature referring directly to the SANDP is extremely limited. Ward et al. [28] present two algorithms for designing SANs and refer to a model (without giving its formulation) that includes all possible components in the network and uses mixed-integer programming (MIP) to select the optimal design. O’Sullivan et al. [29] present the formulation referred to by Ward et al. and outline a preprocessing method that significantly reduces the size of this formulation. O’Sullivan and Walker [30] present a MIP formulation for SAN design that uses generic network components and assigns capabilities to them as part of the decision process. This formulation, together with a modified version of the preprocessing method outlined by O’Sullivan et al., results in a further significant reduction in the number of variables and constraints. All these models assume a priori knowledge of the network traffic.

By removing the requirement of assigning flow paths a priori, the Core-Edge SAN design problem (CESANDP) is a simplification of the SANDP, but remains NP-hard. This problem involves the design of a SAN with full host–device connectivity, conforming to the Core-Edge topology. The only direct reference to the CESANDP in the literature known to the authors is Strand [31]. Strand outlines an extension of the storage area network toolkit (SANTK), a software application created by the Fibre Channel Group at the University of Minnesota. Strand's extension generates Core-Edge designs to meet the user's design criteria, but does not do so optimally.

We present MIP formulations for four types of CESANDP. Our formulations incorporate generic component types and a preprocessing method to significantly reduce the number of variables and constraints. We test our formulations on a set of randomly generated problems, all of a size consistent with those found in industry. We then adapt one of our formulations to reconfigure the design of a network that is currently in use across the University of Auckland.
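The preprocessing method is only named at this point, not specified. As a hedged illustration of the general idea, the sketch below applies a simple dominance rule of our own: assuming a switch type matters to a design only through its port count, port speed and price, any type that another type matches or beats on all three (and strictly beats on at least one) can be replaced by that type without increasing cost, so it can be dropped before the MIP is built. The SwitchType fields and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchType:
    name: str
    ports: int
    port_speed_gbps: float
    cost: float


def dominates(a, b):
    """True if type a is no worse than b in every attribute and strictly
    better in at least one (more ports, faster ports or lower cost)."""
    no_worse = (a.ports >= b.ports and a.port_speed_gbps >= b.port_speed_gbps
                and a.cost <= b.cost)
    better = (a.ports > b.ports or a.port_speed_gbps > b.port_speed_gbps
              or a.cost < b.cost)
    return no_worse and better


def remove_dominated(types):
    """Keep only the switch types that no other candidate dominates."""
    # dominates(b, b) is always False, so b never eliminates itself.
    return [b for b in types if not any(dominates(a, b) for a in types)]


candidates = [SwitchType("A", 16, 2.0, 5000),
              SwitchType("B", 16, 1.0, 6000),   # slower and dearer than A
              SwitchType("C", 32, 2.0, 9000)]
print([t.name for t in remove_dominated(candidates)])  # ['A', 'C']
```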

We introduce our notation and give the problem description in Section 2. We present our formulations for the Double-edge Unconnected and Connected CESANDPs in Sections 3 and 4, respectively (omitting the Single-edge versions of these problems, as their formulations are very similar). In Section 5 we describe our test problems and present solutions and performance results. We describe the Cecil back-end network and give our designs for the reconfigured network in Section 6. Finally, in Section 7 we make our concluding remarks.


The Core-Edge SAN design problem description

In this section we introduce our notation and present the problem description. We outline the use of technology tables to reduce the size of the model formulation and present the constraints that connect the table entries to the components in our design. Finally we describe the four types of Core-Edge design that we will generate.
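The linking constraints themselves are not reproduced in this excerpt. Purely as an illustration, and in notation of our own choosing rather than the paper's, one standard way to tie a generic switch to a technology-table entry is shown below. Binary x_s indicates that generic switch s in the set S is installed, binary y_st that it takes table entry t in T (an entry with P_t ports and cost c_t), and binary l_e that link e is used, where δ(s) is the set of candidate links incident to s.

```latex
\begin{aligned}
\text{switch cost} &= \sum_{s \in S} \sum_{t \in T} c_t \, y_{st}, \\
\sum_{t \in T} y_{st} &= x_s && \forall s \in S, \\
\sum_{e \in \delta(s)} \ell_e &\le \sum_{t \in T} P_t \, y_{st} && \forall s \in S, \\
x_s,\ y_{st},\ \ell_e &\in \{0, 1\}.
\end{aligned}
```

The first constraint gives each installed generic switch exactly one table entry and an uninstalled switch none; the second then caps the number of links at a switch by the port count of its chosen entry, so an uninstalled switch can carry no links.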

The Unconnected CESANDP formulation

In this section we formulate the Unconnected CESANDP, which does not include the hosts and devices in the design of the SAN, but rather allocates sufficient free edge ports to allow their connection as a subsequent step. Due to the similar nature of their formulations, we will omit the case of the Single-edge Unconnected CESANDP and present only the Double-edge Unconnected CESANDP here.
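As a rough illustration of what allocating sufficient free edge ports involves, the sketch below picks the cheapest mix of edge-switch models whose ports, after a fixed number of core uplinks per switch, leave enough free ports for the hosts and devices to be attached later. It is a hypothetical brute-force sketch, not the Unconnected CESANDP formulation: it ignores the cores, link costs and bandwidth, and the model names, port counts, prices, uplink count and dual-homing assumption in the example are all ours.

```python
from itertools import product


def cheapest_edge_mix(edge_types, free_ports_needed, uplinks_per_switch, max_each=10):
    """Cheapest mix of edge switches leaving enough free ports.

    edge_types: {model name: (port count, unit cost)}.
    free_ports_needed: ports reserved for later host/device connections.
    uplinks_per_switch: ports on each edge switch consumed by core links.
    Tries 0..max_each switches of every model; returns (cost, counts) or None.
    """
    names = list(edge_types)
    best = None
    for counts in product(range(max_each + 1), repeat=len(names)):
        free = sum(n * (edge_types[m][0] - uplinks_per_switch)
                   for m, n in zip(names, counts))
        if free < free_ports_needed:
            continue
        cost = sum(n * edge_types[m][1] for m, n in zip(names, counts))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(names, counts)))
    return best


edge_types = {"edge-16": (16, 4000), "edge-32": (32, 7000)}
# 4 hosts + 6 devices, each dual-homed: 20 free edge ports, 2 uplinks per switch.
print(cheapest_edge_mix(edge_types, 20, 2))
# (7000, {'edge-16': 0, 'edge-32': 1})
```

With the same hypothetical prices, 50 hosts and 100 devices, each dual-homed (300 free ports), lead to ten edge-32 switches at a cost of 70000.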

The Connected CESANDP formulation

In this section we formulate the Connected CESANDP, which includes the hosts and devices (and all incident links) in the design of the SAN. Due to the similar nature of their formulations, we will omit the case of the Single-edge Connected CESANDP and present only the Double-edge Connected CESANDP here. Furthermore, to avoid repetition, we will only outline the constraints in this formulation that differ from those in the Unconnected CESANDP (excluding differences in the link and switch sets).

Results

In this section we present results detailing the solution performance of CPLEX 8.1 when solving the formulations presented in Sections 3 and 4. For the purposes of price comparison we have included the cost of connecting the hosts and devices (as cheaply as possible) in the Unconnected CESANDPs.

A real-world application: reprovisioning the Cecil back-end

Cecil is a web-based interactive tool developed for the University of Auckland as a gateway between university staff and students. It aims to assist learning through student self-assessment and to support communication and information exchange between academic staff and their students [34]. The reliability of the underlying Cecil network is essential to the University of Auckland for maintaining a constant line of communication between staff and students.

As a practical application of the

Conclusions

We have described MIP formulations for four types of Core-Edge SAN design problems, presenting the full formulations for the Connected and Unconnected Double-edge CESANDPs but omitting the corresponding Single-edge formulations for brevity. Our formulations incorporate generic component types and a preprocessing method, which uses a priori technology tables, to significantly reduce the number of variables and constraints. Our formulations solve in good time for a set of randomly generated

References (34)

  • M. Pióro et al. Routing, flow, and capacity design in communication and computer networks. 2004.
  • D.L. Bryan et al. Hub-and-spoke networks in air transportation: an analytical review. Journal of Regional Science, 1999.
  • J.F. Campbell. A survey of network hub location. Studies in Locational Analysis, 1994.
  • C.C. Lo et al. A two-phase algorithm and performance bounds for the star–star concentrator location problem. IEEE Transactions on Communications, 1989.
  • H. Pirkul et al. Locating concentrators for primary and secondary coverage in a computer communications network. IEEE Transactions on Communications, 1988.
  • D.T. Tang et al. Optimization of teleprocessing networks with concentrators and multiconnected terminals. IEEE Transactions on Computers, 1978.
  • R.R. Boorstyn et al. Large-scale network topological optimization. IEEE Transactions on Communications, 1977.