A mixed-integer approach to Core-Edge design of storage area networks

doi:10.1016/j.cor.2005.11.009

Computers & Operations Research

Volume 34, Issue 10, October 2007, Pages 2976-3000

https://doi.org/10.1016/j.cor.2005.11.009

Abstract

In this paper we address the problem of optimal network design for a storage area network. We consider the Core-Edge reference topology and present two formulations for the Core-Edge storage area network design problem. One formulation excludes explicit host/device connections to the edge (as is common in currently available heuristics), the other includes these connections to allow the modeling of multiple disjoint paths between hosts and devices. These formulations include generic component types to reduce the number of constraints and variables, with the properties of these components being determined as part of the solution process. The size of the formulation is further reduced by a preprocessing method that removes suboptimal switches and links from consideration.

We test our formulations on a randomly generated set of problems, all of which are of a size consistent with those encountered in industry. We generate solutions using our two formulations for all test problems in good time.

Finally we apply a relaxation of one of our formulations to re-configure the Cecil back-end network, which is currently used across the University of Auckland. We present two designs for the re-configured network to significantly increase reliability and scalability.

Introduction

The centralization of data storage is now standard practice in most large organizations. Apart from the economies of scale inherent in managing all one's data in a central location, separating the data storage devices from the local area network reduces congestion on that network and facilitates data security. The mechanism for connecting the data storage devices to the local area network is the storage area network (SAN). The design of the SAN determines the network's ability to be reprovisioned for future growth, and directly affects the speed and reliability of data retrieval in the system. It is therefore of major importance.

The SAN design problem (SANDP) involves the selection and configuration of links, hubs and switches (the standard components in computer networks) to allow data to flow between hosts (servers) and storage devices. The objective for the designer is to construct the minimum cost SAN that meets the performance criteria of the user (these usually being adequate speed and reliability). Ideally a solution to this problem would include not only a network configuration, but also the flow paths along which the data would travel. Finding such a solution requires the origin, destination and bandwidth information for all network traffic a priori. In general these flow requirements are not known, so a common design goal is a network with full host–device connectivity that can be easily scaled to support increases in flow bandwidth. To avoid network failure due to the failure of a single component, it is also desirable to include at least two disjoint paths between every host–device pair in the design. This feature has the secondary advantage of allowing for more even load balancing.

A typical SAN in industry could connect 50 hosts to 100 devices. The complexity of a design for such a SAN makes it extremely difficult for technicians to determine which is the better of two solutions. For this reason technicians are extremely resistant to deviations from a set of reference topologies. One of the most popular reference topologies is the Core-Edge topology, because of its inherent reliability and scalability. The Core-Edge topology consists of two disjoint cores (each comprising one or more high-bandwidth core switches) connected to one or two layers of lower-bandwidth edge switches. The edge switches connect to the hosts and devices in a Single-edge or Double-edge design, as shown in Fig. 1 (note that the Core-Edge topology includes no hubs). Data are routed from a host through an edge switch to one of the core switches and on to the appropriate device via a second edge switch, unless the host and device are connected to a common edge switch, in which case the data need not pass through the core. This means that no host–device pair is separated by more than three intermediate connections. Determining a design with two disjoint paths between every host–device pair is simplified by the disjoint nature of the cores in the Core-Edge topology. Furthermore, with the addition of switches to the edges or the core, an existing design can be scaled to accommodate additional hosts or devices (often without taking the existing network off-line).
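The routing and disjointness properties described above can be checked on a toy example. The sketch below is illustrative only: the layout (one core switch per core, and a separate host-side and device-side edge switch per core) and all node names are assumptions, not taken from Fig. 1. Banning the first path's switches and searching again is enough here because the two cores share no switches; it is not a general disjoint-path algorithm.

```python
from collections import deque

def build_core_edge(n_hosts, n_devices):
    """Toy Core-Edge layout (assumed): two cores "A" and "B", each with one
    core switch, one host-side edge switch and one device-side edge switch.
    Every host and device is attached to both sides."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for side in ("A", "B"):
        link(f"edgeH_{side}", f"core_{side}")
        link(f"edgeD_{side}", f"core_{side}")
        for h in range(n_hosts):
            link(f"host{h}", f"edgeH_{side}")
        for d in range(n_devices):
            link(f"dev{d}", f"edgeD_{side}")
    return adj

def shortest_path(adj, src, dst, banned=frozenset()):
    """Breadth-first search that never enters a node listed in `banned`."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in sorted(adj[node]):
            if nxt not in prev and nxt not in banned:
                prev[nxt] = node
                queue.append(nxt)
    return None

adj = build_core_edge(n_hosts=2, n_devices=3)
first = shortest_path(adj, "host0", "dev2")
# Ban the switches used by the first path and look for a second one.
second = shortest_path(adj, "host0", "dev2", banned=frozenset(first[1:-1]))
print(first, len(first) - 2)   # path and number of intermediate switches
print(second)
```

In this layout the first path uses an edge switch, a core switch and a second edge switch, and the second path uses the corresponding switches of the other core.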

The SANDP fits into the broader area of network design. If we restrict our discussion to computer networks, the network design problem (NDP) typically reduces to two subproblems. The first of these is a facility location problem (referred to as the node location problem; see for example the survey papers by ReVelle and Eiselt [1] and Drezner and Hamacher [2]). This subproblem involves the design of an access network to connect a set of access nodes to a subset of transit node locations. In the second subproblem the transit nodes lying adjacent to access nodes in the access network are themselves connected via a backbone network. It is assumed that a set of flow demands between access nodes is known a priori, and a path for each commodity needs to be assigned through the connected network [3]. A number of papers [4], [5], [6], [7] present surveys of network design across the areas of telecommunication, facility location, network design, computer systems and transportation.

Klincewicz [6] classifies types of network design problems by topology, surveying the literature in each category. Networks of star/star [8], [9], [10], [11] and fully interconnected/star [12], [13], [14], [15], [16], [17], [18], [19], [20], [21], [22] topologies appear to be the best understood (and most commonly used). Networks with star/star topology require that an access node be connected to a central site through a set number of transit nodes in a tree structure, as shown in Fig. 2. The inclusion of this central site means that the star/star network design problem can be modeled as two node location problems: one problem is to create the access network; the second is to connect the appropriate transit nodes to the central site [23], [24].

Networks with fully interconnected/star topology require the access nodes to be connected via a fully interconnected core network of transit nodes, as shown in Fig. 3. A star topology is again created between the access nodes and transit nodes. Limited examples with a non-fixed partial mesh topology can also be found, such as Gavish [25].

The SANDP is known to be NP-hard: if we ban flow-splitting (so that each flow demand must follow one path) it generalizes the non-bifurcated network loading problem [25], [26], otherwise it is equivalent to multicommodity network design [27]. It combines elements of degree-constrained network design with node costs and node capacities. Literature referring directly to the SANDP is extremely limited. Ward et al. [28] present two algorithms for designing SANs and refer to a model (without formulation) that includes all possible components in the network and uses mixed-integer programming (MIP) to select the optimal design. O’Sullivan et al. [29] present the formulation referred to by Ward et al., and outline a preprocessing method for significantly reducing the size of this formulation. O’Sullivan and Walker [30] present a MIP formulation for SAN design that uses generic network components and assigns capabilities to them as part of the decision process. This formulation, together with a modified version of the preprocessing method outlined by O’Sullivan et al., results in a further significant reduction in the number of variables and constraints. All these models assume a priori knowledge of the network traffic.

By removing the requirement of assigning flow paths a priori, the Core-Edge SAN design problem (CESANDP) is a simplification of the SANDP, but remains NP-hard. This problem involves the design of a SAN with full host–device connectivity, conforming to the Core-Edge topology. The only direct reference to the CESANDP in the literature known to the authors is Strand [31]. Strand outlines an extension of the storage area network toolkit (SANTK), a software application created by the Fibre Channel Group at the University of Minnesota. Strand's extension generates Core-Edge designs to meet the user's design criteria, but does not do so optimally.

We present MIP formulations for four types of CESANDPs. Our formulations incorporate generic component types and a preprocessing method to significantly reduce the number of variables and constraints. We test our formulations on a set of randomly generated problems, all of which are of a size consistent with those standard in industry. We then adapt one of our formulations to re-configure the design for a network that is currently in use across the University of Auckland.

We introduce our notation and give the problem description in Section 2. We present our formulations for the Unconnected and Connected Double-edge CESANDP in Sections 3 and 4, respectively (omitting the Single-edge instances of these problems as their formulations are so similar). In Section 5 we describe our test problems and present solutions and performance results. We describe the Cecil back-end network and give our designs for the re-configured network in Section 6. Finally, in Section 7 we make our concluding remarks.

Section snippets

The Core-Edge SAN design problem description

In this section we introduce our notation and present the problem description. We outline the use of technology tables to reduce the size of the model formulation and present the constraints that connect the table entries to the components in our design. Finally we describe the four types of Core-Edge design that we will generate.
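To make the idea of a technology table concrete, the sketch below represents one as a list of candidate switch types and removes any type that is dominated by another. This is an assumed illustration of how suboptimal components might be screened out; the excerpt does not give the paper's actual preprocessing method, and the attributes (`ports`, `port_gbps`, `cost`) and all values are hypothetical. The filter is only safe if those attributes are the only ones that matter to the design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SwitchType:
    name: str
    ports: int        # number of ports on the switch (hypothetical attribute)
    port_gbps: float  # bandwidth per port (hypothetical attribute)
    cost: float       # purchase cost (hypothetical attribute)

def dominates(a, b):
    """True if `a` is at least as good as `b` on every attribute and
    strictly better on at least one."""
    no_worse = (a.ports >= b.ports and a.port_gbps >= b.port_gbps
                and a.cost <= b.cost)
    strictly_better = (a.ports > b.ports or a.port_gbps > b.port_gbps
                       or a.cost < b.cost)
    return no_worse and strictly_better

def prune(table):
    """Keep only the switch types that no other entry dominates."""
    return [s for s in table if not any(dominates(o, s) for o in table)]

# Hypothetical technology table.
table = [
    SwitchType("sw8", 8, 1.0, 1000.0),
    SwitchType("sw8_pricey", 8, 1.0, 1500.0),
    SwitchType("sw16", 16, 2.0, 4000.0),
    SwitchType("sw16_slow", 16, 1.0, 4200.0),
]
print([s.name for s in prune(table)])
```

Fewer candidate types in the table means fewer variables and constraints tied to them in the resulting model, which is the motivation the text gives for preprocessing.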

The Unconnected CESANDP formulation

In this section we formulate the Unconnected CESANDP, which does not include the hosts and devices in the design of the SAN, but rather allocates sufficient free edge ports to allow their connection as a subsequent step. Due to the similar nature of their formulations, we will omit the case of the Single-edge Unconnected CESANDP and present only the Double-edge Unconnected CESANDP here.
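As a rough illustration of what allocating sufficient free edge ports involves, the sketch below computes a simple lower bound on the number of edge switches. It is back-of-the-envelope arithmetic, not the MIP formulation: it assumes a single edge switch type, a fixed number of ports per switch reserved for uplinks to the cores, and a fixed number of ports required by each host and device. All parameter names and values are hypothetical, apart from the 50-host, 100-device network size quoted in the introduction.

```python
def min_edge_switches(n_hosts, n_devices, ports_per_endpoint,
                      ports_per_switch, uplinks_per_switch):
    """Lower bound on the number of edge switches needed so that the free
    (non-uplink) ports cover every host and device connection."""
    free_per_switch = ports_per_switch - uplinks_per_switch
    if free_per_switch <= 0:
        raise ValueError("uplinks leave no free ports on the edge switch")
    needed = (n_hosts + n_devices) * ports_per_endpoint
    return -(-needed // free_per_switch)  # integer ceiling division

# 50 hosts and 100 devices, each with two connections (one per disjoint
# path), on hypothetical 16-port edge switches with 4 ports used as uplinks.
print(min_edge_switches(50, 100, 2, 16, 4))
```

The actual formulation selects switch types and links jointly to minimize cost, so the optimal design may use more switches than this bound suggests.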

The Connected CESANDP formulation

In this section we formulate the Connected CESANDP, which includes the hosts and devices (and all incident links) in the design of the SAN. Due to the similar nature of their formulations, we will omit the case of the Single-edge Connected CESANDP and present only the Double-edge Connected CESANDP here. Furthermore, to avoid repetition we will only outline the constraints in this formulation that differ from those in the Unconnected CESANDP (excluding differences in the link and switch sets).

Results

In this section we present results detailing the solution performance of CPLEX 8.1 when solving the formulations presented in Sections 3 and 4. For the purposes of price comparison we have included the cost of connecting the hosts and devices (as cheaply as possible) in the Unconnected CESANDPs.

A real-world application: reprovisioning the Cecil back-end

Cecil is a web-based interactive tool developed for the University of Auckland as a gateway between the university staff and students. It is aimed at assisting learning through student self-assessment, as well as providing communication and information between academic staff and their students [34]. The reliability of the underlying Cecil network is essential to the University of Auckland for maintaining a constant line of communication between each party.

As a practical application of the

Conclusions

We have described MIP formulations for four types of Core-Edge SAN design problems, presenting the full formulation for the Connected and Unconnected Double-edge CESANDP, but omitting the corresponding Single-edge formulations for brevity. Our formulations incorporate generic component types and a preprocessing method, which utilizes a priori technology tables, to significantly reduce the number of variables and constraints. Our formulations solve in good time for a set of randomly generated

References (34)

C.S. ReVelle et al. Location analysis: a synthesis and survey. European Journal of Operational Research (2005)
J.G. Klincewicz. Hub location in backbone/tributary network design: a review. Location Science (1998)
M.E. O’Kelly et al. The hub network design problem: a review and synthesis. Journal of Transport Geography (1994)
H. Pirkul et al. Topological design of centralized computer networks. International Transactions in Operational Research (1997)
S. Abdinnour-Helm. A hybrid heuristic for the uncapacitated hub location problem. European Journal of Operational Research (1998)
S.H. Chung et al. Optimal design of a distributed network with two-level hierarchical structure. European Journal of Operational Research (1992)
J.G. Klincewicz. A dual algorithm for the uncapacitated hub location problem. Location Science (1996)
V.J.M.F. Filho et al. A tabu search heuristic for the concentrator location problem. Location Science (1998)
B. Gavish. Topological design of computer communication networks: the overall design problem. European Journal of Operational Research (1992)
Z. Drezner et al. Facility location: applications and theory (2002)
M. Pióro et al. Routing, flow, and capacity design in communication and computer networks (2004)
D.L. Bryan et al. Hub-and-spoke networks in air transportation: an analytical review. Journal of Regional Science (1999)
J.F. Campbell. A survey of network hub location. Studies in Locational Analysis (1994)
C.C. Lo et al. A two-phase algorithm and performance bounds for the star–star concentrator location problem. IEEE Transactions on Communications (1989)
H. Pirkul et al. Locating concentrators for primary and secondary coverage in a computer communications network. IEEE Transactions on Communications (1988)
D.T. Tang et al. Optimization of teleprocessing networks with concentrators and multiconnected terminals. IEEE Transactions on Computers (1978)
R.R. Boorstyn et al. Large-scale network topological optimization. IEEE Transactions on Communications (1977)

