Elsevier

Computers & Industrial Engineering

Volume 99, September 2016, Pages 260-268

Optimal network flow: A predictive analytics perspective on the fixed-charge network flow problem

https://doi.org/10.1016/j.cie.2016.07.030

Highlights

  • A predictive model is investigated to determine whether or not arcs are selected in an optimal solution of an FCNF problem.

  • The accuracy of the predictive model is very high.

  • The model has useful explanatory power regarding the predictors defined.

  • A component importance measure is developed to rank the arcs in the network.

Abstract

The fixed charge network flow (FCNF) problem is a classical NP-hard combinatorial problem with widespread applications. To the best of our knowledge, this is the first paper that employs a statistical learning technique to analyze and quantify the effect of various network characteristics on the optimal solution of the FCNF problem. In particular, we create a probabilistic classifier based on 18 network-related variables to produce a quantitative measure of the likelihood that an arc in the network will have a non-zero flow in an optimal solution. The predictive model achieves 85% cross-validated accuracy. An application of the predictive model is presented from the perspective of identifying critical network components based on the likelihood of an arc being used in an optimal solution.

Introduction

The fixed charge network flow (FCNF) problem can be described as follows. For a given network, each node may have a commodity supply or demand requirement, and each arc has variable and/or fixed costs associated with commodity flow. The aim of the FCNF problem is to select arcs and assign feasible flows to them so that commodities are transferred from supply nodes to demand nodes at minimal total cost. The transportation problem (Balinski, 1961, El-Sherbiny and Alhamali, 2013), lot sizing problem (Steinberg & Napier, 1980), facility location problem (Aikens, 1985, Daskin, 1995), network design problem (Costa, 2005, Ghamlouche et al., 2003, Lederer and Nambimadom, 1998) and others (Armacost et al., 2002, Jarvis et al., 1978) can all be modeled as FCNF problems.
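To make the cost structure concrete, the following minimal Python sketch evaluates the total cost of a given flow on a tiny three-node network. The network, its costs, and the helper names `fcnf_cost` and `is_balanced` are illustrative assumptions, not taken from the paper; the sketch only shows how a fixed cost is charged once per arc that carries positive flow, in addition to the per-unit variable cost.

```python
from typing import Dict, Tuple

Arc = Tuple[str, str]

def fcnf_cost(flow: Dict[Arc, float],
              var_cost: Dict[Arc, float],
              fixed_cost: Dict[Arc, float]) -> float:
    """Total cost of a flow: variable cost per unit, plus the fixed
    cost of every arc that carries a strictly positive flow."""
    total = 0.0
    for arc, x in flow.items():
        if x > 0:
            total += var_cost[arc] * x + fixed_cost[arc]
    return total

def is_balanced(flow: Dict[Arc, float],
                requirement: Dict[str, float]) -> bool:
    """Check flow conservation: outflow minus inflow at each node must
    equal its requirement (supply > 0, demand < 0, transshipment = 0)."""
    net = {node: 0.0 for node in requirement}
    for (i, j), x in flow.items():
        net[i] += x
        net[j] -= x
    return all(abs(net[n] - requirement[n]) < 1e-9 for n in requirement)

# Hypothetical instance: supply node "s", transshipment node "m", demand node "t".
requirement = {"s": 10.0, "m": 0.0, "t": -10.0}
var_cost = {("s", "t"): 5.0, ("s", "m"): 1.0, ("m", "t"): 1.0}
fixed_cost = {("s", "t"): 20.0, ("s", "m"): 10.0, ("m", "t"): 10.0}

direct = {("s", "t"): 10.0, ("s", "m"): 0.0, ("m", "t"): 0.0}
via_m = {("s", "t"): 0.0, ("s", "m"): 10.0, ("m", "t"): 10.0}

print(fcnf_cost(direct, var_cost, fixed_cost))  # 70.0
print(fcnf_cost(via_m, var_cost, fixed_cost))   # 40.0
```

In this made-up instance both flows are feasible, but routing through the transshipment node is cheaper even though it pays two fixed costs. Deciding which arcs to open is what makes the problem combinatorial.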

The FCNF problem is known to be NP-hard (Guisewite & Pardalos, 1990). A significant amount of effort has been invested in studying and developing efficient approaches to the FCNF problem. Many techniques use branch and bound to search for an exact solution (Barr et al., 1981, Cabot and Erenguc, 1984, Driebeek, 1966, Hewitt et al., 2010, Kennington and Unger, 1976, Ortega and Wolsey, 2003, Palekar et al., 1990). Branch and bound, however, may be inefficient because the linear relaxation often fails to provide tight bounds. Heuristic approaches for finding near-optimal solutions of the FCNF problem have generated considerable research interest (Adlakha and Kowalski, 2010, Antony Arokia Durai Raj, 2012, Balinski, 1961, Kim and Pardalos, 1999, Molla-Alizadeh-Zavardehi et al., 2011, Monteiro et al., 2011, Sun et al., 1998). State-of-the-art MIP solvers combine a variety of cutting plane techniques, heuristics and the branch and bound algorithm to find a globally optimal solution. Modern MIP solvers also use preprocessing methods that reduce the search space by exploiting information from the original formulation, which significantly accelerates the solving process (Bixby, Fenelon, Gu, Rothberg, & Wunderling, 2000). In this paper, we take a decidedly different approach to leveraging information from the problem formulation and FCNF instances. That is, we are interested in learning how the various topological and component characteristics relate to the selection of arcs used to transmit the optimal flow. At this time, we have found no literature that approaches the study of the FCNF problem from the perspective of statistical learning.

FCNF formulations are useful in many practical problems. Modern societies are heavily dependent on distributed systems, e.g. communication networks (Cohen, Erez, Ben-Avraham, & Havlin, 2000), electric power transmission networks (Dobson, Carreras, Lynch, & Newman, 2007), and transportation networks (Zheng, Gao, & Zhao, 2007). Designing and maintaining such systems is an important research area in network science. In particular, developing resilient network infrastructures (i.e., resilient with respect to natural disasters or intentional attacks) is of utmost importance and the ability to identify critical components in complex networks has reached a level of national urgency (Birchmeier, 2007). The destruction or damage of one or more critical components in a networked system could have significant consequences in terms of overall system performance (Bell, 2000, Smith et al., 2003). The definition of component criticality is often associated with an overall network performance metric. A component whose hypothetical failure most impacts the network performance level is identified as critical. A substantial body of work using a variety of methods has focused on identifying critical components within networks, e.g. topological approach (Bompard et al., 2009, Crucitti et al., 2005), simulation (Eusgeld, Kröger, Sansavini, Schläpfer, & Zio, 2009), optimization (Bier et al., 2007, Shen et al., 2012, Zio et al., 2012), service measure (Dheenadayalu et al., 2004, Scott et al., 2006) and graph theory (Demšar, Špatenková, & Virrantaus, 2008). In this study we consider an application of our statistical model with respect to identifying critical components wherein the minimum total commodity routing cost, inclusive of fixed costs, is the overall network performance metric.

To the best of our knowledge, no existing work has developed models to help characterize predictive network features of optimal solutions to the FCNF problem. More broadly, little work has been published so far on the application of statistical learning to traditional optimization or network problems. Rocco and Muselli (2004, 2005) developed a decision tree and a Hamming clustering model to predict network connectivity reliability in graphs. Hamming clustering is applicable only if both the predicted value and all predictors are binary (Muselli & Liberati, 2002). The binary predictions relating to connectivity were made based on a single type of predictor: the status of each arc in the graph as either failed or operating. Based on this information they attempted to evaluate the reliability of origin-destination connectedness. Empirically, they created one network instance (11 nodes, 21 edges) and randomly sampled from the possible state space of edge failures. Among the $2^{21}$ possible states, 2000 were assigned to a training set and 1000 to a test set. The models were developed on the 2000 training observations, and highly accurate predictions were observed on the test set. While the predictive models developed were highly accurate, they are inherently linked to the single network instance considered.

In this study we employ a statistical learning technique to analyze the data associated with optimal FCNF solutions and we develop a relatively generalizable model based on several salient network features to predict which arcs will be used in an optimal solution. By solving thousands of generated FCNF instances we collect over 60,000 observations and develop a logistic regression model based on the dataset. This model allows us to quantify the influence of several important network characteristics. The resulting model has several potential applications. In this study, we demonstrate an application for providing an alternative approach to identifying critical network components. The remainder of this paper is organized as follows. Section 2 introduces the background of the FCNF and the logistic regression model. The process for developing the predictive model is discussed in Section 3. The identification of critical components using the model is presented in Section 4. Section 5 summarizes the results and introduces planned future work.

Section snippets

Fixed charge network flow problem

The fixed charge network flow (FCNF) problem is described on a network $G=(N,A)$, where $N$ and $A$ are the sets of nodes and arcs, respectively. Let $c_{ij}$ and $f_{ij}$ denote the variable and fixed cost of arc $(i,j) \in A$, respectively. Each node $i \in N$ has a commodity requirement $r_i$ associated with it (if it is a supply node, $r_i>0$; if a demand node, $r_i<0$; if a transshipment node, $r_i=0$). An arc parameter $M_{ij}$ is used in the problem formulation to ensure that the fixed cost $f_{ij}$ is incurred whenever there is a
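The snippet above is cut off before the formulation itself. For orientation, a standard textbook mixed-integer formulation consistent with this notation is sketched below. The decision variables $x_{ij}$ (flow on arc $(i,j)$) and $y_{ij}$ (1 if arc $(i,j)$ is used, 0 otherwise) are assumptions introduced here, and the paper's own formulation may differ in detail.

```latex
\begin{align}
\min \quad & \sum_{(i,j)\in A} \left( c_{ij}\, x_{ij} + f_{ij}\, y_{ij} \right) \\
\text{s.t.} \quad & \sum_{j:(i,j)\in A} x_{ij} \;-\; \sum_{j:(j,i)\in A} x_{ji} = r_i, && \forall\, i \in N \\
& 0 \le x_{ij} \le M_{ij}\, y_{ij}, && \forall\, (i,j) \in A \\
& y_{ij} \in \{0,1\}, && \forall\, (i,j) \in A
\end{align}
```

Under these assumptions, the constraint $x_{ij} \le M_{ij}\, y_{ij}$ forces $y_{ij}=1$, and hence the charge $f_{ij}$, on any arc with $x_{ij}>0$. Relaxing $y_{ij}$ to the interval $[0,1]$ gives the linear relaxation.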

Feature engineering

Feature engineering is a term from machine learning used to denote the process of determining and/or deriving the predictor variables used in a model. Based on initial testing we derive four types of predictors for the classification model: overall network level characteristics, arc specific attributes, linear relaxation based variables, and lastly, variables related to the nodes incident to an arc. These predictors are developed with a basic guiding principle of being easily understandable and
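As a rough illustration of how predictors of these four types could feed a logistic regression model, the Python sketch below computes the predicted probability that an arc carries flow. The feature names, coefficient values, and intercept are hypothetical placeholders, one per predictor type; they are not the 18 variables or the fitted coefficients from the paper.

```python
import math
from typing import Dict

def arc_use_probability(features: Dict[str, float],
                        coefficients: Dict[str, float],
                        intercept: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, one per predictor type named in the text.
coefficients = {
    "network_density": -0.8,         # overall network level characteristic
    "fixed_to_variable_ratio": -0.3, # arc specific attribute
    "lp_relaxation_flow": 2.5,       # linear relaxation based variable
    "tail_node_out_degree": -0.1,    # variable related to an incident node
}
intercept = -1.0  # hypothetical

# Two made-up arcs that differ only in their (scaled) LP relaxation flow.
arc_with_lp_flow = {
    "network_density": 0.4,
    "fixed_to_variable_ratio": 1.5,
    "lp_relaxation_flow": 1.0,
    "tail_node_out_degree": 3.0,
}
arc_without_lp_flow = dict(arc_with_lp_flow, lp_relaxation_flow=0.0)

p_with = arc_use_probability(arc_with_lp_flow, coefficients, intercept)
p_without = arc_use_probability(arc_without_lp_flow, coefficients, intercept)
print(f"{p_with:.3f} vs {p_without:.3f}")
```

With these placeholder values, the arc that carries flow in the linear relaxation gets the higher predicted probability. In a fitted model, the sign and size of each coefficient is what gives this kind of model its explanatory value.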

Critical components identification

The logistic regression model successfully discriminates between “optimal” and “non-optimal” arcs in FCNF solutions. In this section we develop and demonstrate an application of such information for critical network component identification. A component importance measure (CIM) is often computed to rank nodes or arcs in terms of their potential impact on a network performance measure. The performance measure we use is the FCNF optimal objective value. Since the FCNF problem is NP-hard,

Conclusions

In this investigation we develop a predictive model to determine whether or not arcs are selected for flow in an optimal solution of an FCNF problem. To do so, we generate and solve over 1000 FCNF instances. The final model, based on 18 derived network-related features, allows for high quality discrimination of “optimal” and “non-optimal” arcs. Application to larger FCNF instances retains the predictive performance.

Since we employ a logistic regression technique, the model also has useful

References (57)

  • D. Kim et al.

    A solution approach to the fixed charge network flow problem using a dynamic slope scaling procedure

    Operations Research Letters

    (1999)
  • S. Molla-Alizadeh-Zavardehi et al.

    Solving a capacitated fixed-charge transportation problem by artificial immune and genetic algorithms with a Prüfer number representation

    Expert Systems with Applications

    (2011)
  • D. Scott et al.

    Network robustness index: A new method for identifying critical links and evaluating the performance of transportation networks

    Journal of Transport Geography

    (2006)
  • S. Shen et al.

    Exact interdiction models and algorithms for disconnecting networks via node deletions

    Discrete Optimization

    (2012)
  • M. Sun et al.

    A tabu search heuristic procedure for the fixed charge transportation problem

    European Journal of Operational Research

    (1998)
  • E. Zio et al.

    Identifying groups of critical edges in a realistic electrical network by multi-objective genetic algorithms

    Reliability Engineering & System Safety

    (2012)
  • V. Adlakha et al.

    A heuristic algorithm for the fixed charge problem

    Opsearch

    (2010)
  • H. Akaike

    A new look at the statistical model identification

    IEEE Transactions on Automatic Control

    (1974)
  • C. Armacost et al.

    Composite variable formulations for express shipment service network design

    Transportation Science

    (2002)
  • M. Balinski

    Fixed-cost transportation problems

    Naval Research Logistics Quarterly

    (1961)
  • R. Barr et al.

    A new optimization method for large scale fixed charge transportation problems

    Operations Research

    (1981)
  • J. Birchmeier

    Systematic assessment of the degree of criticality of infrastructures

  • E. Bixby et al.

    MIP: Theory and practice closing the gap

  • A. Cabot et al.

    Some branch-and-bound procedures for fixed-cost transportation problems

    Naval Research Logistics Quarterly

    (1984)
  • N. Chawla et al.

    Editorial: Special issue on learning from imbalanced data sets

    ACM Sigkdd Explorations Newsletter

    (2004)
  • R. Cohen et al.

    Resilience of the internet to random breakdowns

    Physical Review Letters

    (2000)
  • P. Crucitti et al.

    Locating critical lines in high-voltage electrical power grids

    Fluctuation and Noise Letters

    (2005)
  • E. Danna et al.

    Exploring relaxation induced neighborhoods to improve MIP solutions

    Mathematical Programming

    (2005)