Discrete Optimization
On exact solution approaches for the longest induced path problem

https://doi.org/10.1016/j.ejor.2019.04.011

Highlights

  • The longest induced path problem is a challenging optimization problem.

  • We describe three different linear integer programming models.

  • We develop an exact iterative algorithm that exploits these models.

  • We describe a simple randomized heuristic.

  • Computational experiments with real and synthetic data are presented.

Abstract

The graph diameter, which is defined as the length of the longest shortest path in a graph, is often used to quantify graph communication properties. In particular, the graph diameter provides an intuitive measure of the worst-case pairwise distance. However, in many practical settings, where vertices can either fail or be overloaded or can be destroyed by an adversary and thus cannot be used in any communication or transportation path, it is natural to consider other possible measures of the worst-case distance. One such measure is the longest induced path. The longest induced path problem is defined as the problem of finding a subset of vertices of the largest cardinality such that the induced subgraph is a simple path. In contrast to the polynomially computable graph diameter, this problem is NP-hard. In this paper, we focus on exact solution approaches for the problem based on linear integer programming (IP) techniques. We first propose three conceptually different linear IP models and study their basic properties. To improve the performance of the standard IP solvers, we propose an exact iterative algorithm that solves a sequence of smaller IPs to obtain an optimal solution for the original problem. In addition, we develop a heuristic capable of finding induced paths in large networks. Finally, we conduct an extensive computational study to evaluate the performance of the proposed solution methods.

Introduction

Let G = (V, E) be a simple undirected graph with a set of n vertices (nodes) V = {1, …, n} and a set of m edges E. Many complex systems in a variety of application domains can be effortlessly modeled as graphs (networks), where the vertices serve as structural elements of the system and the edges reflect important pairwise relationships between them. Popular examples of graph models include communication and transportation networks, social networks, power grids, biological networks, economic networks, and food webs (Ahuja, Magnanti, Orlin, 1993, Barabási, et al., 2016, Jackson, 2010, Newman, 2018).

For many of these graph-theoretical models, decision makers are often interested in exploring the communication properties of these networks, which are typically measured using the notion of distance. For any pair of vertices i, j ∈ V, the distance d_G(i, j), or simply d_ij, is the length of the shortest path between vertices i and j in G, and d_G(i, j) = +∞ if vertices i and j are disconnected. For a connected graph G, its diameter is defined as diam(G) = max_{i,j ∈ V} d_G(i, j), i.e., the longest shortest path in the network.
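To make these definitions concrete, the following minimal sketch computes d_G(i, j) by breadth-first search and takes the maximum over all pairs to obtain diam(G). The adjacency-dictionary representation and the names `bfs_distances` and `diameter` are assumptions made for this illustration; they do not come from the paper.

```python
from collections import deque
from math import inf


def bfs_distances(adj, source):
    """Return hop distances from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """diam(G) = max over all pairs of d_G(i, j); inf if G is disconnected."""
    best = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < len(adj):  # some vertex is unreachable from s
            return inf
        best = max(best, max(dist.values()))
    return best


# Hypothetical example: a cycle on 6 vertices.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(diameter(cycle6))  # 3
```

Running one breadth-first search per vertex takes O(|V|(|V| + |E|)) time in total, which is one simple way to see that the diameter is polynomially computable.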

Real-life networks typically have relatively short communication paths and small diameters (see, e.g., Barabási, et al., 2016, Borgatti, Everett, Johnson, 2013, Jackson, 2010, Newman, 2018), as has been observed in a variety of contexts starting with the famous experiments by Stanley Milgram (Travers & Milgram, 1967). Furthermore, in the network analysis literature, one of the most celebrated concepts is the idea of small-world networks, which are defined as graphs with relatively small average pairwise distances and relatively high clustering coefficients (Albert, Barabási, 2002, Watts, Strogatz, 1998).

The graph diameter provides a simple, easily computable (in polynomial time, Ahuja et al., 1993) and intuitive measure for evaluating the worst case for a pairwise distance in a graph, e.g., the longest of the shortest communication paths in the communication network or the longest of the shortest routes in a transportation network. Naturally, one practical extension of the aforementioned worst case is to consider scenarios where some of the graph vertices either fail or are overloaded or are destroyed by an adversary (depending on the application context) and thus cannot be used in any path. That is, some subset of vertices cannot transmit a message in a communication network or cannot serve as a transshipment point in a transportation network, and thus, the alternative shortest paths (detours) must be used. In other words, we aim to identify the worst possible case for the shortest distance between two vertices in the graph given that these vertices remain connected by some path while the rest of the vertices can fail. This extension leads to the problem known as the longest induced path in a graph (Di Giacomo, Liotta, Mchedlidze, 2016, Esperet, Lemoine, Maffray, 2017, Garey, Johnson, 1979, Ishizeki, Otachi, Yamazaki, 2008).

Formally, for any subset of vertices S ⊆ V, let G[S] = (S, E′) denote the subgraph induced by S in G, where E′ contains all edges from E that have both of their endpoints in S. The longest induced path problem is defined as the problem of finding S of the largest cardinality such that the resulting induced subgraph, G[S], is a simple path. In contrast to the polynomially computable graph diameter, the longest induced path problem is known to be NP-hard, and its decision version is NP-complete (Garey & Johnson, 1979).
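The definition can be checked directly on very small graphs. The sketch below is only an illustration of the definition, not one of the solution methods proposed in the paper: it tests whether G[S] is a simple path (connected, |S| − 1 edges, maximum degree at most 2) and enumerates subsets from largest to smallest. The function names and the adjacency-dictionary input are assumptions, and the enumeration is exponential in |V|.

```python
from itertools import combinations


def is_induced_path(adj, S):
    """Return True if the subgraph induced by S is a simple path."""
    S = set(S)
    if not S:
        return False
    # Degree of each vertex inside G[S]; only edges with both ends in S count.
    deg = {v: sum(1 for u in adj[v] if u in S) for v in S}
    if sum(deg.values()) // 2 != len(S) - 1 or max(deg.values()) > 2:
        return False
    # Connectivity check inside G[S] (rules out a cycle plus stray vertices).
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in adj[v]:
            if u in S and u not in seen:
                seen.add(u)
                stack.append(u)
    return len(seen) == len(S)


def longest_induced_path_bruteforce(adj):
    """Largest vertex subset inducing a simple path (exponential time)."""
    vertices = list(adj)
    for k in range(len(vertices), 0, -1):
        for S in combinations(vertices, k):
            if is_induced_path(adj, S):
                return set(S)
    return set()


# Hypothetical example: a cycle on 5 vertices.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(len(longest_induced_path_bruteforce(cycle5)))  # 4
```

In the 5-cycle, all five vertices induce the cycle itself, whereas any four consecutive vertices induce a path, so the optimal S has four vertices.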

The longest induced path problem also has several related variations in the literature. The problem of finding the longest induced path in hypercube graphs is known as the snake-in-the-box problem, which has applications in the theory of error-correcting codes (Kautz, 1958). Furthermore, for vertices i and j in a connected graph G, the detour distance between i and j is defined as the length of the longest path P between i and j for which the subgraph induced by the vertices of P is P itself (Chartrand, Johns, & Tian, 1993). Thus, the detour distance between i and j is simply the length of the longest induced path originating at i and ending at j.
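Both variations can be illustrated with a small exhaustive search. The sketch below grows a path one vertex at a time and accepts a new vertex only if it is adjacent to the current tail and to no other vertex already on the path, which keeps the path induced. The names `detour_distance`, `hypercube`, and `longest_snake` are hypothetical, and the search is exponential, so it is meant for tiny graphs only.

```python
def detour_distance(adj, source, target):
    """Length in edges of a longest induced source-target path (None if none)."""
    best = None
    path, on_path = [source], {source}

    def extend():
        nonlocal best
        tail = path[-1]
        if tail == target:
            if best is None or len(path) - 1 > best:
                best = len(path) - 1
            return
        for w in adj[tail]:
            if w in on_path:
                continue
            # w may touch the current path only at its tail; any other
            # adjacency would create a chord, so the path would not be induced.
            if any(u in on_path and u != tail for u in adj[w]):
                continue
            path.append(w)
            on_path.add(w)
            extend()
            path.pop()
            on_path.remove(w)

    extend()
    return best


def hypercube(d):
    """Adjacency of the d-dimensional hypercube; vertices are d-bit integers."""
    return {v: [v ^ (1 << b) for b in range(d)] for v in range(2 ** d)}


def longest_snake(d):
    """Longest induced path in the d-cube, fixing one end at vertex 0."""
    cube = hypercube(d)
    return max(detour_distance(cube, 0, t) for t in cube)


print(longest_snake(3))  # 4
```

Fixing one end at vertex 0 loses no generality because the hypercube is vertex-transitive. For the 3-cube, one optimal snake is 000, 001, 011, 111, 110.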

Several other interesting interpretations of the longest induced path problem exist. Consider, for example, a social network with information cascades. Assume that a vertex (i.e., an individual in a social network) learns a particular piece of information and then shares the information with all of its neighbors, who thus also learn the information. After learning the information, some of the neighbors might decide to share this information with their neighbors as well (e.g., repost on their personal page on a social media site), which triggers a so-called information cascade. Clearly, if the network is connected, then eventually, all vertices in the network may learn the transmitted information. Next, we observe that an optimal solution of the longest induced path problem provides the longest possible path of information transmission, where one end vertex of the path corresponds to the seed of the information cascade and the other end vertex is the last to learn this piece of information.

Furthermore, the longest induced path problem can be interpreted as a member of the class of subgraph identification problems. One classic example of this type of problem is the maximum clique problem, which involves finding a clique (i.e., a complete subgraph) of maximum cardinality (Bomze, Budinich, Pardalos, & Pelillo, 1999). Note that the diameter of a clique is one. Thus, the maximum clique problem can be viewed as the problem of finding the subgraph of maximum cardinality with the best possible pairwise communication property (namely, with respect to the diameter of the subgraph). On the other hand, the longest induced path problem can be equivalently stated as the problem of finding the connected induced subgraph of maximum diameter (we provide a formal proof of this observation in Section 2.1). Thus, this problem can be viewed as the problem of finding a connected induced subgraph with the worst possible pairwise communication property. Here, we note that in the literature, the longest induced path problem is also known as the maximum induced path problem; see, e.g., Gavril (2002).
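The equivalence with the maximum-diameter connected induced subgraph can be checked numerically on small instances. The brute-force sketch below is not the formal proof referred to above; it computes both quantities independently and compares them. The intuition is that a shortest path between a diametral pair of G[S] has no chords and is therefore itself an induced path of G. All function names are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def induced_diameter(adj, S):
    """Diameter of G[S], or None if G[S] is disconnected."""
    S = set(S)
    worst = 0
    for s in S:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v in S and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(S):
            return None
        worst = max(worst, max(dist.values()))
    return worst


def induces_path(adj, S):
    """True if G[S] is connected, has |S| - 1 edges and maximum degree <= 2."""
    S = set(S)
    degrees = [sum(1 for u in adj[v] if u in S) for v in S]
    return (induced_diameter(adj, S) is not None
            and max(degrees) <= 2
            and sum(degrees) == 2 * (len(S) - 1))


def nonempty_subsets(adj):
    """Yield every nonempty vertex subset as a tuple."""
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        yield from combinations(vertices, k)


def max_induced_diameter(adj):
    """Largest diameter over all connected induced subgraphs."""
    diameters = (induced_diameter(adj, S) for S in nonempty_subsets(adj))
    return max(d for d in diameters if d is not None)


def longest_induced_path_length(adj):
    """Number of edges in a longest induced path."""
    return max(len(S) - 1 for S in nonempty_subsets(adj) if induces_path(adj, S))


# Hypothetical example: a cycle on 5 vertices.
cycle5 = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(max_induced_diameter(cycle5), longest_induced_path_length(cycle5))  # 3 3
```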

Finally, the longest induced path problem is related to the longest path problem, which is defined as the problem of finding a simple path of maximum length (Ahuja et al., 1993). The longest path problem is also known to be NP-hard (Garey & Johnson, 1979).

Summarizing the discussion above, we conclude that the longest induced path problem belongs to an interesting class of network optimization problems with practically relevant applications in various network analysis and design contexts. Unfortunately, due to the inherent computational complexity of this problem, the literature on solution approaches for this problem is rather limited and focuses on polynomially solvable classes of the problem and on providing some bounds on the length of the induced path.

For example, the studies in Gavril (2002) and Ishizeki et al. (2008) show that the weighted longest induced path problem can be solved in polynomial time for some families of graphs, e.g., k-chordal graphs, which are graphs with no induced cycles of more than k vertices, where k is fixed. The work of Courcelle, Makowsky, and Rotics (2000) shows that several difficult combinatorial problems on graphs, including the longest induced path, can be solved in linear time in graphs with a clique-width of at most k, where k is fixed. In Esperet et al. (2017), the authors establish that every 3-connected planar graph contains an induced path on Ω(log |V|) vertices. Additional related results can be found in Di Giacomo et al. (2016).

In contrast to the aforementioned studies, in this paper, we focus on solving the longest induced path problem for general graphs. Specifically, the main contributions of the paper are as follows:

  • In Section 2, we derive four linear integer programming (IP) formulations for finding the longest induced path based on three possible interpretations of the underlying problem. The main advantage of these IP models is that they can be solved exactly by standard solvers, e.g., Gurobi (see Gurobi Optimization, Inc., 2016), thus providing optimal solutions for the longest induced path problem on general graphs with no need for specialized algorithms. We also discuss some additional enhancements of these IP models and explore their basic theoretical properties; see Section 3.

  • Unfortunately, the standard solvers can handle only small network instances. To address this issue, in Section 4.1, we provide an exact algorithm that can solve the problem for larger graphs. The algorithm exploits the problem structure and solves a sequence of smaller IPs in an iterative manner. Furthermore, in Section 4.2, we develop a randomized heuristic that can be used to obtain heuristic solutions (clearly, with no optimality guarantee) for problem instances that cannot be handled by either of the above exact approaches.

  • In Section 5, to demonstrate the performance of the proposed solution methods, we perform extensive computational experiments using random and real-life network instances. Moreover, based on our experimental observations, we provide interesting insights into the relationships between the longest induced path and other graph characteristics.

Finally, Section 6 concludes the discussion and highlights possible avenues for future research.

Section snippets

Integer programming models

In this section, we develop four IP formulations for identifying longest induced paths. The proposed models exploit three different interpretations of an induced path in a graph.

Basic analysis of the IP models

In this section, we provide a basic analysis of the IP formulations derived in Section 2. First, in Table 1, we report the number of variables and constraints in each formulation. In particular, with the simplest setting of L = |V| − 1 and T = |V| − 1, models IP3 and IP3c contain fewer variables than do IP1 and IP2. For sufficiently sparse graphs, e.g., |E| = O(|V|), IP3c is the best model with respect to the number of variables and constraints.

In general, the number of variables and constraints in the

Algorithms

In Section 4.1, we present an exact iterative algorithm that solves a sequence of smaller IPs to obtain an optimal solution for the original problem. Then, in Section 4.2 we develop a randomized heuristic based on a random walk that is capable of finding induced paths in larger networks.
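This snippet does not give the details of the heuristic, so the following is only a sketch of one plausible random-walk scheme, not the authors' algorithm. It assumes that the walk starts at a random vertex, repeatedly moves to a uniformly random neighbor of the current tail that is not adjacent to any earlier path vertex, stops when no such neighbor exists, and keeps the best path over several restarts. The names `random_induced_walk`, `heuristic_longest_induced_path`, and `grid_graph`, as well as the `restarts` and `seed` parameters, are hypothetical.

```python
import random


def random_induced_walk(adj, start, rng):
    """Grow an induced path from `start` by random admissible steps."""
    path = [start]
    # Vertices that may no longer be appended: path vertices and every
    # neighbor of a path vertex other than the current tail.
    forbidden = {start}
    while True:
        tail = path[-1]
        candidates = [w for w in adj[tail] if w not in forbidden]
        if not candidates:
            return path
        nxt = rng.choice(candidates)
        forbidden.update(adj[tail])  # the old tail becomes an interior vertex
        forbidden.add(nxt)
        path.append(nxt)


def heuristic_longest_induced_path(adj, restarts=100, seed=0):
    """Best induced path found over `restarts` independent random walks."""
    rng = random.Random(seed)
    vertices = list(adj)
    best = []
    for _ in range(restarts):
        path = random_induced_walk(adj, rng.choice(vertices), rng)
        if len(path) > len(best):
            best = path
    return best


def grid_graph(rows, cols):
    """Hypothetical test instance: a rows x cols grid graph."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(r, c): [(r + dr, c + dc) for dr, dc in steps
                     if 0 <= r + dr < rows and 0 <= c + dc < cols]
            for r in range(rows) for c in range(cols)}


grid = grid_graph(4, 4)
found = heuristic_longest_induced_path(grid, restarts=200, seed=1)
print(len(found), found)
```

Every path returned by this sketch is induced by construction, but its length carries no optimality guarantee; it yields only a lower bound on the longest induced path.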

Computational experiments

This section presents our computational experiments. The aim of the computational experiments is fourfold:

  1. Illustrate the performance and limitations of the developed IP formulations, the proposed enhancements, and the exact algorithm on various types of synthetic and real-life network instances;

  2. Evaluate the performance and efficiency of the random-walk-based heuristic;

  3. Identify the longest induced paths in various networks and provide valuable insights; and

  4. Investigate the relationship between

Conclusions

In this paper, we provide novel linear integer programming models for solving the longest induced path problem, which is defined as the problem of finding the induced subgraph of largest cardinality such that this subgraph is a simple path. Specifically, we present three conceptually different approaches that are based on various structural aspects of the considered problem. The first approach links this problem to finding a subgraph with the maximum diameter and presents an IP model for finding such a

Acknowledgments

This research is also partially supported by the U.S. Air Force Research Laboratory (AFRL) Mathematical Modeling and Optimization Institute. The U.S. government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFRL/RW or the

References (36)

  • A.-L. Barabási et al., Emergence of scaling in random networks, Science (1999)
  • A.-L. Barabási, Network science (2016)
  • Batagelj, V., & Mrvar, A. (2009). Pajek datasets (2006)....
  • I. Bomze et al., The maximum clique problem, Handbook of combinatorial optimization (1999)
  • S. Borgatti et al., Analyzing social networks (2013)
  • COLOR03 (2004)....
  • Continuum Analytics, Inc. (2016). Anaconda reference manual....
  • B. Courcelle et al., Linear time solvable optimization problems on graphs of bounded clique-width, Theory of Computing Systems (2000)

Cited by (13)

    • New formulations and branch-and-cut procedures for the longest induced path problem

      2022, Computers and Operations Research
      Citation Excerpt :

      However, to the best of our knowledge, approaches for solving general instances of the problem were only proposed very recently. Matsypura et al. (2019) presented three compact integer programming (IP) formulations and an exact iterative IP-based algorithm using their IP formulations. The authors also presented a randomized heuristic to tackle larger instances of the problem.

    • MIP formulations for induced graph optimization problems: a tutorial

      2023, International Transactions in Operational Research