Elsevier

Information Sciences

Volume 432, March 2018, Pages 362-375

New heuristic approaches for maximum balanced biclique problem

https://doi.org/10.1016/j.ins.2017.12.012

Abstract

The maximum balanced biclique problem (MBBP) is an important extension of the maximum clique problem (MCP), which has wide industrial applications. In this paper, we propose a new local search framework for MBBP where four heuristics are incorporated to improve its performance. Our framework alternates between an extension phase via adding vertex pairs and a restarting phase via removing vertex pairs. Three heuristics are proposed for selecting the pairs for addition and removal. The first heuristic is a prediction score function to greedily select the vertex pairs for addition, which makes use of the structural information of the problem. The second heuristic is a self-adaptive restarting heuristic that removes a dynamic number of vertex pairs from the candidate solution to allow the search to continue from a new search area. The third heuristic is proposed for solving massive graphs and is called the two-mode perturbation heuristic. It is used for selecting pairs of vertices for addition and lowers the average complexity for this task. We also introduce a k-bipartite core reduction rule to decrease the scale of all massive instances, which helps our algorithm find optimal solutions for many massive instances. These techniques lead to two efficient local search algorithms for MBBP. Experimental results demonstrate that the proposed algorithms can scale up to massive instances with billions of edges and that the proposed algorithms outperform state-of-the-art MBBP algorithms on standard benchmarks.

Introduction

Given a bipartite graph G=(U,V,E), a biclique B=(Ub,Vb,Eb) is a subgraph of G such that each pair (u, v) with u ∈ Ub and v ∈ Vb is adjacent. If |Ub|=|Vb|, then B is a balanced biclique of the given bipartite graph. The maximum balanced biclique problem (MBBP) aims to find the balanced biclique with the maximum number of vertices. The MBBP plays a prominent role in various real-world industrial applications, including defect densities in self-assembly enabled nanotechnology [15], [16], defect tolerance for nanotechnology crossbar switches [1], [17], programmable logic array folding in VLSI theory [14], and computational biology problems such as the gene expression data problem [24].
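The definition above can be checked mechanically. The following is a minimal sketch, assuming the edge set is stored as a Python set of (u, v) tuples; the function name `is_balanced_biclique` and the vertex labels are hypothetical and do not come from the paper.

```python
from itertools import product


def is_balanced_biclique(edges, left, right):
    """Return True if (left, right) induces a balanced biclique.

    edges: set of (u, v) tuples with u in U and v in V (assumed encoding).
    left, right: candidate vertex subsets Ub and Vb.
    """
    # Balanced: both sides must have the same number of vertices.
    if len(left) != len(right):
        return False
    # Biclique: every cross pair (u, v) must be an edge.
    return all((u, v) in edges for u, v in product(left, right))


edges = {("u1", "v1"), ("u1", "v2"), ("u2", "v1"), ("u2", "v2"), ("u3", "v1")}
print(is_balanced_biclique(edges, {"u1", "u2"}, {"v1", "v2"}))
print(is_balanced_biclique(edges, {"u1", "u3"}, {"v1", "v2"}))
```

The first call covers a complete 2-by-2 block of edges; the second fails because the pair (u3, v2) is not an edge.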

The MBBP has been proven to be NP-hard [8], [13], meaning that unless P = NP, there are no polynomial-time algorithms to solve the problem. Additionally, it is difficult to approximate the problem and state-of-the-art approximation algorithms can only achieve an approximation ratio of 2^{(log n)^θ}, for some θ > 0 [6]. Because of the hardness of the MBBP, a huge amount of effort has been devoted to finding an acceptable balanced biclique within a reasonable time. To date, most practical algorithms for solving the MBBP have been heuristic algorithms.

A popular method for solving the MBBP is the node-deletion-based method [1], [16], [25], [26], which solves the MBBP by converting the problem into a maximum balanced independent set problem in a complement bipartite graph. An early node-deletion-based algorithm for the MBBP implemented an application-independent defect tolerant design flow by removing the vertices with the maximum degree [16]. Based on [16], Al-Yamani et al. [1] designed an improved algorithm to handle larger bicliques. A key improvement in their algorithm was the removal of one vertex in an area that is adjacent to the maximum number of vertices with the minimum degree in the other area. A combination of the key ideas from the above two algorithms [1], [16] leads to a more advanced heuristic that first deletes the vertex with the minimum degree in one area and then removes the vertex with the maximum degree in the other area [25]. This has resulted in an algorithm called Alg3 in [25], which is more efficient than those in [1], [16]. Additionally, Alg3 attempts to reduce the degree of the vertex with the smallest degree in one area as in [1] and also reduces the number of edges in the bipartite graph as in [16]. A recent node-deletion-based algorithm [26] drops all vertices adjacent to the vertex with the minimum degree in each iteration, which considerably reduces the number of major loops and achieves superior performance. It also employs the heuristics from [16] and [1].
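The conversion that node-deletion methods rely on can be illustrated briefly: a biclique in G is exactly a vertex set with no edges between its two sides in the bipartite complement of G. The sketch below only demonstrates that equivalence and is not any of the cited algorithms; the helper names `bipartite_complement` and `is_independent` are hypothetical.

```python
def bipartite_complement(U, V, edges):
    """Return the cross pairs (u, v) that are NOT edges of G."""
    return {(u, v) for u in U for v in V if (u, v) not in edges}


def is_independent(comp_edges, left, right):
    """True if no complement edge joins the two chosen sides."""
    return not any((u, v) in comp_edges for u in left for v in right)


U = {"u1", "u2"}
V = {"v1", "v2"}
edges = {("u1", "v1"), ("u1", "v2"), ("u2", "v1")}

comp = bipartite_complement(U, V, edges)
print(comp)
# ({u1, u2}, {v1}) is a biclique in G, so it is independent in the complement.
print(is_independent(comp, {"u1", "u2"}, {"v1"}))
# ({u1, u2}, {v1, v2}) is not a biclique in G because (u2, v2) is missing.
print(is_independent(comp, {"u1", "u2"}, {"v1", "v2"}))
```

Deleting vertices from the complement graph until no complement edge remains therefore leaves a biclique of the original graph.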

Furthermore, a popular method for tackling hard combinatorial optimization problems is local search, which can find good solutions within reasonable time and typically remains effective for solving very large problems. Local search has been successfully applied to various combinatorial optimization problems, including the maximum satisfiability problem [5], minimum weighted vertex cover problem [10], vertex separator problem [3], graph coloring problem [28], maximum weight clique problem [19], minimum set covering problem [21], and many others. However, as far as we know, there is only one local search algorithm for solving the MBBP, which is called the evolutionary algorithm with structure mutation (EA/SM) [27]. In EA/SM, a local search combined with a repair-assisted restart process is used to solve the MBBP. The novel SM mutation operator was introduced to enhance exploration during the local search process. The SM can change the structure of solutions dynamically while keeping their size (fitness) and feasibility unchanged. Additionally, EA/SM implements a type of large mutation in the structure space of the MBBP to help the algorithm escape from local optima. A local search operator was also proposed for the EA/SM to improve the quality of solutions efficiently and a novel repair-assisted restart process was designed to repair every new solution reinitialized. According to the experiments in [27], EA/SM outperforms previous node-deletion-based algorithms [1], [16], [25], [26] on classical random benchmarks. This indicates that local search is a promising method for solving the MBBP and that it deserves further research.

In this paper, we develop a novel local search framework based on pair operations (POLS), which is different from the previous local search algorithms for the MBBP based on one-vertex operations (i.e., adding or removing a single vertex in each step). Our local search framework is based on a combination of an extension phase and a restarting phase. There are two basic operations in our framework: vertex pair addition and vertex pair removal. Specifically, given a bipartite graph G=(U,V,E) and candidate solution S=(Us,Vs,Es), our algorithm searches for vertex pairs (u, v) where u ∉ Us and v ∉ Vs, such that u is adjacent to every vertex vs ∈ Vs and v is adjacent to every vertex us ∈ Us. If the algorithm finds such vertex pairs, it selects one pair to add to the candidate solution, which constitutes the pair addition operation. The pair removal operation selects u in one area Us of the candidate solution S and v in the other area Vs, then removes this pair from the candidate solution. Another feature that distinguishes our local search algorithm from the previous local search algorithms for the MBBP is that our algorithm only searches among valid solutions, meaning it guarantees that the candidate solution S after each step is always a balanced biclique. Although the previous EA/SM local search algorithm [27] maintains a biclique during the search, it is not necessarily a balanced biclique.
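The two pair operations can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: neighborhoods are stored in two hypothetical dictionaries `adj_u` and `adj_v`, and the sketch additionally requires u and v to be adjacent to each other, which is needed for the extended solution to remain a biclique.

```python
def addition_candidates(adj_u, adj_v, Us, Vs):
    """List pairs (u, v) outside the solution that can be added together.

    adj_u: dict mapping each u in U to its neighbor set in V (assumed layout).
    adj_v: dict mapping each v in V to its neighbor set in U (assumed layout).
    """
    # u must be adjacent to every vertex of Vs, v to every vertex of Us.
    cand_u = [u for u in adj_u if u not in Us and Vs <= adj_u[u]]
    cand_v = [v for v in adj_v if v not in Vs and Us <= adj_v[v]]
    # Assumption: u and v must also be adjacent to each other.
    return sorted((u, v) for u in cand_u for v in cand_v if v in adj_u[u])


def add_pair(Us, Vs, pair):
    """Pair addition: both sides grow by one vertex."""
    Us.add(pair[0])
    Vs.add(pair[1])


def remove_pair(Us, Vs, pair):
    """Pair removal: both sides shrink by one vertex."""
    Us.remove(pair[0])
    Vs.remove(pair[1])


adj_u = {"u1": {"v1", "v2"}, "u2": {"v1", "v2"}, "u3": {"v1"}}
adj_v = {"v1": {"u1", "u2", "u3"}, "v2": {"u1", "u2"}}

Us, Vs = set(), set()
add_pair(Us, Vs, ("u1", "v1"))
next_pairs = addition_candidates(adj_u, adj_v, Us, Vs)
print(next_pairs)
```

Because each operation changes both sides by exactly one vertex, the two sides always have equal size, and removing vertices from a biclique always leaves a biclique.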

We also propose four new heuristics for the MBBP. The first three deal with how to select the pairs of vertices for addition or removal and the final heuristic is a reduction rule. Based on the proposed framework and these heuristics, we develop two local search algorithms, the latter of which is an improved version of the former for massive bipartite graphs.

The first heuristic is a novel scoring function for choosing the pairs of vertices for addition. For a candidate addition pair, the scoring function takes into account both the lower and upper bounds of the size of the maximal solution extended from the current solution after adding the candidate pair. This value predicts the size of the solution that can be constructed after adding the candidate vertex pair. Thus, this scoring function is called the prediction score (pscore). Specifically, a cost-effective upper bound is proposed so that pscore can be calculated with low time complexity. Our algorithm chooses the pair of vertices for addition with the greatest pscore.

The second heuristic is a robust self-adaptive restarting (RSR) heuristic, which aims to improve local search by restarting the search if it cannot find a better solution within a certain number of steps. It may take many steps for the algorithm to find a better solution if the search stays in a poor search area containing no (or few) high quality solutions, which could waste a considerable amount of time. To avoid this drawback, we propose a self-adaptive restarting heuristic to dynamically restart the search process. Specifically, if the algorithm cannot find a better solution within a self-adaptive number of search steps, we remove certain vertex pairs from the current candidate solution so that the algorithm can search in a different direction.
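The restart mechanism described above can be sketched with a non-improvement counter and a threshold that changes over time. The adaptation rule used here (multiplying the threshold after every restart) and the random choice of pairs to remove are assumptions made purely for illustration; this excerpt does not give the paper's actual rule, and the names `RestartController` and `remove_random_pairs` are hypothetical.

```python
import random


class RestartController:
    """Counts non-improving steps and signals when to restart."""

    def __init__(self, initial_limit=10, growth=2):
        self.limit = initial_limit  # current step budget (hypothetical parameter)
        self.growth = growth        # assumed adaptation factor
        self.no_improve = 0

    def record(self, improved):
        """Update the counter; return True when a restart should fire."""
        if improved:
            self.no_improve = 0
            return False
        self.no_improve += 1
        if self.no_improve >= self.limit:
            self.no_improve = 0
            self.limit *= self.growth  # assumed rule, not taken from the paper
            return True
        return False


def remove_random_pairs(Us, Vs, count, rng):
    """Remove `count` vertices from each side so the solution stays balanced."""
    count = min(count, len(Us), len(Vs))
    for u in rng.sample(sorted(Us), count):
        Us.remove(u)
    for v in rng.sample(sorted(Vs), count):
        Vs.remove(v)


ctrl = RestartController(initial_limit=3, growth=2)
signals = [ctrl.record(improved=False) for _ in range(3)]
print(signals, ctrl.limit)

Us, Vs = {"a", "b", "c"}, {"x", "y", "z"}
remove_random_pairs(Us, Vs, 2, random.Random(0))
print(len(Us), len(Vs))
```

Any subset of a balanced biclique with equally many vertices removed from each side is still a balanced biclique, so the search can resume from the reduced solution without a repair step.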

The above two heuristics are used in developing a local search algorithm for the MBBP, called POLS with pscore and RSR (PSRS). We carry out experiments to compare PSRS to the state-of-the-art MBBP algorithms [1], [27] on various benchmarks from the literature, including randomly generated classical instances [27] and a broad range of massive bipartite graphs with nearly one billion edges. Experimental results demonstrate that PSRS significantly outperforms previous algorithms and improves upon the best known solution quality for certain difficult instances.

In order to improve the performance of PSRS on massive bipartite graphs, we propose two additional heuristics. The third heuristic is a two-mode perturbation (TMP) heuristic, which combines the greedy selection rule based on pscore with a randomized selection strategy. PSRS typically chooses the pair of vertices for addition with the greatest pscore. However, for massive bipartite graphs, it is very time consuming to find the pair of vertices with the greatest pscore, because there are too many candidate vertex pairs. Additionally, most real-world massive bipartite graphs are very sparse, meaning pure greedy heuristics can easily lead the search into local optima. Based on these two considerations, we improve the selection heuristic by incorporating a randomized selection strategy with a certain probability, leading to the TMP heuristic. This reduces the average cost of choosing the vertex pair for addition, and also introduces additional diversification.
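The two-mode idea can be sketched as a single selection function that mixes a random pick with a greedy pick. This is only an illustration: the probability parameter `p_random`, the function name `select_pair`, and the toy score table standing in for pscore are all hypothetical, and the excerpt does not say which mode the paper's own parameter governs.

```python
import random


def select_pair(candidates, score, p_random, rng):
    """Pick one candidate pair using a random mode or a greedy mode.

    candidates: list of (u, v) pairs that may be added.
    score: callable returning a number for a pair (stand-in for pscore).
    p_random: probability of the random mode (hypothetical parameter).
    """
    if not candidates:
        return None
    if rng.random() < p_random:
        # Random mode: constant-time choice, adds diversification.
        return rng.choice(candidates)
    # Greedy mode: scan all candidates for the best score.
    return max(candidates, key=score)


toy_scores = {("u1", "v1"): 3, ("u2", "v2"): 7, ("u3", "v3"): 5}
candidates = list(toy_scores)
rng = random.Random(42)

greedy_pick = select_pair(candidates, toy_scores.get, 0.0, rng)
random_pick = select_pair(candidates, toy_scores.get, 1.0, rng)
print(greedy_pick, random_pick)
```

The random mode avoids scanning the full candidate list, which is where the reduction in average selection cost comes from.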

The final heuristic is the k-bipartite core reduction rule (KCR), which is used to reduce the scale of massive bipartite graphs by deleting vertices that are impossible to include in any optimal solutions. This rule is based on a heuristic called the k-bipartite core, which is inspired by the heuristic of the k-core [18].
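The reduction can be illustrated with the classical peeling procedure used for k-cores, applied to a bipartite graph. The paper's exact definition of the k-bipartite core is not included in this excerpt, so the sketch below is an assumption: it repeatedly deletes vertices whose remaining degree is below k, which is safe because every vertex of a balanced biclique with k vertices per side has at least k neighbors inside it. The function name `k_bipartite_core` is hypothetical.

```python
from collections import deque


def k_bipartite_core(adj, k):
    """Peel vertices of degree < k; return the remaining adjacency dict.

    adj: dict mapping every vertex (from both sides, with distinct names)
         to the set of its neighbors (assumed layout).
    """
    degree = {x: len(nbrs) for x, nbrs in adj.items()}
    removed = set()
    queue = deque(x for x, d in degree.items() if d < k)
    while queue:
        x = queue.popleft()
        if x in removed:
            continue
        removed.add(x)
        # Deleting x lowers the degree of each surviving neighbor.
        for y in adj[x]:
            if y not in removed:
                degree[y] -= 1
                if degree[y] < k:
                    queue.append(y)
    return {x: adj[x] - removed for x in adj if x not in removed}


adj = {
    "u1": {"v1", "v2"},
    "u2": {"v1", "v2"},
    "u3": {"v1"},
    "v1": {"u1", "u2", "u3"},
    "v2": {"u1", "u2"},
}
core2 = k_bipartite_core(adj, 2)
print(sorted(core2))
```

In a local search setting, k could be tied to the size of the best solution found so far, so that the graph shrinks as better solutions are discovered; whether the paper does this is not stated in the excerpt.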

We improve the PSRS algorithm by using the third and fourth heuristics, and the resulting algorithm is called PSRS+ (PSRS with TMP and KCR), which is more effective for massive graphs. We select 17 massive bipartite graphs from [9] and use them to test the performance of EA/SM, PSRS, and PSRS+. Experiments demonstrate that PSRS+ greatly improves upon PSRS and significantly outperforms EA/SM on all massive real graphs.

We also conduct experimental analysis and additional investigations on the heuristics presented in this work. Specifically, we compare PSRS and PSRS+ with several alternative versions that operate without using one of the aforementioned heuristics. The experimental results demonstrate the effectiveness of the proposed heuristics.

In the next section, we introduce some necessary background knowledge and previous MBBP algorithms. We then propose a local search framework based on pair operations. Next, we describe our novel scoring function, self-adaptive restarting heuristic, and PSRS algorithm. We then present the TMP heuristic and KCR reduction rule, and present the PSRS+ algorithm. Sections 6 and 7 present the experimental results for PSRS and PSRS+, as well as experiments validating the effectiveness of the proposed novel heuristics on some benchmarks. Finally, we provide some concluding remarks.

Section snippets

Basic definitions and notations

Given a bipartite graph G = (U, V, E), G can be divided into two disjoint vertex sets U = {u1, u2, …, un} and V = {v1, v2, …, vm} such that every edge connects one vertex in U to one vertex in V. E = {e1, e2, …, et} is the set of edges. The neighborhood of a vertex u ∈ U is N(u) = {v ∈ V | (v, u) ∈ E}. Similarly, the neighborhood of a vertex v ∈ V is N(v) = {u ∈ U | (v, u) ∈ E}. The degree of a vertex v is the size of its neighborhood and is denoted |N(v)|. The size of a bipartite graph is defined as

Two novel pair operations in local search for the MBBP

Local search algorithms perform searches within corresponding search spaces. The key to defining a search space is how the algorithms transform a candidate solution into a different solution.

PSRS algorithm

Based on the POLS framework, we propose an algorithm for solving the MBBP called PSRS. In this section, we introduce the PSRS algorithm and describe two of its important components.

The PSRS algorithm is outlined at a high level in Algorithm 2 and described below. In the beginning, the current candidate solution S is the empty set. The algorithm then initializes NoImprove, which denotes the number of non-improvement iterations, and iter (when NoImprove reaches iter, the RSR heuristic initiates

Two novel ideas for real massive bipartite graphs

In this section, we collect massive bipartite graphs from the Koblenz Network Collection (KONECT) [9], which contains network datasets from the areas of web science, network science, etc. We list the 17 selected real massive bipartite graphs in Table 1.

Some major characteristics of each selected instance appear in Table 1. The columns are: The names of the instances (Instance), numbers of left vertices (|V|), numbers of right vertices (|U|), total numbers of vertices (|V|+|U|), total numbers of

Experimental results on random benchmarks

In this section, we perform extensive experiments to test the performance of our algorithm on two random benchmarks, including a classical random benchmark (30 instances) and some new massive benchmarks (90 instances), where all instances are randomly generated as in previous works [25], [26]. These instances have sizes of 250, 500, 1000, 5000, 10,000, 20,000, 30,000, and 40,000. The probability p that a particular edge exists in the given bipartite graphs has three values: p = 95%, 90%, and

Experimental results on massive benchmarks

In this section, we evaluate the performance of PSRS+TM and PSRS+TMKC on the massive bipartite graphs. For PSRS+TM and PSRS+TMKC, the time limit was 1000 s. The parameter q was set to 0.8. For every instance, each algorithm performed 10 independent runs with different random seeds. In the massive benchmark experiment, α and β were also set to 10,000 and 10, respectively.

In Table 5, for each algorithm, we list the maximum value (max), average value (avg) of 10 independent runs, and real run time

Summary and future work

This paper presented two fast local search algorithms called PSRS and PSRS+ for the MBBP. We proposed a new prediction selection strategy based on pairs of vertices and designed an addition rule to find good search spaces. Furthermore, we introduced the RSR heuristic to overcome the cycling and restart problems. Experimental results indicated that PSRS performs better than four previous state-of-the-art algorithms on all random instances in terms of quality of solution values. More importantly,

Acknowledgements

This work was supported in part by NSFC (under Grant nos. 61370156, 61503074, 61502464, 61402070, 61403077, and 61403076) and China National 973 program 2014CB340301.

References (28)

  • U. Feige et al., Hardness of approximation of the balanced complete bipartite subgraph problem, Technical report (2004)
  • F. Glover, Tabu search - Part I, ORSA J. Comput. (1989)
  • J. Kunegis, KONECT: the Koblenz Network Collection, Proceedings of the 22nd International Conference on World Wide Web (2013)
  • E. Marchiori, Genetic, Iterated and Multistart Local Search for the Maximum Clique Problem, Applications of Evolutionary Computing (2002)