New heuristic approaches for maximum balanced biclique problem

doi:10.1016/j.ins.2017.12.012

Information Sciences

Volume 432, March 2018, Pages 362-375

https://doi.org/10.1016/j.ins.2017.12.012

Abstract

The maximum balanced biclique problem (MBBP) is an important extension of the maximum clique problem (MCP), which has wide industrial applications. In this paper, we propose a new local search framework for MBBP where four heuristics are incorporated to improve its performance. Our framework alternates between an extension phase via adding vertex pairs and a restarting phase via removing vertex pairs. Three heuristics are proposed for selecting the pairs for addition and removal. The first heuristic is a prediction score function to greedily select the vertex pairs for addition, which makes use of the structural information of the problem. The second heuristic is a self-adaptive restarting heuristic that removes a dynamic number of vertex pairs from the candidate solution to allow the search to continue from a new search area. The third heuristic is proposed for solving massive graphs and is called the two-mode perturbation heuristic. It is used for selecting pairs of vertices for addition and lowers the average complexity for this task. We also introduce a k-bipartite core reduction rule to decrease the scale of all massive instances, which helps our algorithm find optimal solutions for many massive instances. These techniques lead to two efficient local search algorithms for MBBP. Experimental results demonstrate that the proposed algorithms can scale up to massive instances with billions of edges and that the proposed algorithms outperform state-of-the-art MBBP algorithms on standard benchmarks.

Introduction

Given a bipartite graph $G = (U, V, E),$ a biclique $B = (U^{b}, V^{b}, E^{b})$ is a subgraph of G such that every pair (u, v) with u ∈ U^b and v ∈ V^b is adjacent. If $|U^{b}| = |V^{b}|,$ then B is a balanced biclique of the given bipartite graph. The maximum balanced biclique problem (MBBP) aims to find the balanced biclique with the maximum number of vertices. The MBBP plays a prominent role in various real-world industrial applications, including defect densities in self-assembly enabled nanotechnology [15], [16], defect tolerance for nanotechnology crossbar switches [1], [17], programmable logic array folding in VLSI theory [14], and computational biology problems such as the gene expression data problem [24].
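As a minimal illustration of the definition above, the following Python sketch checks whether two vertex subsets form a balanced biclique. The adjacency-dictionary representation and the function name `is_balanced_biclique` are assumptions made for this example; they are not notation from the paper.

```python
from typing import Dict, Hashable, Set


def is_balanced_biclique(
    adj_u: Dict[Hashable, Set[Hashable]],
    left: Set[Hashable],
    right: Set[Hashable],
) -> bool:
    """Return True if (left, right) is a balanced biclique.

    adj_u maps each vertex u in U to its neighborhood N(u), a subset of V
    (hypothetical representation chosen for this sketch).
    """
    # Balanced: both sides must contain the same number of vertices.
    if len(left) != len(right):
        return False
    # Biclique: every chosen u must be adjacent to every chosen v.
    return all(right <= adj_u.get(u, set()) for u in left)


# Toy graph: u1 and u2 are adjacent to both v1 and v2; u3 only to v2.
toy = {"u1": {"v1", "v2"}, "u2": {"v1", "v2"}, "u3": {"v2"}}
balanced = is_balanced_biclique(toy, {"u1", "u2"}, {"v1", "v2"})
not_complete = is_balanced_biclique(toy, {"u1", "u3"}, {"v1", "v2"})
unbalanced = is_balanced_biclique(toy, {"u1"}, {"v1", "v2"})
```

In the toy graph, the first call describes a balanced biclique, the second fails because u3 is not adjacent to v1, and the third is a biclique whose two sides differ in size.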

The MBBP has been proven to be NP-hard [8], [13], meaning that unless P = NP, there are no polynomial-time algorithms to solve the problem. Additionally, it is difficult to approximate the problem, and state-of-the-art approximation algorithms can only achieve an approximation ratio of $2^{(\log n)^{\theta}}$ for some θ > 0 [6]. Because of the hardness of the MBBP, a huge amount of effort has been devoted to finding an acceptable balanced biclique within a reasonable time. To date, most practical algorithms for solving the MBBP have been heuristic algorithms.

A popular method for solving the MBBP is the node-deletion-based method [1], [16], [25], [26], which solves the MBBP by converting the problem into a maximum balanced independent set problem in a complement bipartite graph. An early node-deletion-based algorithm for the MBBP implemented an application-independent defect tolerant design flow by removing the vertices with the maximum degree [16]. Based on [16], Al-Yamani et al. [1] designed an improved algorithm to handle larger bicliques. A key improvement in their algorithm was the removal of one vertex in an area that is adjacent to the maximum number of vertices with the minimum degree in the other area. A combination of the key ideas from the above two algorithms [1], [16] leads to a more advanced heuristic that first deletes the vertex with the minimum degree in one area and then removes the vertex with the maximum degree in the other area [25]. This has resulted in an algorithm called Alg3 in [25], which is more efficient than those in [1], [16]. Additionally, Alg3 attempts to reduce the degree of the vertex with the smallest degree in one area as in [1] and also reduces the number of edges in the bipartite graph as in [16]. A recent node-deletion-based algorithm [26] drops all vertices adjacent to the vertex with the minimum degree in each iteration, which considerably reduces the number of major loops and yields superior performance. It also employs the heuristics from [16] and [1].
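The conversion that node-deletion methods rely on can be sketched as follows: two vertex subsets form a biclique in G exactly when no edge joins them in the bipartite complement of G. The helper names and the dictionary representation below are assumptions for illustration only, and the deletion heuristics of [1], [16], [25], [26] are not reproduced here.

```python
from typing import Dict, Hashable, Iterable, Set


def bipartite_complement(
    adj_u: Dict[Hashable, Set[Hashable]],
    right_vertices: Iterable[Hashable],
) -> Dict[Hashable, Set[Hashable]]:
    """Map each u in U to the vertices of V it is NOT adjacent to in G."""
    all_right = set(right_vertices)
    return {u: all_right - nbrs for u, nbrs in adj_u.items()}


def is_independent(
    comp_adj_u: Dict[Hashable, Set[Hashable]],
    left: Set[Hashable],
    right: Set[Hashable],
) -> bool:
    """True if no complement edge joins a vertex of left to one of right."""
    return all(comp_adj_u.get(u, set()).isdisjoint(right) for u in left)


# Toy graph: u1 and u2 are adjacent to both v1 and v2; u3 only to v2.
toy = {"u1": {"v1", "v2"}, "u2": {"v1", "v2"}, "u3": {"v2"}}
comp = bipartite_complement(toy, {"v1", "v2"})
biclique_ok = is_independent(comp, {"u1", "u2"}, {"v1", "v2"})
biclique_bad = is_independent(comp, {"u1", "u3"}, {"v1", "v2"})
```

Here the only complement edge is (u3, v1), so any subset pair containing both u3 and v1 is not independent in the complement and therefore not a biclique in G.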

Furthermore, a popular method for tackling hard combinatorial optimization problems is local search, which can find good solutions within reasonable time and typically remains effective for solving very large problems. Local search has been successfully applied to various combinatorial optimization problems, including the maximum satisfiability problem [5], minimum weighted vertex cover problem [10], vertex separator problem [3], graph coloring problem [28], maximum weight clique problem [19], minimum set covering problem [21], and many others. However, as far as we know, there is only one local search algorithm for solving the MBBP, which is called the evolutionary algorithm with structure mutation (EA/SM) [27]. In EA/SM, a local search combined with a repair-assisted restart process is used to solve the MBBP. The novel SM mutation operator was introduced to enhance exploration during the local search process. The SM can change the structure of solutions dynamically while keeping their size (fitness) and feasibility unchanged. Additionally, EA/SM implements a type of large mutation in the structure space of the MBBP to help the algorithm escape from local optima. A local search operator was also proposed for the EA/SM to improve the quality of solutions efficiently and a novel repair-assisted restart process was designed to repair every new solution reinitialized. According to the experiments in [27], EA/SM outperforms previous node-deletion-based algorithms [1], [16], [25], [26] on classical random benchmarks. This indicates that local search is a promising method for solving the MBBP and that it deserves further research.

In this paper, we develop a novel local search framework based on pair operations (POLS), which is different from the previous local search algorithms for the MBBP based on one-vertex operations (i.e., adding or removing a single vertex in each step). Our local search framework is based on a combination of an extension phase and a restarting phase. There are two basic operations in our framework: vertex pair addition and vertex pair removal. Specifically, given a bipartite graph $G = (U, V, E)$ and a candidate solution $S = (U^{s}, V^{s}, E^{s}),$ our algorithm searches for vertex pairs (u, v) with u ∉ U^s and v ∉ V^s such that u is adjacent to every vertex v^s ∈ V^s and v is adjacent to every vertex u^s ∈ U^s. If the algorithm finds such vertex pairs, it selects one pair to add to the candidate solution, which constitutes the pair addition operation. The pair removal operation selects u in one area U^s of the candidate solution S and v in the other area V^s, then removes this pair from the candidate solution. Another feature that distinguishes our local search algorithm from the previous local search algorithms for the MBBP is that our algorithm only searches among valid solutions, meaning it guarantees that the candidate solution S after each step is always a balanced biclique. Although the previous EA/SM local search algorithm [27] maintains a biclique during the search, it is not necessarily a balanced biclique.
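The two pair operations can be sketched in Python as follows. This is an illustrative reading of the description above, not the authors' implementation: the function names and the two adjacency dictionaries are hypothetical, and the extra check that u and v are adjacent to each other is an assumption inferred from the requirement that the solution stays a balanced biclique after each step.

```python
from typing import Dict, Hashable, Set, Tuple

Adj = Dict[Hashable, Set[Hashable]]
Pair = Tuple[Hashable, Hashable]


def addable_pairs(adj_u: Adj, adj_v: Adj,
                  sol_u: Set[Hashable], sol_v: Set[Hashable]) -> Set[Pair]:
    """Pairs (u, v) outside the solution that keep it a balanced biclique."""
    # u must be adjacent to every vertex already chosen on the V side.
    cand_u = {u for u in adj_u if u not in sol_u and sol_v <= adj_u[u]}
    # v must be adjacent to every vertex already chosen on the U side.
    cand_v = {v for v in adj_v if v not in sol_v and sol_u <= adj_v[v]}
    # Assumption: u and v must also be adjacent to each other.
    return {(u, v) for u in cand_u for v in cand_v if v in adj_u[u]}


def add_pair(sol_u: Set[Hashable], sol_v: Set[Hashable], pair: Pair):
    """Pair addition: grow both sides by one vertex."""
    u, v = pair
    return sol_u | {u}, sol_v | {v}


def remove_pair(sol_u: Set[Hashable], sol_v: Set[Hashable], pair: Pair):
    """Pair removal: shrink both sides by one vertex."""
    u, v = pair
    return sol_u - {u}, sol_v - {v}


# Toy graph: vertices 1 and 2 see both "a" and "b"; vertex 3 sees only "b".
adj_u = {1: {"a", "b"}, 2: {"a", "b"}, 3: {"b"}}
adj_v = {"a": {1, 2}, "b": {1, 2, 3}}

first = addable_pairs(adj_u, adj_v, set(), set())
sol_u, sol_v = add_pair(set(), set(), (1, "a"))
second = addable_pairs(adj_u, adj_v, sol_u, sol_v)
back_u, back_v = remove_pair(sol_u, sol_v, (1, "a"))
```

Because both operations change the two sides by exactly one vertex each, the sides stay equal in size, which is why the search never leaves the space of balanced bicliques.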

We also propose four new heuristics for the MBBP. The first three deal with how to select the pairs of vertices for addition or removal, and the final heuristic is a reduction rule. Based on the proposed framework and these heuristics, we develop two local search algorithms, the latter of which is an improved version of the former for massive bipartite graphs.

The first heuristic is a novel scoring function for choosing the pairs of vertices for addition. For a candidate addition pair, the scoring function takes into account both the lower and upper bounds of the size of the maximal solution extended from the current solution after adding the candidate pair. This value predicts the size of the solution that can be constructed after adding the candidate vertex pair. Thus, this scoring function is called the prediction score (pscore). Specifically, a cost-effective upper bound is proposed so that pscore can be calculated with low time complexity. Our algorithm chooses the pair of vertices for addition with the greatest pscore.

The second heuristic is a robust self-adaptive restarting (RSR) heuristic, which aims to improve local search by restarting the search if it cannot find a better solution within a certain number of steps. It may take many steps for the algorithm to find a better solution if the search stays in a poor search area containing no (or few) high quality solutions, which could waste a considerable amount of time. To avoid this drawback, we propose a self-adaptive restarting heuristic to dynamically restart the search process. Specifically, if the algorithm cannot find a better solution within a self-adaptive number of search steps, we remove certain vertex pairs from the current candidate solution so that the algorithm can search in a different direction.

The above two heuristics are used in developing a local search algorithm for the MBBP, called POLS with pscore and RSR (PSRS). We carry out experiments to compare PSRS to the state-of-the-art MBBP algorithms [1], [27] on various benchmarks from the literature, including randomly generated classical instances [27] and a broad range of massive bipartite graphs with nearly one billion edges. Experimental results demonstrate that PSRS significantly outperforms previous algorithms and improves upon the best known solution quality for certain difficult instances.

In order to improve the performance of PSRS on massive bipartite graphs, we propose two additional heuristics. The third heuristic is a two-mode perturbation (TMP) heuristic, which combines the greedy selection rule based on pscore with a randomized selection strategy. PSRS typically chooses the pair of vertices for addition with the greatest pscore. However, for massive bipartite graphs, it is very time consuming to find the pair of vertices with the greatest pscore, because there are too many candidate vertex pairs. Additionally, most real-world massive bipartite graphs are very sparse, meaning pure greedy heuristics can easily lead the search into local optima. Based on these two considerations, we improve the selection heuristic by incorporating a randomized selection strategy with a certain probability, leading to the TMP heuristic. This reduces the average cost of choosing the vertex pair for addition, and also introduces additional diversification.
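A two-mode selection rule of this kind can be sketched as below. This is only an illustration under stated assumptions: the function name, the `score` callback standing in for pscore (whose formula is not given in this excerpt), and the convention that the greedy mode is taken with probability `q` are all hypothetical.

```python
import random
from typing import Callable, Hashable, Optional, Sequence, Tuple

Pair = Tuple[Hashable, Hashable]


def select_pair_two_mode(
    candidates: Sequence[Pair],
    score: Callable[[Pair], float],
    q: float,
    rng: random.Random,
) -> Optional[Pair]:
    """Pick a candidate pair greedily or at random.

    Assumption of this sketch: with probability q the highest-scoring pair
    is returned (greedy mode, scans all candidates); otherwise a uniformly
    random pair is returned (random mode, no scan).
    """
    if not candidates:
        return None
    if rng.random() < q:
        return max(candidates, key=score)
    return rng.choice(candidates)


# Hypothetical scores standing in for pscore values.
scores = {("u1", "v1"): 3.0, ("u2", "v2"): 7.0, ("u3", "v3"): 5.0}
pairs = list(scores)
rng = random.Random(0)

always_greedy = select_pair_two_mode(pairs, scores.get, 1.0, rng)
always_random = select_pair_two_mode(pairs, scores.get, 0.0, rng)
nothing = select_pair_two_mode([], scores.get, 0.5, rng)
```

The random mode skips the scan over all candidates, which is where the reduction in average selection cost comes from, and it also supplies the diversification that a purely greedy rule lacks on sparse graphs.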

The final heuristic is the k-bipartite core reduction rule (KCR), which is used to reduce the scale of massive bipartite graphs by deleting vertices that cannot be included in any optimal solution. This rule is based on a heuristic called the k-bipartite core, which is inspired by the heuristic of the k-core [18].
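The exact KCR rule is not spelled out in this excerpt, so the following Python sketch only shows the classical k-core peeling idea that inspires it, applied to a bipartite graph: repeatedly delete every vertex whose degree falls below a threshold k. The reasoning behind such a rule is that a balanced biclique with k vertices per side requires each of its vertices to have degree at least k. The function name and the single symmetric adjacency dictionary are assumptions of this sketch.

```python
from collections import deque
from typing import Dict, Hashable, Set


def peel_below_degree(
    adj: Dict[Hashable, Set[Hashable]], k: int
) -> Dict[Hashable, Set[Hashable]]:
    """Return the subgraph in which every vertex has degree >= k.

    adj must be symmetric and list every vertex of both sides as a key
    (hypothetical representation chosen for this sketch).
    """
    work = {x: set(nbrs) for x, nbrs in adj.items()}  # do not mutate input
    queue = deque(x for x, nbrs in work.items() if len(nbrs) < k)
    removed = set(queue)
    while queue:
        x = queue.popleft()
        for y in work[x]:
            # Deleting x lowers the degree of each neighbor y.
            work[y].discard(x)
            if y not in removed and len(work[y]) < k:
                removed.add(y)
                queue.append(y)
    return {x: nbrs for x, nbrs in work.items() if x not in removed}


# u1, u2, v1, v2 form a complete 2-by-2 block; u3 and v3 hang off it.
toy = {
    "u1": {"v1", "v2"}, "u2": {"v1", "v2"}, "u3": {"v2", "v3"},
    "v1": {"u1", "u2"}, "v2": {"u1", "u2", "u3"}, "v3": {"u3"},
}
core2 = peel_below_degree(toy, 2)
core3 = peel_below_degree(toy, 3)
```

With k = 2 the peeling removes v3 and then u3, leaving only the 2-by-2 block; with k = 3 nothing survives, which shows that no balanced biclique with three vertices per side exists in the toy graph.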

We improve the PSRS algorithm by using the third and fourth heuristics, and the resulting algorithm is called PSRS+ (PSRS with TMP and KCR), which is more effective for massive graphs. We select 17 massive bipartite graphs from [9] and use them to test the performances of EA/SM, PSRS, and PSRS+. Experiments demonstrate that PSRS+ greatly improves upon PSRS and significantly outperforms EA/SM on all massive real graphs.

We also conduct experimental analysis and additional investigations on the heuristics presented in this work. Specifically, we compare PSRS and PSRS+ with several alternative versions that operate without using one of the aforementioned heuristics. The experimental results demonstrate the effectiveness of the proposed heuristics.

In the next section, we introduce some necessary background knowledge and previous MBBP algorithms. We then propose a local search framework based on pair operations. Next, we describe our novel scoring function, self-adaptive restarting heuristic, and PSRS algorithm. We then present the TMP heuristic and KCR reduction rule, and present the PSRS+ algorithm. Sections 6 and 7 present the experimental results for PSRS and PSRS+, as well as experiments validating the effectiveness of the proposed novel heuristics on some benchmarks. Finally, we provide some concluding remarks.

Section snippets

Basic definitions and notations

Given a bipartite graph G = (U, V, E), G can be divided into two disjoint vertex sets U = {u₁, u₂, …, u_n} and V = {v₁, v₂, …, v_m} such that every edge connects one vertex in U to one vertex in V. E = {e₁, e₂, …, e_t} is the set of edges. The neighborhood of a vertex u ∈ U is N(u) = {v ∈ V | (v, u) ∈ E}. Similarly, the neighborhood of a vertex v ∈ V is N(v) = {u ∈ U | (v, u) ∈ E}. The degree of a vertex v is the size of its neighborhood and is denoted |N(v)|. The size of a bipartite graph is defined as

Two novel pair operations in local search for the MBBP

Local search algorithms perform searches within corresponding search spaces. The key to defining a search space is how the algorithms transform a candidate solution into a different solution.

PSRS algorithm

Based on the POLS framework, we propose an algorithm for solving the MBBP called PSRS. In this section, we introduce the PSRS algorithm and describe two of its important components.

The PSRS algorithm is outlined at a high level in Algorithm 2 and described below. In the beginning, the current candidate solution S is the empty set. The algorithm then initializes NoImprove, which denotes the number of non-improvement iterations, and iter (when NoImprove reaches iter, the RSR heuristic initiates

Two novel ideas for real massive bipartite graphs

In this section, we collect massive bipartite graphs from the Koblenz Network Collection (KONECT) [9], which contains network datasets from the areas of web science, network science, etc. We list the 17 selected real massive bipartite graphs in Table 1.

Some major characteristics of each selected instance appear in Table 1. The columns are: the names of the instances (Instance), numbers of left vertices (|V|), numbers of right vertices (|U|), total numbers of vertices ($|V| + |U|$), total numbers of

Experimental results on random benchmarks

In this section, we perform extensive experiments to test the performance of our algorithm on two random benchmarks, including a classical random benchmark (30 instances) and some new massive benchmarks (90 instances), where all instances are randomly generated as in previous works [25], [26]. These instances have sizes of 250, 500, 1000, 5000, 10,000, 20,000, 30,000, and 40,000. The probability p that a particular edge exists in the given bipartite graphs has three values: p = 95%, 90%, and

Experimental results on massive benchmarks

In this section, we evaluate the performance of PSRS+TM and PSRS+TMKC on the massive bipartite graphs. For PSRS+TM and PSRS+TMKC, the time limit was 1000 s. The parameter q was set to 0.8. For every instance, each algorithm performed 10 independent runs with different random seeds. In the massive benchmark experiment, α and β were also set to 10,000 and 10, respectively.

In Table 5, for each algorithm, we list the maximum value (max), average value (avg) of 10 independent runs, and real run time

Summary and future work

This paper presented two fast local search algorithms called PSRS and PSRS+ for the MBBP. We proposed a new prediction selection strategy based on pairs of vertices and designed an addition rule to find good search spaces. Furthermore, we introduced the RSR heuristic to overcome the cycling and restart problems. Experimental results indicated that PSRS performs better than four previous state-of-the-art algorithms on all random instances in terms of quality of solution values. More importantly,

Acknowledgements

This work was supported in part by NSFC (under Grant nos. 61370156, 61503074, 61502464, 61402070, 61403077, and 61403076) and China National 973 program 2014CB340301.

References (28)

S. Cai et al. New local search methods for partial MaxSAT. Artif. Intell. (2016)

D.S. Johnson. The NP-completeness column: an ongoing guide. J. Algorithms (1985)

R. Li et al. An efficient local search framework for the minimum weighted vertex cover problem. Inf. Sci. (2016)

T. Ma et al. LED: a fast overlapping communities detection algorithm based on structural clustering. Neurocomputing (2016)

Y. Wang et al. Two efficient local search algorithms for maximum weight clique problem. Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (2016)

Y. Zhou et al. Reinforcement learning based local search for grouping problems: a case study on graph coloring. Expert Syst. Appl. (2016)

A.A. Al-Yamani et al. A defect tolerance scheme for nanotechnology circuits. Circuits Syst. I (2007)

S. Baluja. Population-based incremental learning: a method for integrating genetic search based function optimization and competitive learning. Technical Report (1994)

U. Benlic et al. Breakout local search for the vertex separator problem. IJCAI (2013)

S. Cai et al. Fast solving maximum weight clique problem in massive graphs. Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016 (2016)

U. Feige et al. Hardness of approximation of the balanced complete bipartite subgraph problem. Technical Report (2004)

F. Glover. Tabu search-part I. ORSA J. Comput. (1989)

J. Kunegis. KONECT: the Koblenz network collection. Proceedings of the 22nd International Conference on World Wide Web (2013)

E. Marchiori. Genetic, iterated and multistart local search for the maximum clique problem. Applications of Evolutionary Computing (2002)
