Elsevier

Parallel Computing

Volume 101, April 2021, 102722

Parallel branch and bound algorithm for solving integer linear programming models derived from behavioral synthesis

https://doi.org/10.1016/j.parco.2020.102722

Abstract

Integer Linear Programming (ILP) formulation of behavioral synthesis allows hardware designers to implement efficient circuits under resource and timing constraints. However, finding the optimal solution of an ILP model is an NP-Hard problem and remains a computational challenge. In this paper, we address this challenge by developing two exact parallel branch and bound algorithms that are capable of solving large-scale ILP models derived from behavioral synthesis. The first algorithm combines sub-node parallelism with adaptive branching and memory-efficient techniques to accelerate solving ILP models on shared-memory multi-core systems. The second algorithm is based on a node parallelism strategy. We evaluated the proposed algorithms using large ILP models derived from Media Bench Data Flow Graphs. The experimental results indicate that both proposed methods successfully accelerate behavioral synthesis on multi-core platforms and outperform the IBM ILOG CPLEX (v12.60) MIP solver in solving large ILP models.

Introduction

Synthesis of digital circuits is the process of generating hardware from technical specifications. Synthesis consists of transformations and optimizations at multiple levels to generate the desired circuit. Behavioral synthesis specifically refers to creating register-transfer level descriptions from algorithmic (or behavioral) descriptions. In behavioral synthesis, optimizing hardware latency under resource constraints is an NP-Hard problem [20]. Hence, as the problem size grows, heuristic algorithms such as list-based scheduling [20] are used to solve it. However, these algorithms do not scale well when additional goals must be supported for multi-objective optimization. An Integer Linear Programming (ILP) formulation of behavioral synthesis allows hardware designers to model various optimization goals and constraints along with resource and timing limitations.

Different algorithms and tools have been presented for solving ILP models, but due to the complexity of the problem, solving large models is still a computational challenge. One of the practical solutions to overcome the huge state space created from ILP models is the development of algorithms adapted to a family of models using their common characteristics that are not public [3], [13], [27]. Branch and Bound (B&B) is one of the algorithm design paradigms that enables using special characteristics of problems to prune the state space.

B&B algorithms address optimization problems by organizing the set of candidate solutions as a tree structure, called the search tree, and systematically enumerating its nodes. Three algorithmic components guide the behavior of a B&B algorithm: the search strategy, the branching strategy, and the pruning rules [21]. The search strategy determines how to traverse the search tree. During tree traversal, B&B algorithms process each node and recursively split the search space into smaller spaces; this splitting is called branching. Instead of brute-force processing of all the tree branches, a B&B algorithm uses an approximation function to estimate a lower/upper bound for each branch and prunes the branch if it is not promising. A non-promising branch is a sub-space that can be shown not to contain the optimal solution. The most common approximation function for B&B algorithms in ILP solvers is LP relaxation [30], where at each node of the search tree, the bound is obtained by solving the relaxed ILP.
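The three components described above can be illustrated with a minimal sequential sketch. It is not the algorithm proposed in this paper: it assumes a toy single-constraint 0/1 ILP (a knapsack model), chosen because its LP relaxation can be solved greedily without a simplex solver. The names `Item`, `relaxedBound`, `branch`, and `solveKnapsack` are hypothetical and introduced only for illustration. The search strategy is depth-first, branching fixes one variable to 1 or 0, and a node is pruned when its relaxed bound cannot improve on the best integer solution found so far.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical toy model: maximize sum(value[i] * x[i]) subject to
// sum(weight[i] * x[i]) <= capacity, with x[i] in {0, 1}.
struct Item { double value; double weight; };

// Upper bound from the LP relaxation of the sub-problem in which the
// variables [0, depth) are already fixed. Items must be sorted by
// value/weight ratio (descending); the relaxed optimum then fills the
// remaining room greedily with at most one fractional item.
double relaxedBound(const std::vector<Item>& items, std::size_t depth,
                    double value, double room) {
    for (std::size_t i = depth; i < items.size(); ++i) {
        if (items[i].weight <= room) {
            room -= items[i].weight;
            value += items[i].value;
        } else {
            return value + items[i].value * (room / items[i].weight);
        }
    }
    return value;
}

// Depth-first traversal: branch on x[depth] = 1, then x[depth] = 0.
void branch(const std::vector<Item>& items, std::size_t depth,
            double value, double room, double& best) {
    if (depth == items.size()) {               // leaf: integer solution
        best = std::max(best, value);
        return;
    }
    if (relaxedBound(items, depth, value, room) <= best) {
        return;                                // prune: cannot beat incumbent
    }
    if (items[depth].weight <= room) {         // x[depth] = 1 is feasible
        branch(items, depth + 1, value + items[depth].value,
               room - items[depth].weight, best);
    }
    branch(items, depth + 1, value, room, best);  // x[depth] = 0
}

double solveKnapsack(std::vector<Item> items, double capacity) {
    assert(capacity >= 0.0);
    // Compare ratios by cross-multiplication (weights assumed positive).
    std::sort(items.begin(), items.end(), [](const Item& a, const Item& b) {
        return a.value * b.weight > b.value * a.weight;
    });
    double best = 0.0;                         // empty selection is feasible
    branch(items, 0, 0.0, capacity, best);
    return best;
}
```

A general ILP would replace `relaxedBound` with a call to an LP solver at each node; the traversal and pruning logic would keep the same shape.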

High performance computing enables simultaneous usage of multiple processing resources through parallel algorithms, in order to achieve reasonable execution times when solving computationally expensive problems. A Parallel B&B (PB&B) algorithm runs multiple processes simultaneously on the search tree. In a PB&B that uses the LP relaxation technique, processing of the search tree can occur at different granularities. In sub-node parallelism, a parallel LP relaxation algorithm runs at each node of the search tree. Node parallelism enables processing different nodes simultaneously. Dividing the search tree into a number of sub-trees and processing them in parallel is called sub-tree parallelism. Tree parallelism is the coarsest granularity: different copies of the search tree are processed in parallel with different strategies, and any process that finds a new bound shares it with the other processes. Sometimes a PB&B algorithm mixes several of these granularities.
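As an illustration of node parallelism only, the following sketch processes search tree nodes as OpenMP tasks that share a single incumbent. It is a sketch under stated assumptions rather than the algorithm of this paper: the toy problem is a small assignment problem (one column per row, minimum total cost), the bound is a simple sum of row minima rather than an LP relaxation, and the names `Search`, `lowerBound`, `processNode`, and `solveAssignment` are hypothetical. If the code is compiled without OpenMP support, the pragmas are ignored and the same search runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <vector>

using Matrix = std::vector<std::vector<int>>;

// Shared search state: the cost matrix and the best complete cost so far.
struct Search {
    const Matrix& cost;
    int best;
};

// Lower bound for a node: cost so far plus the cheapest entry of every
// unassigned row, ignoring column conflicts (a simple relaxation that
// stands in for the LP relaxation discussed in the text).
int lowerBound(const Matrix& c, std::size_t row, int cost) {
    for (std::size_t r = row; r < c.size(); ++r) {
        cost += *std::min_element(c[r].begin(), c[r].end());
    }
    return cost;
}

// Process one node: rows [0, row) are assigned, `used` marks taken columns.
void processNode(Search* s, std::size_t row, unsigned used, int cost) {
    const Matrix& c = s->cost;
    if (row == c.size()) {                     // leaf: complete assignment
        #pragma omp critical(best_update)
        s->best = std::min(s->best, cost);
        return;
    }
    int incumbent;
    #pragma omp critical(best_update)
    incumbent = s->best;
    if (lowerBound(c, row, cost) >= incumbent) {
        return;                                // prune this node
    }
    for (std::size_t col = 0; col < c.size(); ++col) {
        if (used & (1u << col)) continue;      // column already taken
        // Each child node becomes a task that any idle thread may run.
        #pragma omp task firstprivate(s, row, col, used, cost)
        processNode(s, row + 1, used | (1u << col), cost + s->cost[row][col]);
    }
}

int solveAssignment(const Matrix& c) {
    assert(c.size() <= 32);                    // columns fit the bit mask
    Search s{c, INT_MAX};
    // One thread creates the root node; the barrier at the end of the
    // parallel region waits for all outstanding tasks.
    #pragma omp parallel
    #pragma omp single
    processNode(&s, 0, 0u, 0);
    return s.best;
}
```

Sharing the incumbent through one critical section keeps the sketch short; a tuned implementation would reduce this contention and limit task creation near the leaves.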

In this paper we propose two PB&B algorithms for solving ILP models derived from behavioral synthesis by providing sub-node / node parallelism on multi-core platforms. The main contributions of this paper are:

  • Representing resource constrained behavioral synthesis problems (Scheduling, resource allocation and binding) in the form of ILP models.

  • Developing two fast parallel B&B algorithms with different tree parallelism strategies for solving the ILP models derived from behavioral synthesis.

  • Applying memory efficient techniques for encoding the search tree nodes of the proposed B&B algorithm.

  • Using problem metadata to present a best-first tree traversal method for guiding the search.

We used Media Bench [19] DFGs to derive a set of large ILP models that support behavioral synthesis of digital circuits, and we studied their features to optimize the models and develop two efficient parallel algorithms for solving them. We then implemented the proposed algorithms in C++ with the OpenMP library and applied them to the set of models. We also tried to solve the models with the IBM ILOG CPLEX 12.6 [4] optimizer, a state-of-the-art MILP solver that supports parallel processing. The experimental results verify that our B&B approach is scalable for solving ILP models derived from behavioral synthesis and that it outperforms CPLEX: both proposed algorithms can solve models with more than 2000 decision variables and 1000 constraints, while CPLEX cannot solve models with more than 550 variables / constraints.

In the rest of the paper, Section 2 is devoted to related work. In Section 3, we define the problem by building and optimizing the ILP models that support scheduling, resource allocation and resource binding in behavioral synthesis of digital circuits. Then, we present the proposed PB&B algorithms for solving ILP models in Section 4 and evaluate our experiments in Section 5. Finally, we conclude the paper in Section 6 by summarizing the main conclusions of this research.

Section snippets

Related work

High-level optimization of circuit structures is critical to achieving the best circuit implementation [20]. It is one of the main goals in behavioral synthesis and serves a variety of objectives, some examples of which follow. For energy-aware optimization of mobile devices, Li et al. [17] formulated the relation between energy, time and probability based on an energy-minimum model and proposed an efficient dynamic programming algorithm to solve it. In the

Preliminaries

Here we define the problem by developing ILP models for behavioral synthesis. The definitions in this section differ from those in [20], [30] only in minor details.

Definition 1

A Data-Flow Graph (DFG) is a directed graph $G_d = (V_d, E_d)$ which represents the behavioral model of a digital circuit at the function level abstraction in terms of operations (functional units) and their data exchanges (dependencies). Each vertex $v \in V_d$ corresponds to an operation and each edge $(u, v) \in E_d$ corresponds to the data

The proposed PB&B algorithms

In this section, we describe the components of the proposed PB&B algorithms as well as parallelization methods and optimizations applied in problem encoding and memory management.

Experimental results

In this study, we use an execution platform with a 64-core AMD processor, 128 GB of main memory, and the CentOS distribution of the Linux operating system to test the effectiveness of the proposed method. The proposed algorithm is implemented in C++ with the OpenMP library. IBM ILOG CPLEX 12.6 was also used to solve the models, and its results were compared with those of the proposed method. CPLEX is a leading commercial software product for solving MILPs which uses B&C and supports many

Conclusion

This paper presents two parallel branch and bound algorithms for solving ILP models derived from behavioral synthesis of digital circuits, together with a problem formulation and some optimizations of it. Furthermore, it encodes sub-problems to compress search tree nodes, which reduces memory consumption and increases the scalability of the algorithms in solving larger problems. The proposed algorithms are based on LP-relaxed solving of ILP models and are different in

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (31)

  • M. Fazlali et al. A new datapath merging method for reconfigurable system. Proceedings of the International Workshop on Applied Reconfigurable Computing (2009).

  • M. Fazlali et al. A modified merging approach for datapath configuration time reduction. Proceedings of the International Symposium on Applied Reconfigurable Computing (2010).

  • M. Fazlali et al. Efficient datapath merging for the overhead reduction of run-time reconfigurable systems. J. Supercomput. (2012).

  • M. Fazlali et al. Data path configuration time reduction for run-time reconfigurable systems. Proceedings of the ERSA (2009).

  • M. Fazlali et al. High speed merged-datapath design for run-time reconfigurable systems. Proceedings of the International Conference on Field-Programmable Technology (2009).