On the performance of parallel approximate inverse preconditioning using Java multithreading techniques

https://doi.org/10.1016/j.amc.2007.01.024

Abstract

In this paper, a parallel shared-memory Java multithreaded design and implementation of explicit approximate inverse preconditioning is presented for efficiently solving arrow-type linear systems on symmetric multiprocessor systems. A new parallel algorithm for computing a class of optimized approximate inverse matrices is introduced. The performance on a symmetric multiprocessor system, using Java multithreading, is investigated by solving characteristic arrow-type linear systems, and numerical results are given, considering the parallel performance of both the construction of the optimized approximate inverse and the explicit preconditioned generalized conjugate gradient square scheme.

Introduction

Sparse matrix computations, which have inherent parallelism, are of central importance because of their applicability to real-life problems; they are often the most time-consuming part of computational science and engineering computations. Hence, research efforts have focused on producing efficient parallel computational methods and related software suitable for multiprocessor systems.

An important achievement of the last decades is the appearance and use of Explicit Preconditioned Methods, cf. [5], for solving sparse linear systems; the preconditioned form of a linear system Au = s is MAu = Ms, where M is the preconditioner, cf. [5]. The preconditioner M must therefore satisfy the following conditions: (i) MA should have a “clustered” spectrum, (ii) M should be efficiently computable in parallel and (iii) “M × vector” should be fast to compute in parallel, cf. [5].
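Condition (iii) can be illustrated with a minimal sketch: applying a preconditioner M to a vector with entries partitioned across threads. A simple diagonal (Jacobi) preconditioner M = diag(1/a_ii) stands in here for the paper’s optimized approximate inverse, and the vectors, sizes and thread count are illustrative assumptions, written with modern Java lambda syntax for brevity.

```java
// Sketch of condition (iii): computing z = M * r in parallel, with a
// diagonal (Jacobi) preconditioner as a stand-in. Entries of z are
// disjoint across threads, so no synchronization is needed.
public class PreconditionerApply {
    public static void main(String[] args) throws InterruptedException {
        final double[] diagA = {4.0, 2.0, 5.0, 8.0};  // diagonal of A (illustrative)
        final double[] r = {8.0, 4.0, 10.0, 16.0};    // residual vector (illustrative)
        final double[] z = new double[r.length];

        int nThreads = 2;
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Each thread handles a contiguous block of vector entries.
                int chunk = (r.length + nThreads - 1) / nThreads;
                int lo = id * chunk;
                int hi = Math.min(lo + chunk, r.length);
                for (int i = lo; i < hi; i++) z[i] = r[i] / diagA[i];
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(java.util.Arrays.toString(z)); // [2.0, 2.0, 2.0, 2.0]
    }
}
```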

Explicit Approximate Inverse Preconditioning methods, being composed mainly of linear operations between vectors and matrices, are inherently parallel, and their performance, when coupled with an efficiently constructed preconditioner M, can be replicated relatively easily across a broad spectrum of hardware and software platforms and languages. However, since constructing an efficient preconditioner M is generally a computationally intensive task in its own right, methods for computing the preconditioner M in parallel have also been the object of research and investigation, usually presenting greater design difficulties than the preconditioned conjugate gradient type methods.

For the development of parallel programs, the scientific computing community predominantly uses parallel extensions to C or FORTRAN, usually via well-established standards such as OpenMP and MPI. While those standards are beyond doubt well-documented, functional and reliable, the targeted languages themselves lack any direct multithreading support at a function/library level, which is instead provided entirely by third-party tools or thread programming models such as POSIX threads. While this is usually not a problem, since documentation and standardization are more than sufficient, there remain portability concerns and implementation-specific peculiarities.

The Java programming language has been designed with built-in multithreading support, providing at least the most basic thread creation and management functionality in its standard libraries on most supported platforms. Other points in favour of Java as a general-purpose programming language are its C-like syntax and relative ease of use, which are generally considered to render software development faster and less error prone, as well as its portable nature. Last but not least, the language’s execution performance is now generally considered comparable with that of traditional compiled languages, due to advancements in Just-In-Time (JIT) compilation technologies, so that even the once significant performance gap is slowly becoming less of an issue, cf. [1]. Recently, research has focused on subtler performance issues of Java, such as data locality, garbage collection and JIT compilation details under modern operating systems, indicating that the language and its actual execution environments are mature enough for examining performance minutiae instead of macroscopic performance issues, cf. [11].

Perhaps one of the few issues that still prevent Java from gaining widespread acceptance as a parallel computing tool is the lack of officially defined standards and APIs, such as transparent parallel Software Development Kits (SDKs), automatic loop parallelization or message passing, equivalent to OpenMP or MPI. While there have been attempts at constructing Java equivalents, for example the OpenMP-like JOMP, cf. [2], or distributed environment extensions to standard Java, using either MPI-compatible or ad hoc Remote Method Invocation (RMI) interfaces, cf. [13], or even combinations of all of the above, cf. [14], those projects ended up creating API and language divergences, “reinventing the wheel”, and virtually none of them has ever gained sufficient support and acceptance to be considered a well-established standard for Java developers to rely upon. Thus, most parallel programming in Java is still done manually, even though Java’s most recent versions (1.5 and above) provide some relatively advanced multithreading tools, partly making up for the lack of a complete and universally accepted parallel SDK for Java.
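The "relatively advanced multithreading tools" introduced in Java 1.5 live in the `java.util.concurrent` package. A minimal sketch of their flavour, using a fixed thread pool to sum the two halves of an array, follows; the array contents, pool size and task split are illustrative assumptions (written with later lambda syntax for brevity, though anonymous `Callable` classes would be the period-accurate form):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A fixed thread pool (java.util.concurrent, since Java 1.5) summing the
// halves of a vector of ones; each Callable returns a partial sum.
public class PoolExample {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        final double[] v = new double[1000];
        java.util.Arrays.fill(v, 1.0);

        Callable<Double> lower = () -> {
            double s = 0.0;
            for (int i = 0; i < 500; i++) s += v[i];
            return s;
        };
        Callable<Double> upper = () -> {
            double s = 0.0;
            for (int i = 500; i < 1000; i++) s += v[i];
            return s;
        };

        Future<Double> f1 = pool.submit(lower);
        Future<Double> f2 = pool.submit(upper);
        double total = f1.get() + f2.get();  // blocks until both tasks finish
        pool.shutdown();
        System.out.println(total); // 1000.0
    }
}
```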

In Section 2, approximate inverse matrix algorithms in conjunction with the explicit preconditioned generalized conjugate gradient square scheme, for the solution of sparse arrow-type linear systems are presented. In Section 3, a new parallel approximate inverse matrix algorithm for symmetric multiprocessor systems is given. The implementation details of the proposed parallel method, using Java multithreading, are presented in Section 4. Finally, in Section 5 the performance of the parallel approximate inverse matrix techniques and the parallel explicit preconditioned conjugate gradient type method is illustrated by solving characteristic sparse arrow-type linear systems on a symmetric multiprocessor system and numerical results are given.


Approximate inverse preconditioning

Let us consider the numerical solution of arrow-type linear systems, i.e., Au = s, where A is a sparse arrow-type (n × n) matrix.
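As a concrete illustration (the paper's exact matrix form is not reproduced here), the simplest arrowhead special case of an arrow-type matrix has a nonzero main diagonal plus a full last row and last column; the values and size below are illustrative assumptions only:

```java
// A minimal arrowhead matrix sketch: nonzero main diagonal plus last
// row and last column, giving 3n - 2 nonzero entries for an n x n matrix.
public class ArrowheadExample {
    public static void main(String[] args) {
        int n = 5;
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++) a[i][i] = 4.0;  // main diagonal
        for (int i = 0; i < n - 1; i++) {
            a[i][n - 1] = 1.0;                      // last column ("arrow shaft")
            a[n - 1][i] = 1.0;                      // last row
        }
        // Count nonzeros: n diagonal + 2(n - 1) arrow entries = 3n - 2.
        int nnz = 0;
        for (double[] row : a)
            for (double x : row)
                if (x != 0.0) nnz++;
        System.out.println(nnz); // 13 for n = 5
    }
}
```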

Arrow-type matrices occur in practice, cf. [5], for example, in the course of the Lanczos method for solving the eigenvalue problem for large sparse matrices, in the eigenstructure problems of arrowhead matrices which arise from applications in molecular physics, and in the application of the finite element or finite difference method over a region by

Parallel approximate inverse matrix computations

A Parallel ANti Diagonal Banded Approximate Inverse Arrow-type Matrix algorithm (PANDBAIATM), based on the Arrow-type Approximate LU-type Factorization algorithm, has been proposed, cf. [8], for solving arrow-type linear systems. In this section, an improved parallel optimized approximate inverse algorithm for solving arrow-type linear systems using Java multithreading techniques is presented.

The most important challenges encountered when computing this form of approximate inverse in parallel are its

Java multithreading techniques

In the next subsections, a brief introduction to Java’s thread execution mechanisms will be given, as well as an overview of the adopted problem decomposition methods into Java threads, elements which are intimately tied to Java’s Object Oriented paradigm. Finally, an overview of the development environment and libraries used is also provided.
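The decomposition style the section describes, where Java's Object Oriented paradigm ties each thread to an object, can be sketched as follows: a `Runnable` worker object owns a block of rows of a matrix-vector product y = Ax. All names, sizes and values are illustrative assumptions, not the paper's code.

```java
// Row-block decomposition of y = A x across Java threads: each Worker
// object carries its own row range [lo, hi) and writes disjoint entries
// of y, so no synchronization is required inside run().
public class RowBlockMatVec {
    static class Worker implements Runnable {
        final double[][] a;
        final double[] x, y;
        final int lo, hi;
        Worker(double[][] a, double[] x, double[] y, int lo, int hi) {
            this.a = a; this.x = x; this.y = y; this.lo = lo; this.hi = hi;
        }
        public void run() {
            for (int i = lo; i < hi; i++) {
                double s = 0.0;
                for (int j = 0; j < x.length; j++) s += a[i][j] * x[j];
                y[i] = s;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int n = 4;
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++) a[i][i] = 2.0;  // A = 2I (illustrative)
        double[] x = {1, 2, 3, 4}, y = new double[n];

        Thread t1 = new Thread(new Worker(a, x, y, 0, 2));
        Thread t2 = new Thread(new Worker(a, x, y, 2, 4));
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(java.util.Arrays.toString(y)); // [2.0, 4.0, 6.0, 8.0]
    }
}
```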

Numerical results

In this section, the applicability and effectiveness of the parallel approximate inverse and the parallel preconditioned generalized conjugate gradient square scheme for solving characteristic arrow-type linear systems are examined.

The parallel numerical results were obtained on a Fujitsu Primergy S200 workstation with 2 GB of PC-2700 DDR RAM, twin Intel XEON 2.80 GHz CPUs with the Hyper-Threading functionality enabled and fully supported at a hardware, operating system and runtime environment

Conclusions

The design of parallel explicit approximate inverses and conjugate gradient type schemes results in efficient parallel methods for solving arrow-type linear systems on symmetric multiprocessor systems, which can be efficiently ported to a variety of programming languages and parallel environments including standard Java. Despite the Java programming language’s intended application context and the test platform’s hardware limitations, relatively high speedups and efficiencies were obtained.
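The speedup and efficiency figures referred to above are conventionally defined as S_p = T_1 / T_p and E_p = S_p / p, where T_1 is the serial time, T_p the time on p threads. A minimal sketch follows; the timings are hypothetical placeholders, not measurements from the paper.

```java
// Conventional parallel performance measures: speedup S_p = T1 / Tp
// and efficiency E_p = S_p / p. Timings below are placeholders.
public class SpeedupExample {
    static double speedup(double t1, double tp) {
        return t1 / tp;
    }
    static double efficiency(double t1, double tp, int p) {
        return speedup(t1, tp) / p;
    }

    public static void main(String[] args) {
        double t1 = 12.0;  // hypothetical serial wall-clock time (s)
        double t2 = 8.0;   // hypothetical 2-thread wall-clock time (s)
        System.out.println(speedup(t1, t2));        // 1.5
        System.out.println(efficiency(t1, t2, 2));  // 0.75
    }
}
```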

References (14)

  • J. Bull, L. Smith, L. Pottage, R. Freeman, Benchmarking Java against C and Fortran for Scientific Applications, in: ACM...
  • M. Bull, M.E. Kambites, JOMP – an OpenMP-like interface for Java, in: Proceedings of the 2000 ACM Java Grande...
  • M. Bull, S. Telford, Programming Models for Parallel Java Applications, UKHEC Newsletter, issue 2, p. 9, November...
  • T.W. Christopher et al., High-Performance Java Platform Computing (2000)
  • G.A. Gravvanis, Explicit isomorphic iterative methods for solving arrow-type linear systems, Int. J. Comput. Math. (2000)
  • G.A. Gravvanis, Solving parabolic and nonlinear 1D problems with periodic boundary conditions, in: CD-ROM Proceedings...
  • G.A. Gravvanis, Parallel preconditioned algorithms for solving special tridiagonal systems (1999)
