Direct approaches to exploit many-core architecture in bioinformatics

doi:10.1016/j.future.2012.03.018

Future Generation Computer Systems

Volume 29, Issue 1, January 2013, Pages 15-26

https://doi.org/10.1016/j.future.2012.03.018

Abstract

Current trends in computer programming seek solutions to the challenging task of porting and optimizing existing algorithms for many-core architectures with tens of Central Processing Units (CPUs). Yet the lack of standardized general-purpose parallel programming and porting methodologies is the main bottleneck for these developments. We have focused on bioinformatics applied to genomics in general, and on so-called “Next-Generation” Sequencing (NGS) in particular, in order to study the viability and cost of porting and optimizing well-known algorithms to a many-core architecture. The target platform is the Tile64, a microprocessor containing 64 CPUs, each of them capable of running an independent Linux operating system. Three different approaches have been explored: (i) implementation of the Needleman–Wunsch/Smith–Waterman pairwise aligner from scratch; (ii) direct translation of the Message Passing Interface (MPI) C++ ABySS assembly algorithm with changes in the communication layer; and (iii) migration of the ClustalW tool, parallelizing only the most time-consuming stage. The performance-gain/development-cost tradeoffs indicate that the Tile64 microprocessor has the potential to increase the performance of bioinformatics in an unprecedented way for a standalone Personal Computer (PC). Yet the effective exploitation of these parallel implementations requires a detailed understanding of the peculiar many-core characteristics when migrating previous non-parallel source codes.

Highlights

► The computing power of the Tile64 many-core microprocessor can be exploited for NGS bioinformatics tasks.
► The Tile64 many-core CPU architecture works as a cluster of pico-computers, as with the MC64-NW/SW algorithm.
► MC64-ClustalW shows an important performance improvement with a minor development effort.
► MC64-ABySS reveals that an efficient MPI-like API for Tile64 is essential to successfully port most of the existing parallel code.
► The widespread adoption of many-core CPU technologies could lead to a new paradigm in programming methodologies in the coming years.

Introduction

Nowadays, high-performance processing cannot be understood without new Chip Multiprocessors (CMPs), which are being actively developed. Amongst such chips are the Graphics Processing Units (GPUs) with hundreds of cores [1] and the Sony–IBM–Toshiba Cell Broadband Engine Architecture (CBEA) [2], which can render complex animations and also provide enough computing power to perform other computation-intensive tasks [3]. This line is exploited by supercomputing blade systems like the IBM BladeCenter server platform [4] and the Nvidia Tesla [5]. Yet, such products require detailed programming methodologies and architecture optimizations, which are much more complex than those for single Central Processing Units (CPUs) [6], [7]. A comparison between these different programming approaches and an attempt to automate such processes can be found in [8]. Other companies are working on CMP and multithreaded many-core microprocessors, like the Intel Tera-scale Processor with 80 cores [9], Single-chip Cloud Computing (SCC) [10] with 48 cores, the “Knights Ferry” [11] with 32 cores, the Xeon E7 with 10 cores and two threads per core [12], the Sun Microsystems UltraSPARC T2 Pro with eight cores and eight threads per core [13], the Adapteva Epiphany IV (64 cores) [14] and the Tilera Tile64 microprocessor, as explained below.

The TilExpress-20G cards include a many-core Tile64 microprocessor with 64 tiles (cores) at 866 MHz, 8 GB of RAM and two 10 Gb Ethernet ports. Though they were initially designed for networking applications, video encoding [15] and streaming broadcasts (thanks to their high communication bandwidth and scalability), we have already demonstrated the usefulness of such architecture for bioinformatics [16], [17]. We are applying such developments to quality control, traceability and fraud prevention of olive oil [18], as well as to other genomics approaches of our research group on Agri-Food Biotechnology.

Other works have focused on fine-tuning processing-intensive bioinformatics algorithms for platforms like GPUs [19], [20], [21] or the above-mentioned CBEA [22], obtaining better performance than single-CPU implementations as a result of their parallelization factor. The main difference between these developments and the present work is that Tile64 actually contains many CPUs (in the sense that each one of them is able to execute a standalone operating system). In contrast, the so-called many-core GPU contains very restricted processing units, unable to execute a whole complex algorithm on their own. Thus, many of the bioinformatics algorithms ported to GPU architectures show extraordinary speed-ups but, due to the restricted resources, usually only work well with some specific data, and their performance decreases sharply when they are applied to a wider range of input data. For instance, in the case of sequence alignments, only short sequences (typically, peptides) are allowed, precluding their application to larger projects involving nucleic acid data (e.g., genomics). On the other hand, the CBEA integrates two kinds of microprocessors: one Power Processing Element (PPE) and eight Synergistic Processing Elements (SPE); whereas Tile64 contains a homogeneous matrix of 8×8 tiles. This architecture offers the opportunity to evaluate both the effort that must be dedicated to obtain a working parallelized algorithm on it and the performance achieved, according to the development and code migration approach used, as explained below. Indeed, this work focuses more on evaluating the possible migration approaches, and less on getting the maximum possible performance (which in any case may require a more time-consuming approach).

In order to further evaluate the suitability of the Tile64 architecture in the field of bioinformatics, we have tested three algorithms commonly used by life-science researchers: (i) pairwise sequence alignments; (ii) multiple sequence alignments; and (iii) de novo genome assembly. Taking into account the Tile64 software development characteristics [16], three different approaches regarding porting efforts have been considered; namely: (i) an implementation from scratch of a dynamic programming algorithm like the Fast Linear-Space Alignment (FastLSA) [23], with a parallel strategy for pairwise alignments; (ii) a half-way solution, where a customized communication layer replaces the Message Passing Interface (MPI) in the Assembly By Short Sequences (ABySS) algorithm [24]; and (iii) a slightly modified implementation of a parallel multiple sequence aligner (ClustalW) [25]. Each proposal is discussed in terms of the trade-offs between the development effort and the achieved performance. This strategy reveals several principles that are mandatory for efficient many-core development.

In this work, the term many-core is used as a synonym of many-core CPU, which should not be confused with many-core GPUs and General-Purpose GPUs (GPGPU). To avoid confusion, the manufacturers of many-core CPUs have coined a new term: tile. This term relates to the geometry that can be visually observed in the die of the chip; it is reminiscent of how a tile contributes to completing a mosaic. The shape of the die provides a preliminary idea of how different many-core CPUs are from GPUs. Table 1 shows the main differences between these technologies from a programming and performance point of view. The last row of the table illustrates these differences by comparing die-shots from these platforms. This may help to understand that the parallelization strategies are indeed drastically different. GPGPU cores are specialized in executing small threads rather than huge tasks (such as a whole operating system), and this simplicity allows an easier integration of hundreds of these computing elements in a single die; whereas many-core CPUs currently have just tens of tiles. Consequently, the strategy when porting algorithms to GPUs is to generate as many execution threads as possible, in order to maximize the number of cores working at a given time.

These large differences between many-core CPUs and GPUs make any comparison of the performance and capabilities of such architectures very difficult. Actually, they should be considered as complementary rather than as competitors, as noted in Table 1. Tilera’s approach is not the only one following this new tile-based computing paradigm. Intel has joined the bandwagon of this technology with Single-chip Cloud Computing (SCC) [10], a platform with hardware-based message-passing capabilities and a whole operating system running in each tile. Nevertheless, Intel’s platform is currently at an academic and research stage, Tilera’s platform being the only one commercially available. Another example is the Adapteva Epiphany IV chip, which should be available by late 2012.

Section snippets

The Tile64 microprocessor

The Tilera microprocessors are available as standalone chips or in System-on-Chip (SoC) many-core Peripheral Component Interconnect express (PCIe) cards. Each of the 64 tiles integrated into a single Tile64 microprocessor die is capable of running a full operating system, as previously indicated, managing several independent threads [26]. Each tile is a 32-bit Reduced Instruction Set Computing (RISC) machine (with no floating point instructions) running at 500–866 MHz, with an exclusive 8 kB L1

Porting applications to the many-core architecture

Migrating existing applications to a new platform or architecture is a common task in software engineering. Several approaches have been proposed over the years [28] in order to accomplish this mission with the best results and the least possible effort, both in scientific [29] and management applications [30]. Bioinformatics is not an exception, especially when new ways of parallel execution are explored or new architectures (like GPGPU) are used [31]. Indeed, new programming methodologies and

Development from scratch

The algorithm selected in this first approach is the FastLSA [23] pairwise global aligner. From a bioinformatics point of view, the goal of an alignment algorithm is to identify similar and discrepant regions of DNA, RNA or peptide (e.g., protein) sequences. The alignment is called pairwise if the goal is to find the best match between just two sequences, and multiple if more than two sequences are involved. The FastLSA is a variant of the Needleman–Wunsch (NW) algorithm
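To make the pairwise global alignment idea concrete, the following is a minimal sketch of the plain Needleman–Wunsch score computation, not of FastLSA itself and not of the authors' MC64-NW/SW code. The function name `nw_score` and the scoring values (match, mismatch, linear gap penalty) are assumptions chosen for illustration. Only two rows of the dynamic programming matrix are kept in memory, and all scores are integers.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> int:
    """Global alignment score of a and b (Needleman-Wunsch, linear gap penalty).

    Illustrative scoring values; real aligners use substitution matrices.
    """
    # Row 0: aligning the empty prefix of a with b[:j] costs j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning a[:i] with the empty prefix of b costs i gaps.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr[j] = max(diag, up, left)
        prev = curr
    return prev[len(b)]
```

Calling `nw_score("ACGT", "AGT")` scores the alignment of a four-base sequence against a three-base one, where at least one gap is unavoidable. Recovering the alignment itself, rather than only its score, requires a traceback over the matrix, which is the part that linear-space variants handle differently.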

Migration with changes only in the communication layer

In this approach, the popular ABySS algorithm [24] was chosen to be migrated to the Tile64 platform with the fewest possible changes. ABySS is a parallel de novo sequence assembler that uses MPI [48]. As Tilera does not provide any kind of MPI implementation, we have developed an ad hoc MPI-like middleware to migrate the open source code of ABySS to the Tile64 platform. This middleware satisfies the minimal requirements to execute ABySS.
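The idea of a minimal message-passing layer can be sketched as follows. This is an illustration only: it is neither the authors' middleware nor any Tilera or MPI API. The class `MiniComm`, its `send` and `recv` methods and the helper `run_workers` are hypothetical names, and Python threads with in-process queues stand in for tiles and their interconnect.

```python
import queue
import threading


class MiniComm:
    """Hypothetical point-to-point layer: one blocking inbox per rank."""

    def __init__(self, size):
        self.size = size
        self._inbox = [queue.Queue() for _ in range(size)]

    def send(self, dest, payload, source):
        # Deliver (source, payload) to the inbox of rank `dest`.
        self._inbox[dest].put((source, payload))

    def recv(self, rank):
        # Block until a message for `rank` arrives; return (source, payload).
        return self._inbox[rank].get()


def run_workers(chunks):
    """Rank 0 sends one chunk to each worker and gathers the partial sums."""
    size = len(chunks) + 1
    comm = MiniComm(size)

    def worker(rank):
        _, data = comm.recv(rank)
        comm.send(0, sum(data), source=rank)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(1, size)]
    for t in threads:
        t.start()
    for r, chunk in enumerate(chunks, start=1):
        comm.send(r, chunk, source=0)
    # Replies may arrive in any order, so index them by sender rank.
    results = dict(comm.recv(0) for _ in chunks)
    for t in threads:
        t.join()
    return [results[r] for r in range(1, size)]
```

A program written against such a small interface only depends on rank-addressed send and receive, which is why swapping the communication layer can leave the rest of the application code largely untouched.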

Direct porting with some optimizations

The algorithm chosen in this approach was ClustalW, one of the most widely used multiple sequence aligners. The goal of a multiple alignment is to find similarities and differences in a set of sequences, so that evolutionary relationships can be established, including the generation of phylogenetic trees. Likewise, the polymorphisms in the sequences can be identified, which can be useful, for instance, to design specific molecular markers for DNA, RNA or peptide fingerprinting.
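Parallelizing a single stage of such a tool can be sketched as below, under a stated assumption: that the stage of interest is an all-against-all pairwise comparison producing a distance matrix, which is a common first step in progressive multiple aligners. The text above does not name the stage, so this is illustrative only. The names `toy_distance` and `distance_matrix` are hypothetical, and the metric is a placeholder (fraction of differing positions), not a real alignment score. Threads are used only to show how independent pairs can be handed to workers.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def toy_distance(a, b):
    """Placeholder metric: fraction of positions that differ (not an alignment)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    same = sum(x == y for x, y in zip(a, b))
    return 1.0 - same / longest


def distance_matrix(seqs, workers=4):
    """Fill a symmetric distance matrix, one independent task per sequence pair."""
    n = len(seqs)
    pairs = list(combinations(range(n), 2))
    matrix = [[0.0] * n for _ in range(n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each pair is independent, so the pairs can be processed in any order.
        values = pool.map(lambda p: toy_distance(seqs[p[0]], seqs[p[1]]), pairs)
        for (i, j), d in zip(pairs, values):
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

Because no pair depends on another, the number of tasks grows quadratically with the number of sequences while the rest of the program stays sequential, which is what makes this kind of stage a natural candidate for a low-effort port.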

Discussion

The Tile64 microprocessors of the Tilera cards are the first true many-core general-purpose chips commercially available. Such architecture has a clear potential for bioinformatics, as previously reported [16]. However, the practical usefulness of the Tile64 parallelization also depends on the particular algorithm used, as demonstrated in the present work. In general, and not surprisingly, the code must be thoroughly optimized to improve the global performance, as with other chips.

Conclusions and future prospects

The ideal scenario when working with a new bioinformatics parallel platform (hardware) would be not only obtaining good results in terms of performance, but also getting it up and running as soon as possible, which implies the ability of executing existing programs with as few modifications as possible. The knowledge and experience empirically gained with the straightforward algorithm migrations presented in this paper allow us to draw the following conclusions about the use of many-core

Acknowledgments

The authors thank Tilera (http://www.tilera.com) for providing hardware and software tools. This work was supported by “Ministerio de Ciencia e Innovación” (MICINN grants AGL2010-17316, BIO2009-07443 and BIO2011-15237); “Consejería de Agricultura y Pesca” of “Junta de Andalucía” (041/C/2007, 75/C/2009 & 56/C/2010); “Grupo PAI” (AGR-248); and “Universidad de Córdoba” (“Ayuda a Grupos”), Spain.

Francisco J. Esteban is an Analyst at the Informatics Service of Cordoba University (Spain). His research interests include network protocols, grid computing, parallel programming and bioinformatics. He earned his degree in Telecommunications Engineering at Madrid Polytechnic University (Spain) in 1991.

References (68)

A. Ruiz et al., Recognition of circular patterns on GPUs: Performance analysis and contributions, Journal of Parallel and Distributed Computing (2008)
J. Kurzak et al., Optimizing matrix multiplication for a short-vector SIMD architecture – CELL processor, Parallel Computing (2009)
D. Díaz et al., Parallelizing and optimizing a bioinformatics pairwise sequence alignment algorithm for many-core architecture, Parallel Computing (2011)
C. Trapnell et al., Optimizing data intensive GPGPU computations for DNA sequence alignment, Parallel Computing (2009)
H. Zaidi et al., Implementation of an environment for Monte Carlo simulation of fully 3-D positron tomography on a high-performance parallel platform, Parallel Computing (1998)
B. Schmidt et al., A hybrid architecture for bioinformatics, Future Generation Computer Systems (2002)
S. White et al., Performance of the NAS Parallel Benchmarks on PVM-based Networks, Journal of Parallel and Distributed Computing (1995)
V. Sachdeva et al., Exploring the viability of the Cell Broadband Engine for bioinformatics applications, Parallel Computing (2008)
S.B. Needleman et al., A general method applicable to the search for similarities in the amino acid sequence of two proteins, Journal of Molecular Biology (1970)
O. Gotoh, An improved algorithm for matching biological sequences, Journal of Molecular Biology (1982)
M. Mirto et al., A bioinformatics grid alignment toolkit, Future Generation Computer Systems (2008)
J.D. Owens, D. Luebke, N. Govindaraju, M. Harris, J. Krüger, A.E. Lefohn, T.J. Purcell, A survey of general-purpose...
J.A. Kahle et al., Introduction to the Cell multiprocessor, IBM Journal of Research and Development (2005)
T. Chen et al., Cell broadband engine architecture and its first implementation: a performance view, IBM Journal of Research and Development (2007)
A.K. Nanda et al., Cell/B.E. blades: building blocks for scalable, real-time, interactive, and digital media servers, IBM Journal of Research and Development (2007)
E. Lindholm et al., NVIDIA Tesla: a unified graphics and computing architecture, IEEE Micro (2008)
T.H. Beach, N.J. Avis, An intelligent semi-automatic application porting system for application accelerators, presented...
T.G. Mattson, R.V.D. Wijngaart, M. Frumkin, Programming the Intel 80-core network-on-a-chip Terascale processor,...
Intel. (2010, 2010-10-31). The SCC Platform Overview....
Intel. (2010, 2010-10-31). Intel’s Teraflops Research Chip....
Intel. (2011, 2011-04-26). Intel Xeon Processor E7 family delivers record-breaking performance, new security,...
M. Shah, J. Barreh, J. Brooks, R. Golla, G. Grohoski, N. Gura, R. Hetherington, P. Jordan, M. Luttrell, C. Olson, B....
Adapteva. (2011, 2011/10/4). Epiphany Multicore IP....
W. Flohr, Implementation of an MPEG Codec on the Tilera 64 Processor, Department of Electrical and Systems...
S. Gálvez et al., Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment, Bioinformatics (2010)
G. Besnard et al., Genomic profiling of plastid DNA variation in the Mediterranean olive tree, BMC Plant Biology (2011)
S.A. Manavski et al., CUDA compatible GPU cards as efficient hardware accelerators for Smith–Waterman sequence alignment, BMC Bioinformatics (2008)
L. Ligowski, W. Rudnicki, An efficient implementation of Smith–Waterman algorithm on GPU using CUDA, for massively...
M.S. Farrar, (2010, 2010-08-31). Optimizing Smith–Waterman for the cell broadband engine....
A. Driga et al., FastLSA: a fast, linear-space, parallel and sequential algorithm for sequence alignment, Algorithmica (2006)
J.T. Simpson et al., ABySS: a parallel assembler for short read sequence data, Genome Research (2009)
M.A. Larkin et al., ClustalW and Clustal X version 2.0, Bioinformatics (2007)
S. Bell, B. Edwards, J. Amann, R. Conlin, K. Joyce, V. Leung, J. MacKay, M. Reif, L. Bao, J. Brown, M. Mattina, C.-C....
J. des Rivieres et al., Eclipse: a platform for integrating development tools, IBM Systems Journal (2004)

David Díaz is a graduate student at the Department of Languages and Computer Science of Malaga University (Spain). His research interests include parallel programming, bioinformatics application development and optimization, bioinformatics service integration and computer architectures. He earned his BS degree in Technical Engineering in Computer Systems at Malaga University in 2008.

Pilar Hernández is a Tenured Scientist at the Institute for Sustainable Agriculture (IAS) of the Spanish Council for Scientific Research (CSIC) at Cordoba (Spain). Her research interests include exploring the possibilities of parallel computing on the analysis of next-generation sequencing data. She is currently a member of the editorial boards of the journals ‘Hereditas’ and ‘International Journal of Plant Genomics’. She is also a member of the Coordinating Committee of the International Wheat Genome Sequencing Consortium (IWGSC; www.wheatgenome.org) and the Coordination Committee of the European Triticeae Genomics Initiative (ETGI; http://www.etgi.org). She received her Agricultural Engineering degree (1993) and her Ph.D. (1998) degree from Cordoba University.

Juan A. Caballero is a Professor of Statistics at the Department of Statistics and Operational Research of Cordoba University (Spain), where he is Vice-chancellor for Information and Communications Technologies (ICT). His research and teaching interests include statistical simulation and free-method distribution. He is an elected member of the Conference of Chancellors of Spanish Universities (CRUE; http://www.crue.org). He received his Ph.D. degree in Mathematics in 1991 from Granada University (Spain).

Gabriel Dorado is a Tenured Full Professor at the Department of Biochemistry and Molecular Biology of Cordoba University (Spain). He received the degrees of Bachelor of Science (1983), Master of Science (1983) and Ph.D. (1986) in Biology from Cordoba University. His research interests focus both on improving university teaching and on using molecular biology and bioinformatics tools to address biotechnology challenges. He is the leader of three teams: two of them to improve lecturing, and one for biotechnology. His current h-index is 18, with 103 entries indexed by the Web of Knowledge (Thomson Reuters). He is the editor of three books (one published in 2009 about biotechnology and two in 2012 about lecturing and molecular biology).

Sergio Gálvez is a Professor at the Languages and Computer Sciences Department of Malaga University (Spain). His research interests include optimization of algorithms applied to bioinformatics, bioinformatics services, integration, and parallelization of algorithms for many-core architectures. He works actively in the regional technological community, spreading the Java and Oracle platforms. He is also the author of many lectures and books related to these technologies. He is assessor and co-founder of the G2CREA Collaborative Innotechnologies Company, established in the Technology Park of Andalusia (Spain). He received his MS (1995) and Ph.D. (2000) degrees in Computer Science from Malaga University (Spain).

¹: Tel.: +34 957213005; fax: +34 957218116.

²: Tel.: +34 952133312; fax: +34 952131397.

³: Tel.: +34 957499277; fax: +34 957499252.

⁴: Tel.: +34 957211068; fax: +34 957218116.

⁵: Tel.: +34 957218689; fax: +34 957218592.

⁶: Authors who contributed to the project leadership.
