Direct approaches to exploit many-core architecture in bioinformatics
Highlights
► The computing power of the Tile64 many-core microprocessor can be exploited for next-generation sequencing (NGS) bioinformatics tasks.
► The Tile64 many-core CPU architecture works as a cluster of pico-computers, as shown by the MC64-NW/SW algorithm.
► MC64-ClustalW shows an important performance improvement with a minor development effort.
► MC64-ABySS reveals that an efficient MPI-like API for Tile64 is essential to successfully port most existing parallel code.
► The spread of many-core CPU technologies could lead to a new paradigm in programming methodologies in the coming years.
Introduction
Nowadays, high-performance processing cannot be understood without new Chip Multiprocessors (CMPs), which are being actively developed. Amongst such chips are the Graphics Processing Units (GPUs) with hundreds of cores [1] and the Sony–IBM–Toshiba Cell Broadband Engine Architecture (CBEA) [2], which can render complex animations and also provide enough computing power for other computation-intensive tasks [3]. This line is exploited by supercomputing blade systems like the IBM BladeCenter server platform [4] and the Nvidia Tesla [5]. Yet, such products require detailed programming methodologies and architecture optimizations, which are much more complex than those of single Central Processing Units (CPUs) [6], [7]. A comparison between these different programming approaches and an attempt to automate such processes can be found in [8]. Other companies are working on CMP and multithreaded many-core microprocessors, such as the Intel Tera-scale Processor with 80 cores [9], the Single-chip Cloud Computing (SCC) [10] with 48 cores, the “Knights Ferry” [11] with 32 cores, the Xeon E7 with 10 cores and two threads per core [12], the Sun Microsystems UltraSPARC T2 Pro with eight cores and eight threads per core [13], the Adapteva Epiphany IV (64 cores) [14] and the Tilera Tile64 microprocessor, as explained below.
The TilExpress-20G cards include a many-core Tile64 microprocessor with 64 tiles (cores) at 866 MHz, 8 GB of RAM and two 10 Gb Ethernet ports. Though they were initially designed for networking applications, video encoding [15] and streaming broadcasts (thanks to their high communication bandwidth and scalability), we have already demonstrated the usefulness of such architecture for bioinformatics [16], [17]. We are applying such developments to quality control, traceability and fraud prevention of olive oil [18], as well as to other genomics approaches of our research group on Agri-Food Biotechnology.
Other works have focused on fine-tuning processing-intensive bioinformatics algorithms for platforms like GPUs [19], [20], [21] or the above-mentioned CBEA [22], obtaining better performance than single-CPU implementations as a result of their parallelization factor. The main difference between these developments and the present work is that Tile64 actually contains many CPUs (in the sense that each one of them is able to execute a standalone operating system). In contrast, the so-called many-core GPU contains very restricted processing units, unable to execute a whole complex algorithm on their own. Thus, many of the bioinformatics algorithms ported to GPU architectures show extraordinary speed-ups but, due to the restricted resources, usually only work well with some specific data, and their performance decreases sharply when they are applied to a wider range of input data. For instance, in the case of sequence alignments, only short sequences (typically, peptides) are allowed, precluding their application to larger projects involving nucleic acid data (e.g., genomics). On the other hand, the CBEA integrates two kinds of microprocessors: one Power Processing Element (PPE) and eight Synergistic Processing Elements (SPE); whereas Tile64 contains a homogeneous matrix of 8×8 tiles. This architecture offers the opportunity to evaluate both the effort that must be dedicated to obtain a working parallelized algorithm on it and the performance achieved, according to the development and code migration approach used, as explained below. Indeed, this work focuses more on evaluating the possible migration approaches, and less on getting the maximum possible performance (which in any case may require a more time-consuming approach).
In order to further evaluate the suitability of the Tile64 architecture in the field of bioinformatics, we have tested three algorithms commonly used by life-science researchers: (i) pairwise sequence alignments; (ii) multiple sequence alignments; and (iii) de novo genome assembly. Taking into account the Tile64 software development characteristics [16], three different approaches regarding porting efforts have been considered; namely: (i) an implementation from scratch of a dynamic programming algorithm like the Fast Linear-Space Alignment (FastLSA) [23], with a parallel strategy for pairwise alignments; (ii) a half-way solution, where a customized communication layer replaces the Message Passing Interface (MPI) in the Assembly By Short Sequences (ABySS) algorithm [24]; and (iii) a slightly modified implementation of a parallel multiple sequence aligner (ClustalW) [25]. Each proposal is discussed in terms of the trade-offs between the development effort and the achieved performance. This strategy allows us to identify several mandatory principles for efficient many-core developments.
Note that the term many-core is used here as a synonym of many-core CPU, which should not be confused with many-core GPUs and General-Purpose GPU (GPGPU) computing. To avoid confusion, the manufacturers of many-core CPUs have coined a new term: tile. This term relates to the geometry that can be visually observed in the die of the chip; it is reminiscent of how a tile contributes to completing a mosaic. The shape of the die provides a preliminary idea of how different many-core CPUs are from GPUs. Table 1 shows the main differences between these technologies from a programming and performance point of view. The last row of the table visually notes these differences by comparing die-shots from these platforms. This may help to understand that the parallelization strategies are indeed drastically different. GPGPU cores are specialized in executing small threads rather than huge tasks (such as a whole operating system), and this simplicity allows an easier integration of hundreds of these computing elements in a single die; whereas many-core CPUs currently have just tens of tiles. Therefore, the strategy when porting algorithms to GPUs is to generate as many execution threads as possible, in order to maximize the number of cores working at a given time.
These large differences between many-core CPUs and GPUs make any comparison of the performance and capabilities of such architectures very difficult. Actually, they should be considered complementary rather than competing, as noted in Table 1. Tilera’s approach is not the only one following this new tile-based computing paradigm. Intel has joined the bandwagon of this technology with Single-chip Cloud Computing (SCC) [10], a platform with hardware-based message-passing capabilities and a whole operating system running in each tile. Nevertheless, Intel’s platform is currently at an academic and research stage, so Tilera’s platform is the only one commercially available. Another example is the Adapteva Epiphany IV chip, which should be available by late 2012.
The Tile64 microprocessor
The Tilera microprocessors are available as standalone chips or in System-on-Chip (SoC) many-core Peripheral Component Interconnect express (PCIe) cards. Each of the 64 tiles integrated into a single Tile64 microprocessor die is capable of running a full operating system, as previously indicated, managing several independent threads [26]. Each tile is a 32-bit Reduced Instruction Set Computing (RISC) machine (with no floating point instructions) running at 500–866 MHz, with an exclusive 8 kB L1
Porting applications to the many-core architecture
Migrating existing applications to a new platform or architecture is a common task in software engineering. Several approaches have been proposed over the years [28] in order to accomplish this mission with the best results and the least possible effort, both in scientific [29] and management applications [30]. Bioinformatics is not an exception, especially when new ways of parallel execution are explored or new architectures (like GPGPU) are used [31]. Indeed, new programming methodologies and
Development from scratch
The algorithm selected in this first approach is the FastLSA [23] pairwise global aligner. From a bioinformatics point of view, the goal of an alignment algorithm is to identify similar and discrepant regions of DNA, RNA or peptide (e.g., protein) sequences. The alignment is called pairwise if the goal is to find the best match between just two sequences, and multiple if more than two sequences are involved. FastLSA is a variant of the Needleman–Wunsch (NW) algorithm
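To make the idea of a pairwise global alignment concrete, the following is a minimal sketch of the classic full-matrix Needleman–Wunsch recurrence with traceback. It is not FastLSA (it uses quadratic rather than linear space) and it is not the MC64 implementation; the function name and the scoring values (match, mismatch, linear gap penalty) are illustrative assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b (full-matrix NW, linear gap cost).

    Scoring values are illustrative assumptions, not those of any real tool.
    Returns (score, aligned_a, aligned_b).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Traceback from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

The full matrix needs memory proportional to the product of the sequence lengths, which is the cost that linear-space variants such as FastLSA are designed to avoid.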
Migration with changes only in the communication layer
In this approach, the popular ABySS algorithm [24] was chosen to be migrated to the Tile64 platform with the fewest possible changes. ABySS is a parallel de novo sequence assembler that uses the Message Passing Interface (MPI) [48]. As Tilera does not provide any kind of MPI implementation, we have developed an ad hoc MPI-like middleware to migrate the open source code of ABySS to the Tile64 platform. This middleware satisfies the minimal requirements to execute ABySS.
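As an illustration of what a minimal message-passing layer can look like, the sketch below exposes only a rank, a size, a send and a blocking receive. Everything in it is hypothetical: the names `MiniComm` and `run` are invented for this example, Python threads and queues stand in for tiles and their interconnect, and it does not reproduce the Tilera API, the MPI standard or the middleware described above.

```python
import queue
import threading


class MiniComm:
    """Hypothetical MPI-like communicator: one mailbox (queue) per rank."""

    def __init__(self, rank, mailboxes):
        self.rank = rank
        self.size = len(mailboxes)
        self._mailboxes = mailboxes

    def send(self, dest, payload):
        # Non-blocking: drop the message into the destination mailbox
        self._mailboxes[dest].put((self.rank, payload))

    def recv(self):
        # Blocking: wait for the next message, returned as (source, payload)
        return self._mailboxes[self.rank].get()


def run(worker, size):
    """Run worker(comm) once per rank (threads stand in for tiles)."""
    mailboxes = [queue.Queue() for _ in range(size)]
    results = [None] * size

    def launch(rank):
        results[rank] = worker(MiniComm(rank, mailboxes))

    threads = [threading.Thread(target=launch, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def sum_of_squares(comm):
    """Toy worker: every rank sends rank**2 to rank 0, which adds them up."""
    if comm.rank == 0:
        total = 0
        for _ in range(comm.size - 1):
            _source, value = comm.recv()
            total += value
        return total
    comm.send(0, comm.rank ** 2)
    return None


if __name__ == "__main__":
    print(run(sum_of_squares, 4))
```

Keeping the interface this small is the design point of the sketch: application code written against send and receive does not need to know what transport sits underneath.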
Direct porting with some optimizations
The algorithm chosen in this approach was ClustalW, one of the most widely used multiple sequence aligners. The goal of a multiple alignment is to find similarities and differences in a set of sequences, so that evolutionary relationships can be established, including the generation of phylogenetic trees. Likewise, the polymorphisms in the sequences can be identified, which can be useful, for instance, to design specific molecular markers for DNA, RNA or peptide fingerprinting.
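The first stage of ClustalW compares every pair of input sequences, and these comparisons are independent of each other, so they are a natural unit of parallel work. The sketch below shows one simple way to spread the pairs over a fixed number of workers (for example, tiles) in round-robin order. The function name and the round-robin policy are assumptions made for illustration; this text does not state how MC64-ClustalW distributes its work.

```python
from itertools import combinations


def assign_pairs(num_sequences, num_workers):
    """Round-robin assignment of all unordered sequence pairs to workers.

    Returns a list with one bucket per worker; each bucket holds (i, j)
    index pairs with i < j. Hypothetical helper, for illustration only.
    """
    buckets = [[] for _ in range(num_workers)]
    for k, pair in enumerate(combinations(range(num_sequences), 2)):
        buckets[k % num_workers].append(pair)
    return buckets


if __name__ == "__main__":
    # 100 sequences give 100 * 99 / 2 = 4950 pairs to spread over 64 workers
    sizes = [len(bucket) for bucket in assign_pairs(100, 64)]
    print(min(sizes), max(sizes))
```

Round-robin keeps the number of pairs per worker within one of each other, but it ignores sequence length, so the actual load per worker can still differ when sequence lengths vary widely.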
Discussion
The Tile64 microprocessors of the Tilera cards are the first true many-core general-purpose chips commercially available. Such architecture has a clear potential for bioinformatics, as previously reported [16]. However, the practical usefulness of Tile64 parallelization also depends on the particular algorithm used, as demonstrated in the present work. In general, and not surprisingly, the code must be thoroughly optimized to improve the global performance, as with other chips.
Conclusions and future prospects
The ideal scenario when working with a new bioinformatics parallel platform (hardware) would be not only to obtain good results in terms of performance, but also to get it up and running as soon as possible, which implies the ability to execute existing programs with as few modifications as possible. The knowledge and experience empirically gained with the straightforward algorithm migrations presented in this paper allow us to draw the following conclusions about the use of many-core
Acknowledgments
The authors thank Tilera (http://www.tilera.com) for providing hardware and software tools. This work was supported by “Ministerio de Ciencia e Innovación” (MICINN grants AGL2010-17316, BIO2009-07443 and BIO2011-15237); “Consejería de Agricultura y Pesca” of “Junta de Andalucía” (041/C/2007, 75/C/2009 & 56/C/2010); “Grupo PAI” (AGR-248); and “Universidad de Córdoba” (“Ayuda a Grupos”), Spain.
References (68)
- et al. Recognition of circular patterns on GPUs: Performance analysis and contributions. Journal of Parallel and Distributed Computing (2008)
- et al. Optimizing matrix multiplication for a short-vector SIMD architecture – CELL processor. Parallel Computing (2009)
- et al. Parallelizing and optimizing a bioinformatics pairwise sequence alignment algorithm for many-core architecture. Parallel Computing (2011)
- et al. Optimizing data intensive GPGPU computations for DNA sequence alignment. Parallel Computing (2009)
- et al. Implementation of an environment for Monte Carlo simulation of fully 3-D positron tomography on a high-performance parallel platform. Parallel Computing (1998)
- et al. A hybrid architecture for bioinformatics. Future Generation Computer Systems (2002)
- et al. Performance of the NAS Parallel Benchmarks on PVM-based Networks. Journal of Parallel and Distributed Computing (1995)
- et al. Exploring the viability of the Cell Broadband Engine for bioinformatics applications. Parallel Computing (2008)
- et al. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology (1970)
- An improved algorithm for matching biological sequences. Journal of Molecular Biology (1982)
- A bioinfomatics grid alignment toolkit. Future Generation Computer Systems
- Introduction to the Cell multiprocessor. IBM Journal of Research and Development
- Cell broadband engine architecture and its first implementation: a performance view. IBM Journal of Research and Development
- Cell/B.E. blades: building blocks for scalable, real-time, interactive, and digital media servers. IBM Journal of Research and Development
- NVIDIA Tesla: a unified graphics and computing architecture. IEEE Micro
- Next-generation bioinformatics: using many-core processor architecture to develop a web service for sequence alignment. Bioinformatics
- Genomic profiling of plastid DNA variation in the Mediterranean olive tree. BMC Plant Biology
- CUDA compatible GPU cards as efficient hardware accelerators for Smith–Waterman sequence alignment. BMC Bioinformatics
- FastLSA: a fast, linear-space, parallel and sequential algorithm for sequence alignment. Algorithmica
- ABySS: a parallel assembler for short read sequence data. Genome Research
- ClustalW and clustal X version 2.0. Bioinformatics
- Eclipse: a platform for integrating development tools. IBM Systems Journal
Francisco J. Esteban is an Analyst at the Informatics Service of Cordoba University (Spain). His research interests include network protocols, grid computing, parallel programming and bioinformatics. He earned his degree in Telecommunications Engineering at Madrid Polytechnic University (Spain) in 1991.
David Díaz is a graduate student at the Department of Languages and Computer Science of Malaga University (Spain). His research interests include parallel programming, bioinformatics application development and optimization, bioinformatics service integration and computer architectures. He earned his BS degree in Technical Engineering in Computer Systems at Malaga University in 2008.
Pilar Hernández is a Tenured Scientist at the Institute for Sustainable Agriculture (IAS) of the Spanish Council for Scientific Research (CSIC) at Cordoba (Spain). Her research interests include exploring the possibilities of parallel computing for the analysis of next-generation sequencing data. She is currently a member of the editorial boards of the journals ‘Hereditas’ and ‘International Journal of Plant Genomics’. She is also a member of the Coordinating Committee of the International Wheat Genome Sequencing Consortium (IWGSC; www.wheatgenome.org) and the Coordination Committee of the European Triticeae Genomics Initiative (ETGI; http://www.etgi.org). She received her Agricultural Engineering degree (1993) and her Ph.D. degree (1998) from Cordoba University.
Juan A. Caballero is a Professor of Statistics at the Department of Statistics and Operational Research of Cordoba University (Spain), where he is Vice-chancellor for Information and Communications Technologies (ICT). His research and teaching interests include statistical simulation and distribution-free methods. He has been elected member of the Conference of Chancellors of Spanish Universities (CRUE; http://www.crue.org). He received his Ph.D. degree in Mathematics in 1991 from Granada University (Spain).
Gabriel Dorado is a Tenured Full Professor at the Department of Biochemistry and Molecular Biology of Cordoba University (Spain). He received the degrees of Bachelor of Science (1983), Master of Science (1983) and Ph.D. (1986) in Biology from Cordoba University. His research interests focus on improving university teaching and on using molecular biology and bioinformatics tools to address biotechnology challenges. He is leader of three teams: two of them to improve lecturing, and one for biotechnology. His current h-index is 18, with 103 entries indexed by the Web of Knowledge (Thomson Reuters). He is editor of three books (one published in 2009 about biotechnology and two in 2012 about lecturing and molecular biology).
Sergio Gálvez is a Professor at the Languages and Computer Sciences Department of Malaga University (Spain). His research interests include optimization of algorithms applied to bioinformatics, bioinformatics services, integration, and parallelization of algorithms for many-core architectures. He works actively in the regional technological community, spreading the Java and Oracle platforms. He is also the author of many lectures and books related to these technologies. He is adviser to and co-founder of the G2CREA Collaborative Innotechnologies Company, established in the Technology Park of Andalusia (Spain). He received his MS (1995) and Ph.D. (2000) degrees in Computer Science from Malaga University (Spain).
- 1. Tel.: +34 957213005; fax: +34 957218116.
- 2. Tel.: +34 952133312; fax: +34 952131397.
- 3. Tel.: +34 957499277; fax: +34 957499252.
- 4. Tel.: +34 957211068; fax: +34 957218116.
- 5. Tel.: +34 957218689; fax: +34 957218592.
- 6. Authors who contributed to the project leadership.