Optimal Worksharing of DNA Sequence Analysis on Accelerated Platforms

Memeti, Suejb; Pllana, Sabri; Kołodziej, Joanna

doi:10.1007/978-3-319-44881-7_14


Part of the book series: Computer Communications and Networks (CCN)


Abstract

In this chapter, we describe an optimized approach for DNA sequence analysis on a heterogeneous platform that is accelerated with the Intel Xeon Phi. Such platforms commonly comprise one or two general-purpose CPUs and one or more Xeon Phi coprocessors. Our parallel DNA sequence analysis algorithm is based on finite automata and finds patterns in large-scale DNA sequences. To determine the optimal worksharing (that is, the fractions of the DNA sequence assigned to the host and to the accelerating device), we propose a solution that combines combinatorial optimization and machine learning. The objective function that we aim to minimize is the execution time of the DNA sequence analysis. We use combinatorial optimization to explore the system configuration space efficiently, and machine learning to determine the near-optimal system configuration for executing the DNA sequence analysis. We evaluate our approach empirically using real-world DNA segments of various organisms. For experimentation, we use an accelerated platform that comprises two 12-core Intel Xeon E5 CPUs and an Intel Xeon Phi 7120P accelerator with 61 cores.
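
The abstract describes three ingredients: scanning DNA with a finite automaton, splitting the sequence into a host fraction and a device fraction, and searching a configuration space for the setting with the lowest predicted execution time. The Python sketch below illustrates how these pieces can fit together; it is not the authors' implementation. Several parts are assumptions made for illustration. The single-pattern KMP-style automaton stands in for the chapter's finite-automata matcher. The `split_work` overlap rule is one way to split the sequence. The `predicted_time` function is a hypothetical stand-in for the trained machine-learning model, and its throughput numbers are invented. The thread counts in `SPACE` are example values. Simulated annealing is used as one possible combinatorial optimizer.

```python
import math
import random

ALPHABET = "ACGT"


def build_dfa(pattern: str) -> list[dict[str, int]]:
    """KMP-style automaton: dfa[state][symbol] -> next state; state len(pattern) accepts."""
    m = len(pattern)
    dfa = [{c: 0 for c in ALPHABET} for _ in range(m + 1)]
    dfa[0][pattern[0]] = 1
    restart = 0  # state reached after reading pattern[1:j]; used on mismatch
    for j in range(1, m + 1):
        dfa[j] = dict(dfa[restart])  # mismatch transitions copy the restart state
        if j < m:
            dfa[j][pattern[j]] = j + 1  # the expected symbol advances one state
            restart = dfa[restart][pattern[j]]
    return dfa


def count_matches(dfa: list[dict[str, int]], text: str) -> int:
    """Count (possibly overlapping) occurrences; unknown symbols such as N reset to state 0."""
    accept, state, hits = len(dfa) - 1, 0, 0
    for c in text:
        state = dfa[state].get(c, 0)
        hits += state == accept
    return hits


def split_work(seq: str, host_fraction: float, pattern_len: int) -> tuple[str, str]:
    """Hypothetical split: the device part starts pattern_len - 1 symbols before the cut,
    so a match straddling the cut is counted once, by the device side."""
    cut = round(len(seq) * host_fraction)
    return seq[:cut], seq[max(0, cut - (pattern_len - 1)):]


# Hypothetical configuration space: (host threads, device threads, host fraction).
SPACE = (
    (12, 24, 48),
    (60, 120, 180, 240),
    tuple(round(0.05 * i, 2) for i in range(21)),
)


def predicted_time(host_threads: int, device_threads: int, host_fraction: float,
                   n: int = 10**9) -> float:
    """HYPOTHETICAL stand-in for a trained ML regressor; all rates are invented."""
    host = host_fraction * n / (host_threads * 2.0e6)
    device = (1 - host_fraction) * n / (device_threads * 0.5e6)
    device += 0.3 if host_fraction < 1 else 0.0  # assumed fixed offload overhead
    return max(host, device)  # host and device run concurrently


def anneal(space, cost, steps: int = 2000, t0: float = 1.0, seed: int = 0):
    """Simulated annealing on the discrete grid; each step moves one dimension by one slot."""
    rng = random.Random(seed)
    idx = [rng.randrange(len(dim)) for dim in space]
    cur = cost(*[dim[i] for dim, i in zip(space, idx)])
    best_idx, best = list(idx), cur
    for k in range(steps):
        temp = t0 * (1 - k / steps) + 1e-9  # linear cooling schedule
        cand = list(idx)
        d = rng.randrange(len(space))
        cand[d] = min(len(space[d]) - 1, max(0, cand[d] + rng.choice((-1, 1))))
        c = cost(*[dim[i] for dim, i in zip(space, cand)])
        if c < cur or rng.random() < math.exp((cur - c) / temp):
            idx, cur = cand, c
            if c < best:
                best_idx, best = list(cand), c
    return tuple(dim[i] for dim, i in zip(space, best_idx)), best


if __name__ == "__main__":
    dna = "ACGTTGCAACGTACGTTACGACGT" * 50
    dfa = build_dfa("ACGT")
    host_part, device_part = split_work(dna, 0.46, 4)
    print(count_matches(dfa, host_part) + count_matches(dfa, device_part),
          count_matches(dfa, dna))
    print(anneal(SPACE, predicted_time))
```

Overlapping the device fraction by `len(pattern) - 1` symbols means the two partial counts always sum to the whole-sequence count, whatever fraction is chosen. In practice, the cost function would be a model trained on measured execution times rather than a closed-form formula, and the optimizer would query it instead of running every configuration.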

Author information

Authors and Affiliations

Department of Computer Science, Linnaeus University, 351 95, Växjö, Sweden
Suejb Memeti & Sabri Pllana
Cracow University of Technology, 31 155, Cracow, Poland
Joanna Kołodziej

Corresponding author

Correspondence to Suejb Memeti.

Editor information

Editors and Affiliations

University Politehnica of Bucharest, Bucharest, Romania
Florin Pop
Cracow University of Technology, Cracow, Poland
Joanna Kołodziej
Second University of Naples, Naples, Caserta, Italy
Beniamino Di Martino

About this chapter

Cite this chapter

Memeti, S., Pllana, S., Kołodziej, J. (2016). Optimal Worksharing of DNA Sequence Analysis on Accelerated Platforms. In: Pop, F., Kołodziej, J., Di Martino, B. (eds) Resource Management for Big Data Platforms. Computer Communications and Networks. Springer, Cham. https://doi.org/10.1007/978-3-319-44881-7_14

DOI: https://doi.org/10.1007/978-3-319-44881-7_14
Published: 28 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44880-0
Online ISBN: 978-3-319-44881-7
eBook Packages: Computer Science, Computer Science (R0)
