Combining Hadoop and GPU to preprocess large Affymetrix microarray data | IEEE Conference Publication | IEEE Xplore

Abstract:

High-density oligonucleotide arrays (microarrays) from Affymetrix have been widely used to measure gene expression. Public data repositories, such as the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI), have accumulated large amounts of microarray data. Efficient integrative analysis of these data will provide significant knowledge about biological systems. However, none of the existing microarray preprocessing and quality assessment tools can handle very large microarray datasets with tens of thousands of experiments. The preprocessing and quality assessment of microarray datasets involve both data-intensive and compute-intensive tasks. In this paper, we develop a new set of tools that combines Hadoop (for data-intensive tasks) with general-purpose graphics processing units (GPGPUs) (for compute-intensive tasks) to process large microarray data efficiently. Evaluation of our new tools on large microarray datasets with tens of thousands of experiments showed promising performance. We demonstrate that the combination of Hadoop and GPGPU computation is effective for complex scientific applications that contain both data-intensive and compute-intensive tasks. Our new tool set will make it possible to utilize the valuable large microarray data in public repositories.
Date of Conference: 27-30 October 2014
Date Added to IEEE Xplore: 08 January 2015
Electronic ISBN: 978-1-4799-5666-1
Conference Location: Washington, DC, USA
