Comparing the performance of clusters, Hadoop, and Active Disks on microarray correlation computations

Abstract:

Microarray-based comparative genomic hybridization (aCGH) offers an increasingly fine-grained method for detecting copy number variations in DNA. These copy number variations can directly influence the expression of the proteins that are encoded in the genes in question. A useful analysis of the data produced from these microarray experiments is pairwise correlation. However, the high resolution of today's microarray technology requires that supercomputing computation and storage resources be leveraged in order to perform this analysis. This application is an exemplar of the class of data intensive problems which require high-throughput I/O in order to be tractable. Although the performance of these types of applications on a cluster can be improved by parallelization, storage hardware and network limitations restrict the scalability of an I/O-bound application such as this. The Hadoop software framework is designed to enable data-intensive applications on cluster architectures, and offers significantly better scalability due to its distributed file system. However, specialized architecture adhering to the Active Disk paradigm, in which compute power is placed close to the disk instead of across a network, can further improve performance. The Netezza Corporation's database systems are designed around the Active Disk approach, and offer tremendous gains in implementing this application over the traditional cluster architecture. We present methods and performance analyses of several implementations of this application: on a cluster, on a cluster with a parallel file system, with Hadoop on a cluster, and using a Netezza data warehouse appliance. Our results offer benchmarks for the performance of data intensive applications within these distributed computing paradigms.
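To make the scale of the analysis concrete, the following is a minimal sketch of a pairwise correlation over a small set of probe measurements. It is not the authors' implementation: the abstract does not say which correlation measure is used, so Pearson correlation is an assumption here, and the names `pearson`, `pairwise_correlations`, and `probes` are hypothetical. With n probes there are n(n-1)/2 pairs, so the work and the volume of data read grow quadratically with microarray resolution.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed measure)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        # A constant series has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def pairwise_correlations(probes):
    """Correlate every unordered pair of rows; returns {(i, j): r} with i < j."""
    return {
        (i, j): pearson(probes[i], probes[j])
        for i, j in combinations(range(len(probes)), 2)
    }


# Hypothetical toy data: each row is one probe measured across four samples.
probes = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
result = pairwise_correlations(probes)
```

Each pair is independent of every other pair, which is what makes the computation straightforward to parallelize; the abstract's point is that the bottleneck then shifts to reading the input data quickly enough.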

Published in: 2009 International Conference on High Performance Computing (HiPC)

Date of Conference: 16-19 December 2009

Date Added to IEEE Xplore: 18 March 2010

ISBN Information:

Print ISSN: 1094-7256

DOI: 10.1109/HIPC.2009.5433190

Conference Location: Kochi, India
