A binary format for genetic data designed for large whole genome studies that enable both marker and strand based analyses | IEEE Conference Publication | IEEE Xplore

Abstract:

Recent advances in genotyping technology have enabled large studies in which the data from thousands of subjects contain half a million or more single nucleotide polymorphism (SNP) markers per subject. This rapid growth in data size has created a need to compress the data, both to reduce storage requirements and to reduce the memory needed at run time to analyze it. The availability of so many markers across the whole genome has also created opportunities for new methodologies that exploit the relatively high marker density to perform analyses that account for Linkage Disequilibrium (LD), an effect in which certain combinations of genetic markers are non-randomly associated. Classical techniques for transforming genotypic data into a binary format are already used by several applications; however, we demonstrate in this paper that these traditional transformations are inadequate for certain types of analyses, because they discard information that is key to the new analysis methodologies. We propose a new protocol for formatting binary genotypic data that can be used for all types of analyses while still offering a very high compression rate.
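To make the information-loss argument concrete, here is a minimal Python sketch. It assumes a biallelic SNP with alleles labelled `A` and `B`, and phased calls written with the strand-1 allele first (so `AB` and `BA` are distinct). `pack_genotypes` stores an unordered genotype call in 2 bits, in the spirit of classical binary encodings; `pack_strands` instead keeps one bit per allele on each strand. All function names and bit assignments are hypothetical illustrations, not the protocol proposed in the paper.

```python
from typing import List, Tuple

# Hypothetical 2-bit codes for an UNORDERED genotype call (classical style).
# "AB" and "BA" share a code, so strand (phase) information is discarded.
GENOTYPE_CODE = {"AA": 0b00, "AB": 0b01, "BA": 0b01, "BB": 0b10, "--": 0b11}
CODE_GENOTYPE = {0b00: "AA", 0b01: "AB", 0b10: "BB", 0b11: "--"}


def pack_genotypes(calls: List[str]) -> bytes:
    """Pack unordered genotype calls four per byte (2 bits each)."""
    out = bytearray((len(calls) + 3) // 4)
    for i, call in enumerate(calls):
        out[i // 4] |= GENOTYPE_CODE[call] << (2 * (i % 4))
    return bytes(out)


def unpack_genotypes(data: bytes, n: int) -> List[str]:
    """Recover n unordered calls; a packed "BA" comes back as "AB"."""
    return [CODE_GENOTYPE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n)]


def pack_strands(calls: List[str]) -> Tuple[bytes, bytes]:
    """Pack phased calls into two bit-planes, one per strand (A=0, B=1).

    Simplifying assumption: every call is complete; this sketch has no
    missing-value state.
    """
    n_bytes = (len(calls) + 7) // 8
    strand1, strand2 = bytearray(n_bytes), bytearray(n_bytes)
    for i, call in enumerate(calls):
        if len(call) != 2 or any(a not in "AB" for a in call):
            raise ValueError(f"unsupported phased call: {call!r}")
        if call[0] == "B":
            strand1[i // 8] |= 1 << (i % 8)
        if call[1] == "B":
            strand2[i // 8] |= 1 << (i % 8)
    return bytes(strand1), bytes(strand2)


def unpack_strands(planes: Tuple[bytes, bytes], n: int) -> List[str]:
    """Recover n phased calls, preserving which allele sits on which strand."""
    s1, s2 = planes
    alleles = "AB"
    return [
        alleles[(s1[i // 8] >> (i % 8)) & 1] + alleles[(s2[i // 8] >> (i % 8)) & 1]
        for i in range(n)
    ]


if __name__ == "__main__":
    calls = ["AA", "AB", "BA", "BB"]
    print(unpack_genotypes(pack_genotypes(calls), 4))  # ['AA', 'AB', 'AB', 'BB']
    print(unpack_strands(pack_strands(calls), 4))      # ['AA', 'AB', 'BA', 'BB']
```

Both layouts spend 2 bits per subject per marker, but only the strand-wise one lets `BA` round-trip unchanged. In exchange, this simplified strand layout has no code for a missing call, which a real format would need to handle.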
Date of Conference: 08-10 October 2008
Date Added to IEEE Xplore: 08 December 2008
Conference Location: Athens, Greece
