Evaluation of CD-HIT for constructing non-redundant databases


Abstract:

CD-HIT is one of the most popular tools for reducing sequence redundancy, and is considered to be the state-of-the-art method. It tries to minimise redundancy by reducing an input database to a set of representative sequences, under a user-defined threshold of sequence identity. We present a comprehensive assessment of the redundancy in the outputs of CD-HIT, exploring the impact of different identity thresholds and new evaluation data on the redundancy. We demonstrate that the relationship between the threshold and the remaining redundancy is surprisingly weak. Applications of CD-HIT that set low identity threshold values may also suffer from substantial degradation in both efficiency and accuracy.
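
To make the idea of reducing a database to representative sequences under an identity threshold concrete, the following is a minimal Python sketch of greedy incremental clustering, the general strategy CD-HIT is known to follow. It is not CD-HIT's implementation: the function names `identity` and `greedy_cluster` are hypothetical, and the identity measure here is a toy ungapped, position-by-position comparison assumed only for illustration, whereas the real tool uses word filtering and alignment.

```python
def identity(a: str, b: str) -> float:
    """Toy identity (an assumption, not CD-HIT's measure):
    matching positions divided by the length of the shorter sequence,
    compared without gaps from the first residue."""
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / shorter


def greedy_cluster(sequences, threshold):
    """Greedy incremental clustering sketch.

    Sequences are processed from longest to shortest. Each sequence joins
    the first representative it matches at or above `threshold`;
    otherwise it becomes a new representative.
    """
    representatives = []
    clusters = {}
    for seq in sorted(sequences, key=len, reverse=True):
        for rep in representatives:
            if identity(seq, rep) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No existing representative is similar enough.
            representatives.append(seq)
            clusters[seq] = [seq]
    return representatives, clusters
```

Calling `greedy_cluster(sequences, 0.9)` on a list of protein strings returns the representatives and a mapping from each representative to its cluster members. Because a sequence is only compared against representatives, and joins the first one that passes, two sequences in different clusters can still be more similar to each other than the threshold suggests, which is one way residual redundancy of the kind assessed here can arise.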
Date of Conference: 15-18 December 2016
Date Added to IEEE Xplore: 19 January 2017
Conference Location: Shenzhen
