Abstract
In this work, we investigate a new method for searching DNA sequences based on a multimedia retrieval approach. We try to address the issues related to index size and search performance by first transforming the DNA sequences into images and then indexing these images using content-based image indexing techniques. The main goal is to allow users to retrieve similar gene sequences using stored image features rather than the sequences themselves. We suggest two algorithms for the conversion, each of which has been tested to reveal its sensitivity to both sequence length and sequence changes. We have also compared our approach to BLAST, which was used as a reference system. The results of our experiments show that this approach performs well with respect to index size and speed, but more work must be done to improve its search sensitivity.
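The general idea of searching by image features instead of raw sequence can be illustrated with a small sketch. It is not one of the two conversion algorithms from the paper, which the abstract does not describe. The nucleotide-to-intensity mapping `PIXEL`, the row width, the histogram feature, and the L1 distance are all assumptions chosen only for illustration.

```python
from collections import Counter

# Hypothetical mapping: one grayscale intensity per nucleotide.
# This is an assumption for illustration, not the paper's encoding.
# Characters outside A/C/G/T (e.g. N) raise KeyError in this sketch.
PIXEL = {"A": 0, "C": 85, "G": 170, "T": 255}


def sequence_to_image(seq, width=4):
    """Lay the sequence out row by row as a grayscale image (list of rows)."""
    pixels = [PIXEL[base] for base in seq.upper()]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def histogram_feature(image):
    """Normalized intensity histogram, a very simple global image feature."""
    counts = Counter(p for row in image for p in row)
    total = sum(counts.values())
    levels = sorted(PIXEL.values())
    if total == 0:
        return [0.0] * len(levels)
    return [counts[v] / total for v in levels]


def build_index(sequences):
    """Store only the feature vector for each named sequence."""
    return {name: histogram_feature(sequence_to_image(seq))
            for name, seq in sequences.items()}


def l1_distance(f, g):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(f, g))


def search(query, index, k=1):
    """Return the names of the k stored sequences closest to the query."""
    qf = histogram_feature(sequence_to_image(query))
    scored = sorted((l1_distance(qf, feat), name) for name, feat in index.items())
    return [name for _, name in scored[:k]]


index = build_index({"s1": "AAAACCCC", "s2": "GGGGTTTT"})
print(search("AAACCCCC", index))
```

In this sketch the index holds four numbers per sequence regardless of sequence length, which is the kind of size saving the approach aims for. A global histogram discards the order of the bases, so a practical system would presumably need more discriminative image descriptors.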
© 2011 Springer-Verlag Berlin Heidelberg
Cite this paper
Ramampiaro, H., Grande, A. (2011). DNA Sequence Search Using Content-Based Image Search Approach. In: Rocha, M.P., Rodríguez, J.M.C., Fdez-Riverola, F., Valencia, A. (eds) 5th International Conference on Practical Applications of Computational Biology & Bioinformatics (PACBB 2011). Advances in Intelligent and Soft Computing, vol 93. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19914-1_26
DOI: https://doi.org/10.1007/978-3-642-19914-1_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19913-4
Online ISBN: 978-3-642-19914-1
eBook Packages: Engineering (R0)