ABSTRACT
The genomic scaffold filling problem has recently attracted considerable attention at home and abroad. However, almost all current studies assume that the scaffold is given as an incomplete sequence (i.e., missing genes can be inserted anywhere in the incomplete sequence). This differs significantly from most real genomic datasets, where a scaffold is given as a list of contigs. In this paper, we revisit the genomic scaffold filling problem by considering this important case: two scaffolds R and S are given, the missing genes can only be inserted in between the contigs, and the objective is to maximize the number of common adjacencies between the filled genomes R' and S'. For this problem, we design a polynomial-time algorithm based on a greedy search strategy, prove its correctness, and analyze its time complexity. We also develop a visualization program in Python, which further validates the effectiveness of the algorithm.
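To make the objective concrete, the following is a minimal Python sketch of how the number of common adjacencies between two filled genomes might be counted. It is not the paper's algorithm. It assumes the definition commonly used in the scaffold filling literature (an adjacency is an unordered pair of neighbouring genes, and common adjacencies are counted as a multiset intersection), and the names `adjacencies`, `common_adjacencies`, and `fill_between_contigs` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter


def adjacencies(genome):
    """Return the multiset of unordered adjacent gene pairs of a sequence.

    Assumption: an adjacency is unordered, so (a, b) and (b, a) coincide.
    """
    return Counter(tuple(sorted(pair)) for pair in zip(genome, genome[1:]))


def common_adjacencies(r, s):
    """Count common adjacencies as the size of the multiset intersection."""
    return sum((adjacencies(r) & adjacencies(s)).values())


def fill_between_contigs(contigs, fillers):
    """Build a filled genome from contigs and the genes placed in each gap.

    fillers[i] is the (possibly empty) list of genes inserted between
    contigs[i] and contigs[i + 1]; contigs themselves are never split.
    """
    if len(fillers) != len(contigs) - 1:
        raise ValueError("need exactly one filler list per gap between contigs")
    filled = list(contigs[0])
    for gap, contig in zip(fillers, contigs[1:]):
        filled.extend(gap)
        filled.extend(contig)
    return filled


# Toy data (invented for illustration): R is missing gene "c".
r_contigs = [["a", "b"], ["d", "e"]]
s_genome = ["a", "b", "c", "d", "e"]

unfilled = fill_between_contigs(r_contigs, [[]])
filled = fill_between_contigs(r_contigs, [["c"]])
```

In this toy instance, inserting the missing gene into the single gap raises the count of common adjacencies with S from 2 to 4. A greedy strategy of the kind the abstract mentions would choose such insertions step by step; the selection rule itself is not given here and is not reproduced.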