DOI: 10.1145/3442705.3442719
research-article

A Polynomial Time Algorithm for a Class of Contig-Based Two-Sided Scaffold Filling

Published: 21 March 2021

ABSTRACT

The genomic scaffold filling problem has recently attracted considerable attention. However, almost all current studies assume that the scaffold is given as an incomplete sequence (i.e., missing genes can be inserted anywhere in the incomplete sequence). This differs significantly from most real genomic datasets, where a scaffold is given as a list of contigs. In this paper, we revisit the genomic scaffold filling problem by considering this important case: two scaffolds R and S are given, the missing genes can only be inserted between the contigs, and the objective is to maximize the number of common adjacencies between the filled genomes R' and S'. For this problem, we design a polynomial-time algorithm based on a greedy search strategy, prove its correctness, and analyze its time complexity. We also develop a visual program in Python, which further validates the effectiveness of the algorithm.
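
To make the objective concrete, the following is a minimal Python sketch of how the number of common adjacencies between two filled genomes could be counted. It is not the paper's greedy algorithm. It assumes the definition commonly used in the scaffold filling literature, where an adjacency is an unordered pair of neighboring genes and the common adjacencies form the multiset intersection of the two adjacency multisets. The function names `adjacency_multiset`, `common_adjacencies`, and `fill_between_contigs`, and the one-letter gene encoding, are hypothetical choices made for illustration.

```python
from collections import Counter


def adjacency_multiset(genome):
    """Return the multiset of unordered adjacent gene pairs in a sequence."""
    return Counter(tuple(sorted(pair)) for pair in zip(genome, genome[1:]))


def common_adjacencies(r, s):
    """Count common adjacencies as the size of the multiset intersection.

    Assumption: adjacencies are unordered, so "ab" and "ba" match.
    """
    shared = adjacency_multiset(r) & adjacency_multiset(s)
    return sum(shared.values())


def fill_between_contigs(contigs, fillers):
    """Build a filled genome by placing one filler string between each pair
    of consecutive contigs; the contigs themselves are never split.

    Assumption (hypothetical interface): len(fillers) == len(contigs) - 1,
    and an empty filler means nothing is inserted at that position.
    """
    if len(fillers) != len(contigs) - 1:
        raise ValueError("need exactly one filler slot between consecutive contigs")
    parts = [contigs[0]]
    for filler, contig in zip(fillers, contigs[1:]):
        parts.append(filler)
        parts.append(contig)
    return "".join(parts)


# Scaffold R consists of contigs "ab" and "d"; gene "c" is missing and may
# only be inserted between the two contigs.
r_filled = fill_between_contigs(["ab", "d"], ["c"])
s_filled = "acbd"
print(r_filled, common_adjacencies(r_filled, s_filled))
```

In this toy instance the filled genome `abcd` has adjacencies ab, bc, cd, while `acbd` has ac, bc, bd, so only bc is shared. An algorithm for the problem would choose the fillers for both scaffolds so that this count is as large as possible.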


  • Published in

    VSIP '20: Proceedings of the 2020 2nd International Conference on Video, Signal and Image Processing
    December 2020
    108 pages
    ISBN: 9781450388931
    DOI: 10.1145/3442705

    Copyright © 2020 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • research-article
    • Research
    • Refereed limited