
Filling a Protein Scaffold with a Reference

  • Conference paper
Bioinformatics Research and Applications (ISBRA 2016)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9683)

Abstract

In mass spectrometry-based de novo protein sequencing, it is hard to obtain the complete sequence of a whole protein. Motivated by this, we study the (one-sided) problem of filling a protein scaffold \(\mathcal{S}\), given as a sequence of contigs none of which may be altered, with some missing amino acids so that the BLOSUM62 score between a complete reference protein \(\mathcal{P}\) of length \(n\) and the filled sequence \(\mathcal{S}'\) is maximized. We show that this problem can be solved in polynomial time, specifically in \(O(n^{26})\) time. We also consider the case in which the contigs are not of high quality and are concatenated into an (incomplete) sequence \(\mathcal{I}\); here the missing amino acids can be inserted anywhere in \(\mathcal{I}\) to obtain \(\mathcal{I}'\), and the goal is to maximize the BLOSUM62 score between \(\mathcal{P}\) and \(\mathcal{I}'\). We show that this problem can be solved in \(O(n^{22})\) time. Because of their high running times, both algorithms are impractical; we therefore present several algorithms based on greedy and local search that aim to solve the problems in practice. The empirical results show that these algorithms can fill protein scaffolds almost perfectly, provided that a good pair of scaffold and reference is given.
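To make the two objectives concrete, here is a minimal Python sketch that solves both variants by exhaustive search on toy inputs. It is not the authors' \(O(n^{26})\) or \(O(n^{22})\) algorithm, nor one of their greedy or local-search heuristics. It rests on several assumptions not stated in the abstract: (i) the missing amino acids are given as a multiset and the filled sequence has the same length as \(\mathcal{P}\), so the BLOSUM62 score is taken column by column with no gaps; (ii) in the one-sided variant the missing amino acids may be placed before, between, or after the contigs; (iii) only a four-residue excerpt of BLOSUM62 (A, C, G, W) is included instead of the full matrix in Table 5. The names `blosum62`, `score`, `interleavings`, and `best_fill` are hypothetical. The second variant is modeled by treating every residue of \(\mathcal{I}\) as its own one-letter contig.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple

# Excerpt of BLOSUM62 covering only A, C, G and W (values from the standard
# BLOSUM62 matrix; the full 20x20 matrix is Table 5). Each unordered pair
# is stored once and lookups are made symmetric below.
_BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("A", "C"): 0, ("A", "G"): 0, ("A", "W"): -3,
    ("C", "C"): 9, ("C", "G"): -3, ("C", "W"): -2,
    ("G", "G"): 6, ("G", "W"): -2,
    ("W", "W"): 11,
}


def blosum62(a: str, b: str) -> int:
    """Symmetric lookup; raises KeyError for residues outside the excerpt."""
    if (a, b) in _BLOSUM62_EXCERPT:
        return _BLOSUM62_EXCERPT[(a, b)]
    return _BLOSUM62_EXCERPT[(b, a)]


def score(reference: str, filled: str) -> int:
    """Assumed objective: column-by-column BLOSUM62 sum, no gaps."""
    if len(reference) != len(filled):
        raise ValueError("this sketch assumes equal-length sequences")
    return sum(blosum62(p, s) for p, s in zip(reference, filled))


def interleavings(blocks: List[str], missing: Counter,
                  prefix: str = "") -> Iterator[str]:
    """Yield every sequence that keeps each block intact and in order and
    places each missing residue before, between, or after the blocks."""
    if not blocks and sum(missing.values()) == 0:
        yield prefix
        return
    if blocks:
        # Option 1: emit the next (unalterable) block as-is.
        yield from interleavings(blocks[1:], missing, prefix + blocks[0])
    # Option 2: emit one of the remaining missing residues.
    for residue in sorted(r for r, c in missing.items() if c > 0):
        missing[residue] -= 1
        yield from interleavings(blocks, missing, prefix + residue)
        missing[residue] += 1  # restore the multiset for the next choice


def best_fill(blocks: Iterable[str], missing: Iterable[str],
              reference: str) -> Tuple[str, int]:
    """Exhaustively find the highest-scoring filled sequence."""
    candidates = interleavings(list(blocks), Counter(missing))
    best = max(candidates, key=lambda s: score(reference, s))
    return best, score(reference, best)


if __name__ == "__main__":
    # One-sided variant: the contig "AC" cannot be split.
    print(best_fill(["AC"], "G", "AGC"))      # ('GAC', 9)
    # Insert-anywhere variant: each residue of I is its own block.
    print(best_fill(list("AC"), "G", "AGC"))  # ('AGC', 19)
```

The toy run shows why the second variant can score higher: allowing G to be inserted inside "AC" recovers the reference exactly. Because the search enumerates every interleaving, its cost grows exponentially with the number of missing amino acids, so it is only useful as a correctness check for faster methods on very small instances.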

Acknowledgments

This research is partially supported by NSF of China under grant 60928006 and by the Opening Fund of Top Key Discipline of Computer Software and Theory in Zhejiang Provincial Colleges at Zhejiang Normal University. We also thank the anonymous reviewers for several useful comments.

Author information

Correspondence to Binhai Zhu.

Appendix

See Table 5.

Table 5. The BLOSUM62 score matrix.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Qingge, L., Liu, X., Zhong, F., Zhu, B. (2016). Filling a Protein Scaffold with a Reference. In: Bourgeois, A., Skums, P., Wan, X., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2016. Lecture Notes in Computer Science, vol 9683. Springer, Cham. https://doi.org/10.1007/978-3-319-38782-6_15

  • DOI: https://doi.org/10.1007/978-3-319-38782-6_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38781-9

  • Online ISBN: 978-3-319-38782-6

  • eBook Packages: Computer Science, Computer Science (R0)
