Improvement of template-based protein structure prediction by using chimera alignment

Authors:

Shuichiro Makigaki,

Takashi Ishida

ICBBB '18: Proceedings of the 2018 8th International Conference on Bioscience, Biochemistry and Bioinformatics

Pages 32 - 37

https://doi.org/10.1145/3180382.3180405

Published: 18 January 2018

Abstract

The determination of a protein's structure provides important information for practical applications in the biological sciences, such as virtual screening and function prediction. Protein structures can be predicted precisely using template-based modeling if good template structures can be found in a database. However, such predictions sometimes fail even when a template of sufficient quality is found, because the sequence alignment used for the modeling is incorrect.

In this paper, we propose a new method for improving sequence alignment in single-template-based modeling. The sequence alignments used as input to template-based modeling are normally generated by homology search tools, and the alignments vary depending on the search algorithm used. Each single alignment is often imperfect, but most of them have parts that are suitable for template-based modeling, at different positions. Thus, we construct a profile of multiple alignments to obtain a consensus among the alignments produced by multiple template search tools. Integrated alignments are generated by random sampling from this profile, and the final prediction model is selected based on model quality assessment scores and the joint probability of the profile.

We performed evaluation tests using template-based modeling targets in CASP11 and compared the proposed method to several existing major alignment algorithms. The results showed that the proposed method could improve the model accuracy of single-template modeling.
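
The abstract does not give the details of the profile, the sampling, or the scoring, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' implementation. It assumes each tool's alignment is a list with one entry per query residue, holding the aligned template index or `None` for a gap; that the profile is the per-position frequency of those entries; and that positions are treated as independent when computing the joint probability. The names `build_profile`, `sample_chimera`, `joint_log_probability`, and `select_best` are hypothetical. The model quality assessment step is omitted because it requires building 3D models.

```python
import math
import random
from collections import Counter

GAP = None  # assumption: None marks a query residue left unaligned


def build_profile(alignments):
    """Per query position, relative frequency of each aligned template index."""
    length = len(alignments[0])
    if any(len(a) != length for a in alignments):
        raise ValueError("all alignments must cover the same query")
    profile = []
    for pos in range(length):
        counts = Counter(a[pos] for a in alignments)
        profile.append({s: n / len(alignments) for s, n in counts.items()})
    return profile


def sample_chimera(profile, rng):
    """Draw one alignment position by position, keeping template indices increasing."""
    chimera, last = [], -1
    for dist in profile:
        # Only states that keep the alignment monotonic are admissible.
        allowed = [(s, p) for s, p in dist.items() if s is GAP or s > last]
        if not allowed:
            chimera.append(GAP)  # no admissible state: fall back to a gap
            continue
        states, weights = zip(*allowed)
        choice = rng.choices(states, weights=weights, k=1)[0]
        chimera.append(choice)
        if choice is not GAP:
            last = choice
    return chimera


def joint_log_probability(chimera, profile, floor=1e-6):
    """Sum of log profile probabilities; unseen states get a small floor value."""
    return sum(math.log(dist.get(s, floor)) for s, dist in zip(chimera, profile))


def select_best(profile, n_samples=100, seed=0):
    """Sample several chimera alignments and keep the most probable one."""
    rng = random.Random(seed)
    samples = [sample_chimera(profile, rng) for _ in range(n_samples)]
    return max(samples, key=lambda c: joint_log_probability(c, profile))


if __name__ == "__main__":
    # Three toy alignments of a 6-residue query, standing in for three search tools.
    alignments = [
        [0, 1, 2, 3, None, 4],
        [0, 1, None, 2, 3, 4],
        [None, 0, 1, 3, 4, 5],
    ]
    profile = build_profile(alignments)
    best = select_best(profile, n_samples=200, seed=0)
    print(best, joint_log_probability(best, profile))
```

In this sketch the selection uses only the joint probability; the method described above additionally ranks the 3D models built from the sampled alignments by model quality assessment scores.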

Cited By

DİKİCİ SALTUNTAŞ V (2023). Yapay Sinir Ağları Kullanılarak Protein Katlanması Tanıma [Protein Folding Recognition by Artificial Neural Networks]. Bilişim Teknolojileri Dergisi 16:2 (95-105). Online publication date: 30-Apr-2023.
https://doi.org/10.17671/gazibtd.1141468
Yan K, Wen J, Liu J, Xu Y, Liu B (2020). Protein Fold Recognition by Combining Support Vector Machines and Pairwise Sequence Similarity Scores. IEEE/ACM Transactions on Computational Biology and Bioinformatics 18:5 (2008-2016). Online publication date: 13-Jan-2020.
https://dl.acm.org/doi/10.1109/TCBB.2020.2966450

Index Terms

Improvement of template-based protein structure prediction by using chimera alignment
1. Applied computing
  1. Life and medical sciences
    1. Bioinformatics
    2. Computational biology
      1. Molecular structural biology
2. Mathematics of computing
  1. Probability and statistics
    1. Probabilistic reasoning algorithms
      1. Kalman filters and hidden Markov models

Information

Published In

ICBBB '18: Proceedings of the 2018 8th International Conference on Bioscience, Biochemistry and Bioinformatics

January 2018

164 pages

ISBN:9781450353410

DOI:10.1145/3180382

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

RIED, Tokai University, Japan

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICBBB 2018

ICBBB 2018: 2018 8th International Conference on Bioscience, Biochemistry and Bioinformatics

January 18 - 20, 2018

Tokyo, Japan
