Abstract
The large gap between the number of known protein sequences and the number of known tertiary structures has fostered the development of automated methods and systems for protein analysis. When these systems are built with machine learning techniques, the ability to train them on suitable data becomes of paramount importance. From this perspective, the search for (and the generation of) specialized datasets that meet specific requirements is a prominent activity for researchers. To help researchers in this activity, we developed ProDaMa-C, a web application aimed at generating specialized protein structure datasets and fostering collaboration among researchers. ProDaMa-C provides a collaborative environment where researchers with similar interests can meet and work together to generate new datasets. Datasets are generated by selecting proteins through user-defined pipelines of methods/operators. Each pipeline can also be used as a starting point for building further pipelines that enforce additional selection criteria. Freely available as a web application at http://iasc.diee.unica.it/prodamac, ProDaMa-C has proven to be a useful tool for researchers involved in generating specialized protein structure datasets.
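The pipeline idea described above can be illustrated with a minimal Python sketch. It is not ProDaMa-C's actual interface, which the abstract does not specify: the `Protein` record, the `Pipeline` class, its `then` and `run` methods, the selection thresholds, and the sample entries are all hypothetical names and values chosen for illustration. The sketch only shows how an existing pipeline could serve as the starting point for a stricter one without being modified.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Protein:
    """Hypothetical protein record; fields are illustrative assumptions."""
    entry_id: str
    length: int          # number of residues
    resolution: float    # in angstroms
    is_membrane: bool


# An operator is any predicate that accepts or rejects a protein.
Operator = Callable[[Protein], bool]


@dataclass(frozen=True)
class Pipeline:
    """An immutable, ordered collection of selection operators."""
    operators: Tuple[Operator, ...] = ()

    def then(self, op: Operator) -> "Pipeline":
        # Return a NEW pipeline so the original can still be reused as-is.
        return Pipeline(self.operators + (op,))

    def run(self, proteins: Iterable[Protein]) -> List[Protein]:
        # Keep only the proteins accepted by every operator.
        return [p for p in proteins if all(op(p) for op in self.operators)]


# Made-up sample entries (not real database records).
proteins = [
    Protein("P001", 120, 1.8, False),
    Protein("P002", 300, 3.2, True),
    Protein("P003", 250, 2.0, True),
    Protein("P004", 40, 1.5, True),
]

# A base pipeline with two selection criteria.
base = (
    Pipeline()
    .then(lambda p: p.resolution <= 2.5)
    .then(lambda p: p.length >= 50)
)

# A further pipeline built on top of the base one, adding a criterion.
membrane_only = base.then(lambda p: p.is_membrane)

base_ids = [p.entry_id for p in base.run(proteins)]
membrane_ids = [p.entry_id for p in membrane_only.run(proteins)]
```

Making `Pipeline` immutable is a design choice of this sketch: extending a shared pipeline never alters the dataset definition that other collaborators rely on.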
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Armano, G., Manconi, A. (2011). A Collaborative Web Application for Supporting Researchers in the Task of Generating Protein Datasets. In: Pallotta, V., Soro, A., Vargiu, E. (eds) Advances in Distributed Agent-Based Retrieval Tools. Studies in Computational Intelligence, vol 361. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21384-7_2
DOI: https://doi.org/10.1007/978-3-642-21384-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21383-0
Online ISBN: 978-3-642-21384-7
eBook Packages: Engineering, Engineering (R0)