DOI: 10.1145/1854776.1854856
research-article

Use of ternary similarities in graph based clustering for protein structural family classification

Published: 02 August 2010

Abstract

Classification of protein 3D structures into structural families is reformulated as graph-based clustering of modular objects: the similarity between two 3D structures relies on the local similarities of their matching substructures. Similarities between 3D structures are then represented as edges connecting objects in a graph.
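To make the modular representation concrete, here is a minimal Python sketch, assuming each pairwise comparison reports a score and, for each of the two structures, the set of residue positions in its matched substructures. The `Alignment` record, the `build_similarity_graph` helper, and the `min_score` threshold are hypothetical illustrations, not the authors' data model.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable

Edge = FrozenSet[str]


@dataclass(frozen=True)
class Alignment:
    """Hypothetical result of comparing 3D structures a and b."""
    a: str
    b: str
    score: float
    matched_a: FrozenSet[int]  # residue positions of a in the matched substructures
    matched_b: FrozenSet[int]  # residue positions of b in the matched substructures


def build_similarity_graph(
    alignments: Iterable[Alignment], min_score: float
) -> Dict[Edge, Dict[str, FrozenSet[int]]]:
    """One edge per sufficiently similar pair, labelled with each endpoint's matched residues."""
    graph: Dict[Edge, Dict[str, FrozenSet[int]]] = {}
    for aln in alignments:
        if aln.a == aln.b or aln.score < min_score:
            continue  # skip self-comparisons and weak similarities
        edge = frozenset((aln.a, aln.b))
        # If a pair appears twice, keep the first record (a simplifying assumption).
        graph.setdefault(edge, {aln.a: aln.matched_a, aln.b: aln.matched_b})
    return graph
```

Keeping the matched residues on each edge, rather than only a score, is what lets later steps ask whether several comparisons involving the same structure rely on the same substructure.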
Applying clustering algorithms to such a graph has the following drawback: a cluster may contain subsets of more than two 3D structures that share no common similar substructure. To overcome this drawback, we propose to introduce constraints on ternary similarities, i.e. constraints on triples of objects. The 3D structures graph is first transformed into its line graph, which represents the adjacencies between the graph's edges. The ternary constraints are applied to the line graph, and a maximal line graph is then extracted from the modified line graph. The corresponding 3D structures graph then satisfies the ternary constraints mentioned above. In our experiments, clustering the new graph yields a more stable classification that is consistent with the expert SCOP classification.
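The line-graph step can be sketched in Python as follows. This is a minimal illustration under an assumed formalization of the ternary constraint, not necessarily the rule used in the paper: two edges (a, b) and (b, c) stay adjacent in the line graph only if the residues of the shared structure b matched in the two comparisons overlap, i.e. the triple a, b, c shares a similar substructure through b. Edge labels use a hypothetical format `edge -> {structure: matched residue positions}`, and the helper names `line_graph` and `apply_ternary_constraint` are illustrative. Extraction of the maximal line graph is not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Edge = FrozenSet[str]
Matches = Dict[Edge, Dict[str, FrozenSet[int]]]


def line_graph(edges: Iterable[Edge]) -> Dict[Edge, Set[Edge]]:
    """One node per edge; two nodes are adjacent iff the edges share an endpoint."""
    adj: Dict[Edge, Set[Edge]] = {e: set() for e in edges}
    for e, f in combinations(adj, 2):
        if e & f:  # distinct edges of a simple graph share at most one endpoint
            adj[e].add(f)
            adj[f].add(e)
    return adj


def apply_ternary_constraint(adj: Dict[Edge, Set[Edge]], matches: Matches) -> Dict[Edge, Set[Edge]]:
    """Drop adjacencies whose triple shares no substructure at the common vertex (assumed rule)."""
    kept: Dict[Edge, Set[Edge]] = {e: set() for e in adj}
    for e, neighbours in adj.items():
        for f in neighbours:
            (shared,) = e & f  # the structure common to both comparisons
            if matches[e][shared] & matches[f][shared]:
                kept[e].add(f)  # the overlap test is symmetric, so kept stays symmetric
    return kept
```

For example, if A matches residues 0 to 9 of B and C matches residues 50 to 59 of B, the adjacency between edges (A, B) and (B, C) is removed: A, B and C would otherwise be linked through B without sharing any substructure, which is the drawback described above.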

Cited By

  • (2015) 3-way Networks: Application of Hypergraphs for Modelling Increased Complexity in Comparative Genomics. PLOS Computational Biology 11(3): e1004079. DOI: 10.1371/journal.pcbi.1004079. Online publication date: 27-Mar-2015.
  • (2012) Automatic classification of protein structures relying on similarities between alignments. BMC Bioinformatics 13(1). DOI: 10.1186/1471-2105-13-233. Online publication date: 14-Sep-2012.

      Published In

      BCB '10: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
      August 2010
      705 pages
      ISBN:9781450304382
      DOI:10.1145/1854776
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Conference

      BCB'10

      Acceptance Rates

      Overall Acceptance Rate 254 of 885 submissions, 29%

      Article Metrics

      • Downloads (last 12 months): 1
      • Downloads (last 6 weeks): 0
      Reflects downloads up to 16 Feb 2025