A filtering method for high-speed retrieval of similar active sites
Introduction
It is becoming clear that the function of a protein is often carried out by a local part of its molecular surface, called an active site [1]. Based on this fact, we have proposed a method for evaluating the similarity between the molecular surfaces of two proteins. With this method, a part of a molecular surface that is similar to an active site can be identified [2].
One of the most important applications of this comparison method is retrieving, from an active site database (DB), the active sites that are similar to some portion of an input protein surface, because such retrieval could support the identification and prediction of protein functions. However, comparing the input surface with every stored active site is impractical. The active site DB already holds thousands of active sites whose functions are known, and even a single comparison of two surfaces takes several minutes.
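To make the cost concrete, suppose, purely for illustration, that the DB holds N = 2,000 active sites and that one pairwise comparison takes t = 3 minutes. These two figures are assumptions chosen for the arithmetic, not values reported in this paper. An exhaustive search then costs:

```latex
T_{\text{exhaustive}} = N \cdot t = 2000 \times 3\,\text{min} = 6000\,\text{min} = 100\,\text{h} \approx 4.2\,\text{days}
```

If a filter passes only k candidates on to the detailed comparison, that part of the cost drops to k·t, plus whatever time the rough screening itself takes.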
This paper proposes an active site data filtering method for high-speed retrieval. It uses a rough, inexpensive screening to select only the small subset of active site data that are expected to be similar to the input protein. The method rests on the idea that a set of similar protein surfaces shares some regions that are similar in both property and shape. It therefore extracts characteristic regions, defined as parts of the surface that show a similar property, from the input protein and from each active site. It then quickly computes a rough similarity score between the two from their characteristic regions.
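The excerpt does not state how characteristic regions are grouped or how the rough score is computed, so the following is only a hypothetical sketch of the general idea. It assumes each surface vertex already carries a discrete property label (for example, the sign of its electrostatic potential). It treats a characteristic region as a connected patch of same-label vertices with at least `min_size` vertices, and it scores two surfaces by a histogram intersection over region labels. The names `extract_regions`, `rough_score`, and `min_size`, and the histogram-intersection score, are all illustrative assumptions, not the authors' definitions.

```python
from collections import Counter
from collections.abc import Hashable, Iterable, Mapping, Sequence


def extract_regions(
    labels: Sequence[Hashable], adjacency: Mapping[int, Iterable[int]]
) -> list[tuple[Hashable, int]]:
    """Group connected vertices that share the same discrete property label.

    Returns one (label, vertex_count) pair per connected region.
    (Hypothetical region definition, assumed for illustration.)
    """
    seen: set[int] = set()
    regions: list[tuple[Hashable, int]] = []
    for start in range(len(labels)):
        if start in seen:
            continue
        label = labels[start]
        stack = [start]
        seen.add(start)
        size = 0
        # Depth-first flood fill restricted to vertices carrying `label`.
        while stack:
            v = stack.pop()
            size += 1
            for w in adjacency.get(v, ()):
                if w not in seen and labels[w] == label:
                    seen.add(w)
                    stack.append(w)
        regions.append((label, size))
    return regions


def rough_score(regions_a, regions_b, min_size: int = 2) -> float:
    """Histogram intersection over the labels of regions with >= min_size vertices.

    Returns a value in [0, 1]; 1.0 means identical label histograms.
    (Assumed scoring rule, not taken from the paper.)
    """
    hist_a = Counter(label for label, size in regions_a if size >= min_size)
    hist_b = Counter(label for label, size in regions_b if size >= min_size)
    common = sum((hist_a & hist_b).values())  # `&` keeps per-label minimum counts
    total = max(sum(hist_a.values()), sum(hist_b.values()), 1)
    return common / total
```

Because both functions only count and compare small region summaries, the score is far cheaper than a full surface comparison. That cheapness is the property a filtering stage needs, even if the real region features (shape as well as property) are richer than a label and a size.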
Data of protein molecular surfaces
The molecular surface data of proteins are provided by the eF-site database [3]. Each surface is represented as a polyhedron composed of tens of thousands of triangles [4], and the following two types of information are attached to each vertex of the triangles.
- Electrostatic potential. Based on the types of amino acids and atoms, the electrostatic potential is estimated by solving the Poisson-Boltzmann equation. It is
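The surface representation described above, a triangle mesh with values attached to each vertex, can be sketched as a small data structure. This is a hypothetical illustration, not the eF-site file format. The class name `SurfaceMesh`, its fields, and the `+`/`-`/`0` discretization are assumptions. The second attribute type is cut off in this excerpt, so the sketch stores only the electrostatic potential.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SurfaceMesh:
    """Hypothetical in-memory form of a molecular surface polyhedron."""

    vertices: list[tuple[float, float, float]]  # vertex coordinates
    triangles: list[tuple[int, int, int]]  # vertex indices of each triangle
    potential: list[float]  # electrostatic potential at each vertex

    def __post_init__(self) -> None:
        n = len(self.vertices)
        if len(self.potential) != n:
            raise ValueError("need exactly one potential value per vertex")
        for tri in self.triangles:
            if any(not 0 <= i < n for i in tri):
                raise ValueError(f"triangle {tri} references a missing vertex")

    def adjacency(self) -> dict[int, set[int]]:
        """Vertices that share a triangle edge are neighbors."""
        adj: dict[int, set[int]] = {i: set() for i in range(len(self.vertices))}
        for a, b, c in self.triangles:
            adj[a].update((b, c))
            adj[b].update((a, c))
            adj[c].update((a, b))
        return adj

    def potential_labels(self, threshold: float = 0.0) -> list[str]:
        """Discretize the potential into '+', '-', or '0' (neutral band)."""
        return ["+" if p > threshold else "-" if p < -threshold else "0" for p in self.potential]
```

Keeping attributes as per-vertex arrays parallel to `vertices` mirrors the description of information attached to each vertex. It also makes it cheap to derive neighbor lists and discrete labels for later region-based processing.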
An outline of a method of active site data filtering
There are thousands of active site data of proteins whose functions are known, so it is not efficient to compare the molecular surface data of a protein whose function is unknown with all of them. At the same time, we observe that only a few active sites are similar to any given protein. Based on this observation, the retrieval process is divided into two stages. First, the data in the active site DB that may be similar to the input data are extracted as similar active
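The two-stage structure can be sketched generically. The excerpt is cut off before it states how candidates are selected, so the score threshold used here is an assumption; a fixed number of top-ranked candidates would be an equally plausible choice. The function name `two_stage_retrieval` and its parameters are hypothetical. `rough_score` stands for any inexpensive similarity estimate, and `detailed_score` stands for the expensive surface comparison.

```python
from collections.abc import Callable, Iterable
from typing import Optional, TypeVar

Q = TypeVar("Q")  # type of the query surface
S = TypeVar("S")  # type of a stored active site


def two_stage_retrieval(
    query: Q,
    database: Iterable[S],
    rough_score: Callable[[Q, S], float],
    detailed_score: Callable[[Q, S], float],
    threshold: float,
    top_n: Optional[int] = None,
) -> list[tuple[float, S]]:
    """Filter with a cheap score, then rank the survivors with an expensive one."""
    # Stage 1: cheap screening of every stored active site (assumed threshold rule).
    candidates = [site for site in database if rough_score(query, site) >= threshold]
    # Stage 2: the expensive comparison runs only on the surviving candidates.
    scored = [(detailed_score(query, site), site) for site in candidates]
    # Sort on the score alone so the site objects need not be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored if top_n is None else scored[:top_n]
```

The total cost is one cheap call per stored site plus one expensive call per candidate. The filter pays off when the cheap score is much faster than the detailed comparison and rarely rejects a truly similar site.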
Experiment
To verify the effectiveness of the method, we conducted an experiment using the molecular surface data and active site data of enzyme proteins stored in eF-site.
Conclusion
In this paper, we proposed an active site data filtering method based on extracting characteristic regions. The method was applied to an enzyme protein database, and its validity was evaluated in terms of average recall and average precision. The efficiency of the method was also confirmed.
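Average recall and average precision are commonly computed per query and then averaged over all queries. The sketch below follows that common definition. The paper's exact averaging is not shown in this excerpt, so both the macro-averaging and the convention that an empty candidate set has precision 0 are assumptions. For a query, `retrieved` is the set of active sites passed by the filter and `relevant` is the set of sites known to be truly similar. The function names are hypothetical.

```python
from collections.abc import Hashable, Iterable, Sequence


def recall_precision(retrieved: Iterable[Hashable], relevant: Iterable[Hashable]) -> tuple[float, float]:
    """Recall = hits / |relevant|, precision = hits / |retrieved| for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    if not relevant_set:
        raise ValueError("each query needs at least one relevant active site")
    hits = len(retrieved_set & relevant_set)
    recall = hits / len(relevant_set)
    # Assumed convention: a query whose filter passes nothing has precision 0.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    return recall, precision


def average_recall_precision(
    results: Sequence[tuple[Iterable[Hashable], Iterable[Hashable]]],
) -> tuple[float, float]:
    """Macro-average recall and precision over (retrieved, relevant) pairs."""
    if not results:
        raise ValueError("need at least one query")
    pairs = [recall_precision(ret, rel) for ret, rel in results]
    n = len(pairs)
    return sum(r for r, _ in pairs) / n, sum(p for _, p in pairs) / n
```

For a filtering stage, recall is usually the more critical number, because a similar site dropped by the filter can never be recovered by the detailed comparison. Precision mainly indicates how much of the expensive comparison work was saved.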
Acknowledgements
The authors thank Prof. Norihisa Komoda and Dr. Kengo Kinoshita for useful discussions related to this research. Part of this research was supported by the Japan Science and Technology Corporation and by a Grant-in-Aid for Scientific Research from the Ministry of Education, Culture, Sports, Science and Technology.
References
- et al., LIGAND: chemical database for enzyme reactions, Bioinformatics (1998).
- et al., A method of comparing protein molecular surface based on normal vectors with attributes and its application to function identification.