Abstract:
The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such da...Show MoreMetadata
Abstract:
The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are meant to act as repositories for our biological knowledge as we acquire and refine it. Hence, by definition they are incomplete at any given time. In this paper we describe a technique that improves our previous method for extracting implicit semantic relationships between genes and functions. We added a number of weighting schemes to our previous latent semantic indexing approach. We used this technique to analyze the current annotations of the human genome. The predictions of 15 different weighting schemes were compared and evaluated. Out of the top 50 functional annotations predicted using the best performing weighting scheme, we found support in the literature for 82% of them. For 10% of our prediction we did not find any relevant publications, and 6% were actually contradicted by existing literature. This weighting scheme also outperformed the simple binary scheme used in our previous approach. Our method is independent of the organism and can be used to analyze and improve the quality of the data of any public or private annotation database
Published in: 2007 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology
Date of Conference: 01-05 April 2007
Date Added to IEEE Xplore: 04 June 2007
Print ISBN:1-4244-0710-9