Abstract
Several authors have suggested similarity measures for objects labeled with terms from a hierarchical taxonomy. We generalize this idea with a definition of information-theoretic similarity for taxonomies structured as directed acyclic graphs, in which an object may be described by multiple terms. We discuss how our definition should be adapted in the presence of ambiguity, and introduce new similarity measures based on our definitions.
We present an implementation of our measures that is integrated with a relational database and scales to large taxonomies and datasets. We evaluate our measures by applying them to an object-matching problem from bioinformatics, and show that, for this task, our new measures outperform those reported in the literature. We also verify the scalability of our approach by applying it to patent similarity search, using patents classified with terms from the taxonomy defined by the United States Patent and Trademark Office.
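To make the idea of information-theoretic similarity over a DAG-structured taxonomy concrete, the following is a minimal sketch of a commonly used Lin-style measure. It is not the measures defined in this paper, which the abstract does not spell out. The toy taxonomy, the annotations, and all function names are hypothetical, and taking the maximum over term pairs is only one simple way to compare objects that carry several terms.

```python
import math
from collections import defaultdict

# Hypothetical toy taxonomy: term -> list of parents.
# It is a DAG, so a term may have more than one parent.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "dna_binding": ["binding"],
    "kinase": ["catalysis"],
    "dna_kinase": ["dna_binding", "kinase"],
}

# Hypothetical annotations: object -> set of terms describing it.
ANNOTATIONS = {
    "obj1": {"dna_kinase"},
    "obj2": {"dna_binding"},
    "obj3": {"kinase"},
    "obj4": {"binding", "catalysis"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """IC(t) = log(n / count(t)), where count(t) is the number of objects
    annotated with t or any descendant of t, and n is the number of objects.
    Terms that no object is annotated with get no entry."""
    counts = defaultdict(int)
    for terms in ANNOTATIONS.values():
        covered = set()
        for term in terms:
            covered |= ancestors(term)
        for term in covered:
            counts[term] += 1
    n = len(ANNOTATIONS)
    return {term: math.log(n / count) for term, count in counts.items()}


def lin_similarity(a, b, ic):
    """Lin-style similarity of two terms: twice the IC of the most
    informative common ancestor, divided by IC(a) + IC(b)."""
    if a == b:
        return 1.0
    common = ancestors(a) & ancestors(b)
    best = max(ic.get(term, 0.0) for term in common)
    denom = ic[a] + ic[b]
    return 2 * best / denom if denom > 0 else 0.0


def object_similarity(x, y, ic):
    """One simple (assumed) way to compare multi-term objects:
    the best similarity over all pairs of their terms."""
    return max(
        lin_similarity(a, b, ic)
        for a in ANNOTATIONS[x]
        for b in ANNOTATIONS[y]
    )
```

With this toy data, `object_similarity("obj1", "obj2", information_content())` evaluates to about 0.67, because `dna_binding` is an informative ancestor shared by both objects. `obj2` and `obj3` share only the root, which carries no information, so they score 0.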
An erratum to this chapter can be found at http://dx.doi.org/10.1007/11914853_71.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schwarz, P., Deng, Y., Rice, J.E. (2006). Finding Similar Objects Using a Taxonomy: A Pragmatic Approach. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE. OTM 2006. Lecture Notes in Computer Science, vol 4275. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11914853_67
DOI: https://doi.org/10.1007/11914853_67
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48287-1
Online ISBN: 978-3-540-48289-5
eBook Packages: Computer Science (R0)