Finding Similar Objects Using a Taxonomy: A Pragmatic Approach

  • Conference paper
On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE (OTM 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4275)

Abstract

Several authors have suggested similarity measures for objects labeled with terms from a hierarchical taxonomy. We generalize this idea with a definition of information-theoretic similarity for taxonomies that are structured as directed acyclic graphs from which multiple terms may be used to describe an object. We discuss how our definition should be adapted in the presence of ambiguity, and introduce new similarity measures based on our definitions.
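To make the idea concrete, here is a minimal sketch of an information-content similarity over a DAG-shaped taxonomy, in the style of Resnik's measure (the similarity of two terms is the information content of their most informative common ancestor). It is not the definition proposed in this paper, which the abstract does not spell out. The toy taxonomy, the annotations, and the choice of taking the best-matching term pair for objects labeled with several terms are all assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical toy taxonomy: term -> set of parents. It is a DAG, so a
# term such as "protein_kinase" may have more than one parent.
PARENTS = {
    "root": set(),
    "enzyme": {"root"},
    "binding": {"root"},
    "kinase": {"enzyme"},
    "atp_binding": {"binding"},
    "protein_kinase": {"kinase", "atp_binding"},
}

# Hypothetical annotations: object -> set of terms describing it.
ANNOTATIONS = {
    "obj_a": {"protein_kinase"},
    "obj_b": {"kinase", "atp_binding"},
    "obj_c": {"binding"},
    "obj_d": {"enzyme"},
}


def ancestors(term):
    """Return the term together with all of its ancestors in the DAG."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def information_content():
    """IC(t) = log(N / n_t), where n_t counts the objects annotated
    with t or with any descendant of t, and N is the number of objects."""
    counts = {term: 0 for term in PARENTS}
    for terms in ANNOTATIONS.values():
        covered = set().union(*(ancestors(t) for t in terms))
        for term in covered:
            counts[term] += 1
    total = len(ANNOTATIONS)
    return {t: math.log(total / c) for t, c in counts.items() if c > 0}


IC = information_content()


def term_similarity(t1, t2):
    """Resnik-style: IC of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max((IC.get(t, 0.0) for t in common), default=0.0)


def object_similarity(o1, o2):
    """Assumed aggregation: the best-matching pair of terms."""
    pairs = product(ANNOTATIONS[o1], ANNOTATIONS[o2])
    return max(term_similarity(a, b) for a, b in pairs)
```

In this toy data, `obj_c` and `obj_d` share only the root term, which every object carries implicitly and which therefore has zero information content, so their similarity is zero. Other aggregations over term pairs (for example an average) are equally plausible; the maximum is used here only to keep the sketch short.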

We present an implementation of our measures that is integrated with a relational database and scales to large taxonomies and datasets. We evaluate our measures by applying them to an object-matching problem from bioinformatics, and show that, for this task, our new measures outperform those reported in the literature. We also verify the scalability of our approach by applying it to patent similarity search, using patents classified with terms from the taxonomy defined by the United States Patent and Trademark Office.
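The abstract does not describe the database schema or queries used in the implementation. As a rough illustration of how a taxonomy lookup might be expressed inside a relational database, the sketch below uses Python's standard `sqlite3` module and a recursive common table expression to find the terms that two objects share once each annotation is expanded to its ancestors. The table names (`term_parent`, `annotation`), the sample rows, and the `shared_terms` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE term_parent (term TEXT, parent TEXT);   -- hypothetical schema
CREATE TABLE annotation (object TEXT, term TEXT);    -- hypothetical schema
INSERT INTO term_parent VALUES
  ('enzyme', 'root'), ('binding', 'root'), ('kinase', 'enzyme'),
  ('atp_binding', 'binding'),
  ('protein_kinase', 'kinase'), ('protein_kinase', 'atp_binding');
INSERT INTO annotation VALUES
  ('obj_a', 'protein_kinase'),
  ('obj_b', 'kinase'), ('obj_b', 'atp_binding'),
  ('obj_c', 'binding');
""")

# closure: each annotated term paired with itself and every ancestor.
# UNION (not UNION ALL) removes duplicates reached via several DAG paths.
SHARED_TERMS_SQL = """
WITH RECURSIVE closure(term, ancestor) AS (
    SELECT term, term FROM annotation
    UNION
    SELECT c.term, p.parent
    FROM closure c JOIN term_parent p ON p.term = c.ancestor
),
object_terms AS (
    SELECT DISTINCT a.object, c.ancestor AS term
    FROM annotation a JOIN closure c ON c.term = a.term
)
SELECT x.term
FROM object_terms x JOIN object_terms y ON x.term = y.term
WHERE x.object = ? AND y.object = ?
ORDER BY x.term
"""


def shared_terms(o1, o2):
    """Terms (including implied ancestors) that annotate both objects."""
    return [row[0] for row in conn.execute(SHARED_TERMS_SQL, (o1, o2))]
```

A similarity measure would then weight these shared terms, for instance by information content. At the scale the abstract mentions, precomputing and indexing the ancestor closure would likely be preferable to recomputing it per query, though that is a design guess rather than something stated in the source.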

An erratum to this chapter can be found at http://dx.doi.org/10.1007/11914853_71.

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Schwarz, P., Deng, Y., Rice, J.E. (2006). Finding Similar Objects Using a Taxonomy: A Pragmatic Approach. In: Meersman, R., Tari, Z. (eds) On the Move to Meaningful Internet Systems 2006: CoopIS, DOA, GADA, and ODBASE. OTM 2006. Lecture Notes in Computer Science, vol 4275. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11914853_67

  • DOI: https://doi.org/10.1007/11914853_67

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-48287-1

  • Online ISBN: 978-3-540-48289-5

  • eBook Packages: Computer Science (R0)
