Domain-Specific Semantic Relatedness from Wikipedia Structure: A Case Study in Biomedical Text

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9041)

Abstract

Wikipedia is becoming an important knowledge source in various domain-specific applications based on concept representation. This introduces the need for concrete evaluation of Wikipedia as a foundation for computing semantic relatedness between concepts. While lexical resources like WordNet cover generic English well, their coverage of domain-specific terms and named entities is weak, whereas this coverage is one of Wikipedia's strengths. Furthermore, semantic relatedness methods that rely on the hierarchical structure of a lexical resource are not directly applicable to the Wikipedia link structure, which is not hierarchical and whose links do not capture well-defined semantic relationships like hyponymy.

In this paper we (1) evaluate Wikipedia in a domain-specific semantic relatedness task and demonstrate that Wikipedia-based methods can be competitive with state-of-the-art ontology-based and distributional methods in the biomedical domain; (2) adapt and evaluate the effectiveness of bibliometric methods of various degrees of sophistication on Wikipedia; and (3) propose a new graph-based method for calculating semantic relatedness that outperforms existing methods by considering some specific features of Wikipedia structure.
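To make the idea of applying bibliometric measures to a link graph concrete, the following is a minimal sketch of two classic measures, bibliographic coupling (shared out-links) and co-citation (shared in-links), computed over a toy graph. It is an illustration only and not the method proposed in the paper: the page names, the `LINKS` dictionary, the function names, and the choice of Jaccard normalization are all assumptions made for the example.

```python
from typing import Dict, Set

# Hypothetical toy link graph (not real Wikipedia data):
# each page maps to the set of pages it links to.
LINKS: Dict[str, Set[str]] = {
    "Aspirin": {"Pain", "Inflammation", "Fever"},
    "Ibuprofen": {"Pain", "Inflammation"},
    "Fever": {"Infection"},
    "Pain": set(),
    "Inflammation": set(),
    "Infection": set(),
    "Analgesic": {"Aspirin", "Ibuprofen"},
    "NSAID": {"Aspirin", "Ibuprofen"},
    "Antipyretic": {"Aspirin"},
}


def in_links(links: Dict[str, Set[str]], page: str) -> Set[str]:
    """Return the set of pages that link to `page`."""
    return {src for src, targets in links.items() if page in targets}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coupling(links: Dict[str, Set[str]], p: str, q: str) -> float:
    """Bibliographic coupling: overlap of the pages that p and q link to."""
    return jaccard(links.get(p, set()), links.get(q, set()))


def cocitation(links: Dict[str, Set[str]], p: str, q: str) -> float:
    """Co-citation: overlap of the pages that link to p and to q."""
    return jaccard(in_links(links, p), in_links(links, q))


if __name__ == "__main__":
    print(coupling(LINKS, "Aspirin", "Ibuprofen"))
    print(cocitation(LINKS, "Aspirin", "Ibuprofen"))
    print(cocitation(LINKS, "Aspirin", "Infection"))
```

Both measures look only at direct neighbours, so they need no hierarchy and no typed relations. This is why measures of this kind can be applied to a link structure that, as noted above, is not hierarchical.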


Author information

Corresponding author

Correspondence to Armin Sajadi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Sajadi, A., Milios, E.E., Kešelj, V., Janssen, J.C.M. (2015). Domain-Specific Semantic Relatedness from Wikipedia Structure: A Case Study in Biomedical Text. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2015. Lecture Notes in Computer Science, vol 9041. Springer, Cham. https://doi.org/10.1007/978-3-319-18111-0_26

  • DOI: https://doi.org/10.1007/978-3-319-18111-0_26

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-18110-3

  • Online ISBN: 978-3-319-18111-0

  • eBook Packages: Computer Science, Computer Science (R0)
