Predicting Entity Mentions in Scientific Literature

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11503)

Abstract

Predicting which entities are likely to be mentioned in scientific articles is a task with significant academic and commercial value. For instance, it can lead to monetary savings if the articles are behind paywalls, or be used to recommend articles that are not yet available. Despite extensive prior work on entity prediction in Web documents, the peculiarities of scientific literature make it a unique scenario for this task. In this paper, we present an approach that uses a neural network to predict whether the (unseen) body of an article contains entities defined in domain-specific knowledge bases (KBs). The network uses features from the abstracts and the KB, and it is trained using open-access articles and authors’ prior works. Our experiments on biomedical literature show that our method is able to predict subsets of entities with high accuracy. As far as we know, our method is the first of its kind and is currently used in several commercial settings.
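The abstract describes the approach only at a high level, so the following is a minimal sketch of the general idea rather than the authors' model. It assumes three hypothetical features for each (article, candidate entity) pair: whether the entity already appears in the abstract, what share of its KB neighbours the abstract mentions, and whether the authors' prior works mention it. The feature set, the network size, the `features` and `TinyNet` names, and the toy data are all illustrative assumptions.

```python
import math
import random


def features(abstract_entities, kb_neighbours, prior_work_entities, candidate):
    """Hypothetical feature vector for one (article, candidate entity) pair."""
    in_abstract = 1.0 if candidate in abstract_entities else 0.0
    neighbours = kb_neighbours.get(candidate, set())
    # Share of the candidate's KB neighbours that the abstract mentions.
    overlap = len(neighbours & abstract_entities) / len(neighbours) if neighbours else 0.0
    in_prior_work = 1.0 if candidate in prior_work_entities else 0.0
    return [in_abstract, overlap, in_prior_work]


class TinyNet:
    """One tanh hidden layer and a sigmoid output, trained with plain SGD."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def _output(self, h):
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        """Estimated probability that the article body mentions the entity."""
        return self._output(self._hidden(x))

    def train_step(self, x, y, lr=0.5):
        h = self._hidden(x)
        dz = self._output(h) - y  # gradient of binary cross-entropy w.r.t. the logit
        for j, hj in enumerate(h):
            dh = dz * self.w2[j] * (1.0 - hj * hj)  # backpropagate through tanh
            self.w2[j] -= lr * dz * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dh * xi
            self.b1[j] -= lr * dh
        self.b2 -= lr * dz


# Toy data (invented for illustration): one mentioned and one unmentioned entity.
abstract = {"aspirin", "inflammation"}
kb = {"cox-2": {"aspirin", "inflammation", "ibuprofen", "prostaglandin"}}
x_pos = features(abstract, kb, {"cox-2"}, "cox-2")
x_neg = features(abstract, kb, set(), "graphene")

net = TinyNet(n_in=3, n_hidden=4)
for _ in range(300):
    net.train_step(x_pos, 1.0)
    net.train_step(x_neg, 0.0)

p_pos, p_neg = net.predict(x_pos), net.predict(x_neg)
```

In a realistic setting the labels would come from articles whose full text is available, which is how the abstract describes the use of open-access articles for training.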

Notes

  1. https://elastic.co.

  2. https://github.com/inspirehep/beard.

  3. https://github.com/keras-team/keras.

  4. https://www.tensorflow.org/.

  5. These experiments are repeated multiple times (\({\ge }5\)).

Author information

Corresponding author

Correspondence to Jacopo Urbani.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, Y., Ezeiza, J., Farzanehpour, M., Urbani, J. (2019). Predicting Entity Mentions in Scientific Literature. In: Hitzler, P., et al. The Semantic Web. ESWC 2019. Lecture Notes in Computer Science, vol 11503. Springer, Cham. https://doi.org/10.1007/978-3-030-21348-0_25

  • DOI: https://doi.org/10.1007/978-3-030-21348-0_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-21347-3

  • Online ISBN: 978-3-030-21348-0

  • eBook Packages: Computer Science, Computer Science (R0)
