A Deep Learning Approach for Scientific Paper Semantic Ranking

  • Conference paper
Intelligent Interactive Multimedia Systems and Services 2017 (KES-IIMSS-18 2018)

Abstract

In this paper we propose a novel Deep Learning approach to build a search tool based on Word Embedding (WE) similarity, taking the medical literature as a case study. Exploiting the compositional properties of WEs, we define a methodology that aggregates the information carried by each word into a single vector representing the abstract of each PubMed article. This representation captures the semantic content of the papers and, consequently, makes it possible to evaluate and rank the similarity among them. Preliminary results were obtained by analysing a subset of the whole PubMed collection. The correctness of the results was verified by human domain experts, showing that the methodology is promising.
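The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how word vectors are aggregated, so unweighted averaging is assumed here, cosine similarity is assumed as the ranking score, and the tiny embedding table, stop-word set, document identifiers and function names are all hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, for illustration only; a real
# system would load embeddings trained on a large corpus.
TOY_EMBEDDINGS = {
    "tumor": [0.9, 0.1, 0.0],
    "cancer": [0.8, 0.2, 0.1],
    "therapy": [0.5, 0.5, 0.1],
    "heart": [0.1, 0.9, 0.0],
    "cardiac": [0.0, 0.8, 0.2],
    "surgery": [0.3, 0.6, 0.3],
}
# Hypothetical stop-word set; the paper's actual list is not reproduced here.
STOP_WORDS = {"the", "of", "a", "for", "and", "in"}


def embed_text(text, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of known, non-stop-word tokens.

    Averaging is an assumed aggregation, not necessarily the paper's.
    Returns None when no token has a vector.
    """
    tokens = [t for t in text.lower().split() if t not in STOP_WORDS]
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_abstracts(query, abstracts):
    """Return (doc_id, score) pairs sorted by decreasing similarity to query."""
    q = embed_text(query)
    if q is None:
        return []
    scored = []
    for doc_id, text in abstracts.items():
        d = embed_text(text)
        if d is not None:
            scored.append((doc_id, cosine(q, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy "abstracts" standing in for PubMed records.
ABSTRACTS = {
    "doc_a": "therapy for the tumor",
    "doc_b": "cardiac surgery and the heart",
}
ranking = rank_abstracts("cancer therapy", ABSTRACTS)
```

With these toy vectors the oncology-related text ranks above the cardiology-related one for the query "cancer therapy", even though the query and `doc_a` share only one word. This shows the kind of semantic matching the abstract refers to.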

Notes

  1. http://www.lextek.com/manuals/onix/stopwords1.html

Author information

Corresponding author

Correspondence to Stefano Silvestri.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Gargiulo, F., Silvestri, S., Fontanella, M., Ciampi, M., De Pietro, G. (2018). A Deep Learning Approach for Scientific Paper Semantic Ranking. In: De Pietro, G., Gallo, L., Howlett, R., Jain, L. (eds) Intelligent Interactive Multimedia Systems and Services 2017. KES-IIMSS-18 2018. Smart Innovation, Systems and Technologies, vol 76. Springer, Cham. https://doi.org/10.1007/978-3-319-59480-4_47

  • DOI: https://doi.org/10.1007/978-3-319-59480-4_47

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59479-8

  • Online ISBN: 978-3-319-59480-4

  • eBook Packages: Engineering, Engineering (R0)
