A Deep Learning Approach for Scientific Paper Semantic Ranking

Gargiulo, Francesco; Silvestri, Stefano; Fontanella, Mariarosaria; Ciampi, Mario; De Pietro, Giuseppe

doi:10.1007/978-3-319-59480-4_47

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 76)

Included in the following conference series:

International Conference on Intelligent Interactive Multimedia Systems and Services

Abstract

In this paper we propose a novel Deep Learning approach to build a similarity-based search tool using Word Embeddings (WEs), taking the medical literature as a case study. Using the compositional properties of WEs, we define a methodology that aggregates the information carried by each word into a single vector representing the abstract of each PubMed article. With this representation it is possible to capture the semantic content of the papers and, consequently, to evaluate and rank the similarity among them. Preliminary results were obtained by analysing a subset of the whole PubMed collection. The correctness of the results was verified by human domain experts, showing that the methodology is promising.
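The general idea described in the abstract (composing word vectors into one vector per abstract, then ranking by similarity) can be illustrated with a minimal sketch. The abstract does not state which aggregation function or similarity measure the authors use, so the unweighted mean and cosine similarity below are assumptions, and the tiny embedding table, stopword set, and function names are hypothetical values chosen only for illustration.

```python
import math

# Toy 3-dimensional word embeddings (hypothetical values, illustration only).
EMBEDDINGS = {
    "tumor":   [0.9, 0.1, 0.0],
    "cancer":  [0.8, 0.2, 0.0],
    "therapy": [0.6, 0.4, 0.1],
    "heart":   [0.1, 0.9, 0.0],
    "cardiac": [0.0, 0.8, 0.1],
    "surgery": [0.3, 0.6, 0.2],
}
STOPWORDS = {"the", "of", "a", "for", "and", "in"}  # hypothetical stopword set
DIM = 3


def abstract_vector(text):
    """Average the embeddings of in-vocabulary, non-stopword tokens (assumed aggregation)."""
    tokens = [t for t in text.lower().split() if t not in STOPWORDS]
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def rank_by_similarity(query, abstracts):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    q = abstract_vector(query)
    scored = [(doc_id, cosine(q, abstract_vector(text)))
              for doc_id, text in abstracts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


abstracts = {
    "A": "therapy for the tumor",
    "B": "cardiac surgery of the heart",
}
ranking = rank_by_similarity("cancer therapy", abstracts)
```

With these toy vectors, the query "cancer therapy" ranks abstract A above abstract B even though only one query word appears literally in A. A real system would replace the hand-written table with embeddings trained on a large corpus.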

Author information

Authors and Affiliations

Institute for High Performance Computing and Networking, ICAR-CNR, Via Pietro Castellino 111, 80131, Naples, Italy
Francesco Gargiulo, Stefano Silvestri, Mariarosaria Fontanella, Mario Ciampi & Giuseppe De Pietro

Corresponding author

Correspondence to Stefano Silvestri.

Editor information

Editors and Affiliations

National Research Council of Italy (CNR-ICAR), Institute for High-Performance Computing and Networking, Naples, Italy
Giuseppe De Pietro
National Research Council of Italy (CNR-ICAR), Institute for High-Performance Computing and Networking, Naples, Italy
Luigi Gallo
Fern Barrow, Bournemouth University, Poole, Dorset, United Kingdom
Robert J. Howlett
University of Canberra, Canberra, Australian Capital Territory, Australia
Lakhmi C. Jain

About this paper

Cite this paper

Gargiulo, F., Silvestri, S., Fontanella, M., Ciampi, M., De Pietro, G. (2018). A Deep Learning Approach for Scientific Paper Semantic Ranking. In: De Pietro, G., Gallo, L., Howlett, R., Jain, L. (eds) Intelligent Interactive Multimedia Systems and Services 2017. KES-IIMSS-18 2018. Smart Innovation, Systems and Technologies, vol 76. Springer, Cham. https://doi.org/10.1007/978-3-319-59480-4_47

DOI: https://doi.org/10.1007/978-3-319-59480-4_47
Published: 28 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59479-8
Online ISBN: 978-3-319-59480-4
eBook Packages: Engineering, Engineering (R0)
