Abstract
This paper presents a formalization and extension of a novel approach to support high-quality content in digital libraries. Building on the concept of plausibility used in the cognitive sciences, we aim to judge the plausibility of new scientific papers in light of prior knowledge. In particular, our work proposes a novel assessment of scientific papers to qualitatively support the work of reviewers. To do this, our approach focuses on a key aspect of scientific papers: claims. Claims are sentences found in empirical scientific papers that state statistical associations between entities and correspond to the core contributions of the papers. Such claims occur, for instance, in medicine, chemistry, and biology, where the consumption of a drug, substance, or product causes an effect on some other entity such as a disease or another drug or substance. To operationalize the notion of plausibility, we promote claims to first-class citizens of scientific digital libraries and exploit state-of-the-art neural embedding representations of text together with topic models. As a proof of concept of the potential usefulness of this notion of plausibility, we report extensive experiments on scientific papers from the PubMed digital library.
Notes
PubMed comprises more than 28 million citations for biomedical literature from MEDLINE, life science journals, and online books.
More information about UMLS is available at https://www.nlm.nih.gov/research/umls/.
Cite this article
González Pinto, J.M., Balke, WT. Assessing plausibility of scientific claims to support high-quality content in digital collections. Int J Digit Libr 21, 47–60 (2020). https://doi.org/10.1007/s00799-018-0256-8