Abstract
In this paper, we promote the automatic semantic characterization of scientific claims as a means of exploring entity-entity relationships in digital collections. Our approach aims to alleviate the time-consuming analysis of query results when the information need is not a single document but an overview of a set of documents. Using this semantic characterization, we propose to find what we call “dominant” claims, relying on two core properties: the consensual support of a claim in light of the collection’s previous knowledge, and the assertiveness of the language the authors use when expressing it. We discuss features that efficiently capture these two core properties and formalize the idea of finding “dominant” claims by means of Pareto dominance. We demonstrate the quality and potential of our method in a practical evaluation on a real-world document collection from the medical domain.
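To make the notion of Pareto dominance over the two core properties concrete, the following is a minimal sketch, not the authors' implementation. It assumes each claim already carries two numeric scores, where higher is better; the `Claim` type, the field names `support` and `assertiveness`, and the score values are all hypothetical and chosen only for illustration. A claim is kept as “dominant” if no other claim is at least as good on both scores and strictly better on one.

```python
from typing import NamedTuple


class Claim(NamedTuple):
    text: str
    support: float        # hypothetical consensual-support score, higher is better
    assertiveness: float  # hypothetical assertiveness score, higher is better


def dominates(a: Claim, b: Claim) -> bool:
    """True if a is at least as good as b on both scores and strictly better on one."""
    at_least_as_good = a.support >= b.support and a.assertiveness >= b.assertiveness
    strictly_better = a.support > b.support or a.assertiveness > b.assertiveness
    return at_least_as_good and strictly_better


def dominant_claims(claims: list[Claim]) -> list[Claim]:
    """Return the claims that no other claim Pareto-dominates (the skyline)."""
    return [c for c in claims if not any(dominates(o, c) for o in claims)]


# Made-up scores, for illustration only.
claims = [
    Claim("A", 0.9, 0.4),
    Claim("B", 0.6, 0.8),
    Claim("C", 0.5, 0.3),
    Claim("D", 0.6, 0.7),
]
skyline = dominant_claims(claims)
print([c.text for c in skyline])  # prints ['A', 'B']
```

In this toy input, claim C is dominated by A and claim D is dominated by B, while A and B are incomparable (each wins on one score), so both remain. The quadratic pairwise comparison is chosen for readability; dedicated skyline algorithms would scale better on large collections.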
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
González Pinto, J.M., Balke, W.-T. (2018). Scientific Claims Characterization for Claim-Based Analysis in Digital Libraries. In: Méndez, E., Crestani, F., Ribeiro, C., David, G., Lopes, J. (eds) Digital Libraries for Open Knowledge. TPDL 2018. Lecture Notes in Computer Science, vol 11057. Springer, Cham. https://doi.org/10.1007/978-3-030-00066-0_22
DOI: https://doi.org/10.1007/978-3-030-00066-0_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00065-3
Online ISBN: 978-3-030-00066-0
eBook Packages: Computer Science (R0)