Utilizing emerging knowledge to support medical argument retrieval

Christian Nawroth; Felix Engel; Matthias Hemmje

doi:10.1515/itit-2020-0049

Published by De Gruyter Oldenbourg March 27, 2021

From the journal it - Information Technology

https://doi.org/10.1515/itit-2020-0049

Abstract

This article summarizes selected aspects of a dissertation project and prior publications related to the DFG-funded RecomRatio research project. It provides an end-to-end overview of research that aims to extract and utilize Emerging Knowledge, represented by two concepts that we define as Emerging Named Entities and Emerging Argument Entities, to support medical argument retrieval. We use these two concepts to model novelty in scientific literature in general and in medical argumentation in particular. The article gives an overview of Emerging Knowledge, defines Emerging Named Entities and Emerging Argument Entities, and reviews the state of the art and related work. A preparatory study shows that Emerging Argument Entities are in use in the medical literature. Based on the state-of-the-art review and the preparatory study, we introduce a conceptual system design that combines Emerging Named Entity Recognition with a state-of-the-art Argumentation Mining framework (ArgumenText) to extract Emerging Argument Entities from medical literature and make them available for Argument Retrieval. The conceptual system design supports two Argument Retrieval use cases: (1) ranking of result sets based on Emerging Argument Entities, and (2) highlighting of Emerging Argument Entities within result sets. A case study for the extraction and visualization of Emerging Named Entities and Emerging Argument Entities is implemented based on the conceptual design. This proof-of-concept system is used to conduct technical evaluations of the Emerging Named Entity Recognition. Furthermore, prior results of an expert-based evaluation are presented. The article closes with a conclusion and a brief outlook on future work, e.g., supporting the Argument Interchange Format.

Keywords: Emerging Knowledge; Emerging Named Entities; Emerging Argument Entities

ACM CCS: Information systems; Information retrieval; Retrieval tasks and goals; Information extraction

Funding source: Deutsche Forschungsgemeinschaft

Award Identifier / Grant number: 376059226

Funding statement: This work has been funded by the Deutsche Forschungsgemeinschaft (DFG) within the project Empfehlungsrationalisierung, Grant Number 376059226, as part of the Priority Program “Robust Argumentation Machines (RATIO)” (SPP-1999).

About the authors

Christian Nawroth

Christian Nawroth is a PhD student at the University of Hagen, Chair of Multimedia and Internet Applications, Faculty of Mathematics and Computer Science. He studied Information Technology (B.Sc.) and Information Systems (M.Sc.).

Felix Engel

Dr.-Ing. Felix Engel studied applied computer science at the University of Duisburg-Essen and received a PhD degree from the Department of Multimedia and Internet Applications at the University of Hagen. He has contributed to national and international projects in the area of digital preservation and has co-authored various publications at national and international conferences.

Matthias Hemmje

Prof. Dr.-Ing. Matthias Hemmje received a PhD degree from the Department of Computer Science of the Technical University of Darmstadt in 1999. From 1999 until 2004 he was manager of the DELITE (Virtual Information and Knowledge Environments) research division at Fraunhofer IPSI in Darmstadt, Germany. Since 2004 he has been a full professor of Computer Science at the FernUniversität in Hagen, Department of Mathematics and Computer Science, where he holds the Chair of Multimedia and Internet Applications. Since 2009, Matthias Hemmje has been director and chairman of the board of the Research Institute for Telecommunications and Cooperation, FTK.

Received: 2020-11-25

Revised: 2021-03-10

Accepted: 2021-03-15

Published Online: 2021-03-27

Published in Print: 2021-02-23
