Abstract
The field of Personal Knowledge Management (PKM) has seen a surge in popularity in recent years. At the same time, Natural Language Processing (NLP) and Large Language Models (LLMs) are becoming mainstream, yet PKM has seen little integration with NLP. Motivated by this gap, this article first introduces a methodology that uses NLP techniques combined with Knowledge Graphs to automatically interconnect isolated text collections. The connections between texts are generated by exploring both the semantic relatedness of the texts and the concepts they share. The article then describes PKM Assistants that incorporate the methodology to help users understand and explore the knowledge contained in text collections, using a Knowledge Management tool called Tana. Next, it assesses the methodology on a text collection composed of several books and the passages collected for each book. Finally, it discusses the proposed methodology, with special attention to potential use cases.
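The abstract says that connections come from two signals: the semantic relatedness of texts and the concepts they share. The following is a minimal sketch of one way to combine those signals. It is not the authors' method. It assumes that each text already has an embedding vector, for example from a sentence-embedding model, and a set of extracted concept identifiers, such as knowledge-graph entity IDs. The `dbr:` values below are illustrative only. The names `cosine`, `jaccard`, and `connect_texts`, the weighting `alpha`, and the `threshold` are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of concepts shared by two texts (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def connect_texts(texts, alpha=0.5, threshold=0.5):
    """Hypothetical scoring: a weighted blend of embedding similarity and concept overlap.

    `texts` maps a text id to (embedding, concept_set). Returns a list of
    (id_a, id_b, score, shared_concepts) for every pair whose score is at
    least `threshold`, sorted by descending score.
    """
    links = []
    for (ia, (ea, ca)), (ib, (eb, cb)) in combinations(texts.items(), 2):
        score = alpha * cosine(ea, eb) + (1 - alpha) * jaccard(ca, cb)
        if score >= threshold:
            links.append((ia, ib, round(score, 3), sorted(ca & cb)))
    return sorted(links, key=lambda link: -link[2])


# Toy data: hand-made 3-dimensional "embeddings" and illustrative concept IDs.
notes = {
    "n1": ([1.0, 0.0, 1.0], {"dbr:Zettelkasten", "dbr:Note-taking"}),
    "n2": ([0.9, 0.1, 1.0], {"dbr:Zettelkasten", "dbr:Writing"}),
    "n3": ([0.0, 1.0, 0.0], {"dbr:Gardening"}),
}
for link in connect_texts(notes):
    print(link)  # ('n1', 'n2', 0.665, ['dbr:Zettelkasten'])
```

A weighted sum is only one way to merge the two signals. The sketch also returns the shared concepts with each link, so a user can see why two texts were connected. In a PKM setting, that kind of inspectable explanation is likely to matter as much as the score itself.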
Data Availability Statement
Data is available.
Acknowledgements
This work was partly funded by FAPERJ under grant E-26/202.818/2017; by CAPES under grants 88881.310592-2018/01, 88881.134081/2016-01, and 88882.164913/2010-01; and by CNPq under grant 302303/2017-0.
Funding
Funding information is provided in the Acknowledgements above.
Author information
Contributions
The authors confirm contribution to the paper as follows: study conception and design: Felipe Poggi A. Fraga; analysis and interpretation of results: Felipe Poggi A. Fraga, Marcus Poggi; draft manuscript preparation: Marco A. Casanova, Luiz André P. Paes Leme. All authors reviewed the results and approved the final version of the manuscript.
Ethics declarations
Conflict of Interest
All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript.
Research Involving Humans and/or Animals
The article does not involve experiments with humans and/or animals.
Informed Consent
The article does not involve experiments with humans.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Recent Trends on Enterprise Information Systems” guest edited by Joaquim Filipe, Michał Śmiałek, Alexander Brodsky and Slimane Hammoudi.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fraga, F.P.A., Poggi, M., Casanova, M.A. et al. Creating Automatic Connections for Personal Knowledge Management. SN COMPUT. SCI. 5, 525 (2024). https://doi.org/10.1007/s42979-024-02876-4
DOI: https://doi.org/10.1007/s42979-024-02876-4