Building a knowledge graph by using cross-lingual transfer method and distributed MinIE algorithm on apache spark

Do, Phuc; Phan, Trung; Le, Hung; Gupta, Brij B.

doi:10.1007/s00521-020-05495-1

S.I. : WorldCIST'20
Published: 24 November 2020

Volume 34, pages 8393–8409, (2022)

Neural Computing and Applications

Phuc Do¹, Trung Phan¹, Hung Le¹ & Brij B. Gupta²,³,⁴


Abstract

For centuries, text has been the simplest and most effective way to store human knowledge. With today's technological advances, the volume of text keeps growing, and extracting useful information from it has become an exceptionally complex task.

To address this problem, we present a pipeline that uses distributed computing to extract core knowledge from large quantities of text. The pipeline is built from existing systems known to yield good results. The outputs of the proposed system are stored in a knowledge graph, which is a graph that stores knowledge as triples (head, relation, tail). Well-known knowledge graphs include the Google Knowledge Graph, YAGO, DBLP, and DBpedia. These knowledge graphs have one thing in common: they are in English. English has been studied by many researchers worldwide and has become a rich-resource language, with many natural language processing tools and datasets. Vietnamese, by contrast, is a low-resource language.

We therefore use a cross-lingual transfer method to build a Vietnamese knowledge graph. First, we use Google Search and Wikipedia to collect text about Vietnam tourism, written mostly in Vietnamese. Next, we translate this text into English with Google Translate. We then apply English natural language processing tools to extract useful triples. These tools include the Stanford Parser, coreference resolution, ClausIE, and MinIE. Lastly, we translate the triples back into Vietnamese to build a Vietnam tourism knowledge graph.

Because we work with massive amounts of text, we develop a distributed algorithm to extract triples from its sentences. This algorithm is a distributed version of MinIE, which was originally designed to run on a single machine. In the Apache Spark framework, we divide the text into many smaller parts and send them to the worker nodes, together with the distributed MinIE function. Each worker node then runs the Spark distributed MinIE in parallel to extract triples from the sentences in its local part of the text. Finally, the worker nodes send their results back to the master node, which builds the knowledge graph. We run experiments with the distributed MinIE on a Spark cluster to demonstrate the performance gains of the proposed algorithm.
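The abstract describes a knowledge graph as a set of (head, relation, tail) triples. These triples come from a translate, extract, translate-back round trip. The Python sketch below shows only the shape of that pipeline, under stated assumptions. `Triple`, `KnowledgeGraph`, `build_cross_lingual` and the `toy_*` helpers are hypothetical names. The dictionary lookups stand in for Google Translate. `toy_extract` stands in for the Stanford Parser, coreference, ClausIE and MinIE stage. This is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set


@dataclass(frozen=True)
class Triple:
    """One fact in the graph: (head, relation, tail)."""
    head: str
    relation: str
    tail: str


class KnowledgeGraph:
    """Minimal in-memory triple store, indexed by head entity (hypothetical)."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()
        self._by_head: Dict[str, Set[Triple]] = {}

    def add(self, triple: Triple) -> None:
        # Sets make re-adding an identical fact a no-op.
        self.triples.add(triple)
        self._by_head.setdefault(triple.head, set()).add(triple)

    def facts_about(self, head: str) -> Set[Triple]:
        # Return a copy so callers cannot mutate the index.
        return set(self._by_head.get(head, set()))


def build_cross_lingual(
    sentences_vi: Iterable[str],
    to_en: Callable[[str], str],
    extract: Callable[[str], List[Triple]],
    to_vi: Callable[[str], str],
) -> KnowledgeGraph:
    """Translate vi -> en, extract English triples, translate each field back to vi."""
    kg = KnowledgeGraph()
    for sentence in sentences_vi:
        for t in extract(to_en(sentence)):
            kg.add(Triple(to_vi(t.head), to_vi(t.relation), to_vi(t.tail)))
    return kg


# Hypothetical stand-ins for Google Translate and the extraction tools (toy lookups only).
VI_TO_EN = {"Vịnh Hạ Long là di sản thế giới.": "Ha Long Bay is a world heritage site."}
EN_TO_VI = {"Ha Long Bay": "Vịnh Hạ Long", "is": "là", "a world heritage site": "di sản thế giới"}


def toy_translate_vi_en(text: str) -> str:
    return VI_TO_EN.get(text, text)


def toy_translate_en_vi(text: str) -> str:
    return EN_TO_VI.get(text, text)


def toy_extract(sentence: str) -> List[Triple]:
    # Only handles "X is Y." sentences; a real open IE system does far more.
    head, sep, tail = sentence.rstrip(".").partition(" is ")
    return [Triple(head, "is", tail)] if sep else []


kg = build_cross_lingual(
    ["Vịnh Hạ Long là di sản thế giới."],
    toy_translate_vi_en, toy_extract, toy_translate_en_vi,
)
print(kg.facts_about("Vịnh Hạ Long"))
# {Triple(head='Vịnh Hạ Long', relation='là', tail='di sản thế giới')}
```

This sketch translates each field of a triple separately instead of re-translating whole sentences. That keeps the head, relation and tail aligned with the English extraction. The abstract does not say whether the authors translate triples this way.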

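The distributed step in the abstract follows a scatter, map, gather pattern. The master splits the text into parts, each worker extracts triples from its local part, and the master collects the results. The Python sketch below illustrates that data flow under stated assumptions. `partition`, `extract_partition`, `distributed_extract` and `toy_extract` are hypothetical names, and `toy_extract` is a stand-in for MinIE. A thread pool plays the role of the worker nodes. In CPython, threads give no real CPU parallelism for pure-Python work, so the sketch shows structure, not speed. In PySpark, the same flow would roughly be `sc.parallelize(sentences, numSlices).mapPartitions(f).collect()`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# A triple is (head, relation, tail).
Triple = Tuple[str, str, str]


def partition(items: Sequence[str], num_parts: int) -> List[List[str]]:
    """Split items into num_parts contiguous chunks whose sizes differ by at most one."""
    if num_parts < 1:
        raise ValueError("num_parts must be >= 1")
    size, extra = divmod(len(items), num_parts)
    parts, start = [], 0
    for i in range(num_parts):
        end = start + size + (1 if i < extra else 0)
        parts.append(list(items[start:end]))
        start = end
    return parts


def toy_extract(sentence: str) -> List[Triple]:
    """Hypothetical stand-in for MinIE: turns "X is Y." into (X, "is", Y)."""
    head, sep, tail = sentence.rstrip(".").partition(" is ")
    return [(head, "is", tail)] if sep else []


def extract_partition(sentences: List[str], extract: Callable[[str], List[Triple]]) -> List[Triple]:
    """Worker side: run the extractor over every sentence of one local chunk."""
    triples: List[Triple] = []
    for sentence in sentences:
        triples.extend(extract(sentence))
    return triples


def distributed_extract(
    sentences: Sequence[str],
    num_workers: int,
    extract: Callable[[str], List[Triple]] = toy_extract,
) -> List[Triple]:
    """Master side: scatter chunks to workers, then gather the triples in chunk order."""
    chunks = partition(sentences, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_chunk = list(pool.map(lambda chunk: extract_partition(chunk, extract), chunks))
    return [t for chunk_triples in per_chunk for t in chunk_triples]


corpus = [
    "Hue is an ancient capital.",
    "Da Lat is a mountain city.",
    "Welcome to Vietnam.",
    "Hoi An is a trading port.",
]
print(distributed_extract(corpus, num_workers=2))
```

Because `pool.map` returns results in input order, the gathered triples match a sequential run regardless of the number of workers. Sentences that the extractor cannot parse contribute nothing.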

Acknowledgements

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number DS2020-26-01.

Author information

Authors and Affiliations

University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam
Phuc Do, Trung Phan & Hung Le
National Institute of Technology, Kurukshetra, India
Brij B. Gupta
Department of Computer Science and Information Engineering, Asia University, Taichung, Taiwan
Brij B. Gupta
Staffordshire University, Stoke-on-Trent, ST4 2DE, UK
Brij B. Gupta

Contributions

All authors contributed equally.

Corresponding author

Correspondence to Brij B. Gupta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Do, P., Phan, T., Le, H. et al. Building a knowledge graph by using cross-lingual transfer method and distributed MinIE algorithm on apache spark. Neural Comput & Applic 34, 8393–8409 (2022). https://doi.org/10.1007/s00521-020-05495-1

Received: 18 June 2020
Accepted: 27 October 2020
Published: 24 November 2020
Issue Date: June 2022
DOI: https://doi.org/10.1007/s00521-020-05495-1
