Building a knowledge graph by using cross-lingual transfer method and distributed MinIE algorithm on apache spark

  • S.I. : WorldCIST'20
Neural Computing and Applications

Abstract

For centuries, the simplest and most effective way to store human knowledge has been text. With today's technological advances, the volume of text keeps growing, and extracting useful information from it has become an exceptionally complex task. To address this problem, we present a pipeline that uses distributed computing to extract core knowledge from large quantities of text. The components of our pipeline are systems known to yield good results. The outputs of the proposed system are stored in a knowledge graph, which is a graph that stores knowledge in the form of triples (head, relation, tail). Existing knowledge graphs include the Google Knowledge Graph, YAGO, DBLP, and DBpedia, and they have one thing in common: they are in English. English is studied by many researchers worldwide and has become a rich-resource language, with many natural language processing tools and datasets. Vietnamese, on the other hand, is a low-resource language. We therefore use a cross-lingual transfer method to build a Vietnamese knowledge graph. First, we use Google Search and Wikipedia to collect text about Vietnam tourism, written mostly in Vietnamese. Next, we translate this text into English with Google Translate and apply English natural language processing tools such as the Stanford Parser, coreference resolution, ClausIE, and MinIE to extract useful triples. Finally, the triples are translated back into Vietnamese to build a Vietnam tourism knowledge graph. Because we are working with massive text, we develop a distributed algorithm that extracts triples from the sentences of that text. It is a distributed version of MinIE, which was originally developed to run on a single machine. In the Apache Spark framework, we divide the text into many smaller parts and send them to worker nodes running the distributed MinIE function.
Spark distributed MinIE then extracts, in parallel, the triples from the sentences in each worker node's local text. Finally, the results from the worker nodes are sent back to the master node to build the knowledge graph. We conduct experiments with distributed MinIE on a Spark cluster to demonstrate the advantage of our proposed algorithm.
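The distributed flow described above can be sketched on a single machine. It splits the corpus, extracts triples on each worker, gathers the results on the master, and translates the triples back into Vietnamese. This is a minimal illustration under stated assumptions, not the authors' implementation. `toy_extract` is a hypothetical stand-in for MinIE that only recognises a few fixed relation phrases. The two translator callables are hypothetical stand-ins for Google Translate. A thread pool stands in for Spark worker nodes, so the sketch shows the data flow rather than real parallel speedup. The abstract does not say where each translation step runs in the cluster, so translating sentences on the workers and triples on the master is an assumption made here. In PySpark, the partition, worker, and gather steps would roughly correspond to `SparkContext.parallelize(data, numSlices)`, `RDD.mapPartitions`, and `RDD.collect`.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, NamedTuple, Set

Translator = Callable[[str], str]


class Triple(NamedTuple):
    """One knowledge-graph fact in (head, relation, tail) form."""
    head: str
    relation: str
    tail: str


# Hypothetical stand-in for MinIE: recognises only a few fixed relation phrases.
# Longer phrases come first so "is located in" is matched before "is".
RELATION_PHRASES = ("is located in", "is", "has")


def toy_extract(sentence: str) -> List[Triple]:
    """Split a sentence around the first known relation phrase, if any."""
    text = sentence.strip().rstrip(".")
    for phrase in RELATION_PHRASES:
        marker = f" {phrase} "
        if marker in text:
            head, tail = text.split(marker, 1)
            return [Triple(head, phrase, tail)]
    return []


def split_into_partitions(sentences: List[str], num_partitions: int) -> List[List[str]]:
    """Cut the corpus into contiguous slices, one per worker."""
    size = max(1, math.ceil(len(sentences) / num_partitions))
    return [sentences[i:i + size] for i in range(0, len(sentences), size)]


def extract_partition(partition: List[str], to_english: Translator) -> List[Triple]:
    """Worker-side step: translate each local sentence, then extract its triples."""
    return [t for s in partition for t in toy_extract(to_english(s))]


def translate_triple(triple: Triple, translate: Translator) -> Triple:
    """Translate head, relation and tail independently."""
    return Triple(*(translate(part) for part in triple))


def build_knowledge_graph(sentences: List[str], to_english: Translator,
                          to_vietnamese: Translator, num_workers: int = 4) -> List[Triple]:
    partitions = split_into_partitions(sentences, num_workers)
    # Threads stand in for worker nodes; pool.map returns results in partition order.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        per_partition = list(pool.map(lambda p: extract_partition(p, to_english), partitions))
    # Master-side step: merge worker results, translate back, and drop duplicate facts.
    graph: List[Triple] = []
    seen: Set[Triple] = set()
    for triples in per_partition:
        for triple in triples:
            fact = translate_triple(triple, to_vietnamese)
            if fact not in seen:
                seen.add(fact)
                graph.append(fact)
    return graph


if __name__ == "__main__":
    # Toy dictionaries as hypothetical translators; unknown strings pass through unchanged.
    vi_to_en = {"Vịnh Hạ Long nằm ở Quảng Ninh.": "Ha Long Bay is located in Quang Ninh."}
    en_to_vi = {"Ha Long Bay": "Vịnh Hạ Long", "is located in": "nằm ở", "Quang Ninh": "Quảng Ninh"}
    corpus = ["Vịnh Hạ Long nằm ở Quảng Ninh.", "Hue has many royal tombs."]
    for fact in build_knowledge_graph(corpus, lambda s: vi_to_en.get(s, s),
                                      lambda s: en_to_vi.get(s, s), num_workers=2):
        print(fact)
```

Deduplicating on the master is a design choice of this sketch. The same fact can be extracted from different partitions, and the knowledge graph should store each triple only once.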

Acknowledgements

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number DS2020-26-01.

Author information

Contributions

All authors contributed equally.

Corresponding author

Correspondence to Brij B. Gupta.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Do, P., Phan, T., Le, H. et al. Building a knowledge graph by using cross-lingual transfer method and distributed MinIE algorithm on apache spark. Neural Comput & Applic 34, 8393–8409 (2022). https://doi.org/10.1007/s00521-020-05495-1

  • DOI: https://doi.org/10.1007/s00521-020-05495-1
