Efficient rule mining and compression for RDF style KB based on Horn rules

The Journal of Supercomputing

Abstract

The volume of linked open datasets published in RDF format has grown exponentially in recent decades. With this continued growth, managing, accessing, and compressing RDF datasets has become increasingly important. Most approaches focus on structured compression techniques, while very little research has addressed compact representation of the RDF dataset. In this paper, we propose an efficient rule mining and compression approach for RDF datasets that relies on meaningful semantic association rules mined from the RDF graph. We introduce a grammar-based pattern system, clustering of rules, rule pruning, and a Top-k scheme to improve the expressiveness of rule patterns, identify the similarity between arbitrary pairs of rules, extract the most delicate rules, find an accurate mining threshold, and learn rules efficiently during rule mining from the RDF Knowledge Base. Our proposed system uses Horn rules to achieve better compression by storing the triples that match the antecedent (body) part of a rule while deleting the triples that match its head part. To reduce the mining time, we introduce a ranking of the rules. The experimental results on the benchmark dataset show that the proposed rule mining and compression scheme achieves approximately 22.10%, 40.5%, and 44% better compression than the existing AMIE+, Rule-based compression, and TripleBit approaches, respectively. Our system also achieves better performance in terms of both compression time and rule mining cost.
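
The idea of keeping body triples and deleting head triples can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Triple` and `Rule` types, the restriction to single-atom rules of the form body(x, y) => head(x, y), and the choice to apply only rules that hold without exception (confidence 1.0) so that decompression is exact are all assumptions made for illustration. The sample predicates and entities are hypothetical as well.

```python
from typing import Iterable, List, NamedTuple, Set


class Triple(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object


class Rule(NamedTuple):
    """Hypothetical single-atom Horn rule: body(x, y) => head(x, y)."""
    body: str
    head: str


def confidence(kb: Set[Triple], rule: Rule) -> float:
    """Fraction of body matches (x, y) for which head(x, y) is also in the KB."""
    body_pairs = {(t.s, t.o) for t in kb if t.p == rule.body}
    head_pairs = {(t.s, t.o) for t in kb if t.p == rule.head}
    if not body_pairs:
        return 0.0
    return len(body_pairs & head_pairs) / len(body_pairs)


def exact_rules(kb: Set[Triple], rules: Iterable[Rule]) -> List[Rule]:
    """Assumption of this sketch: keep only non-trivial, exception-free rules."""
    return [r for r in rules if r.body != r.head and confidence(kb, r) == 1.0]


def compress(kb: Set[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Delete head triples that a still-stored body triple can re-derive."""
    kept = set(kb)
    for rule in rules:
        for t in [t for t in kept if t.p == rule.body]:
            kept.discard(Triple(t.s, rule.head, t.o))
    return kept


def decompress(kept: Set[Triple], rules: Iterable[Rule]) -> Set[Triple]:
    """Re-apply the rules until no new triple can be derived."""
    rules = list(rules)
    kb = set(kept)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            for t in [t for t in kb if t.p == rule.body]:
                derived = Triple(t.s, rule.head, t.o)
                if derived not in kb:
                    kb.add(derived)
                    changed = True
    return kb


# Hypothetical toy knowledge base.
kb = {
    Triple("Paris", "capitalOf", "France"),
    Triple("Paris", "locatedIn", "France"),
    Triple("Seoul", "capitalOf", "South_Korea"),
    Triple("Seoul", "locatedIn", "South_Korea"),
    Triple("Busan", "locatedIn", "South_Korea"),
}
candidates = [Rule("capitalOf", "locatedIn"), Rule("locatedIn", "capitalOf")]

rules = exact_rules(kb, candidates)   # only capitalOf => locatedIn survives
compressed = compress(kb, rules)      # the two derivable locatedIn triples are dropped
restored = decompress(compressed, rules)
```

In this toy case the rule capitalOf(x, y) => locatedIn(x, y) holds for every capital in the data, so the two `locatedIn` triples for Paris and Seoul need not be stored, while the `locatedIn` triple for Busan stays because no body triple derives it. The reverse rule is discarded because it fails for Busan.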

Notes

  1. https://www.cs.ox.ac.uk/activities/programinduction/Aleph/aleph_toc.html.

  2. https://github.com/greatsky13/TripleBit.

  3. https://dl.dropbox.com/u/65933145/rbc_download.

  4. https://github.com/lajus/amie.

  5. http://linkeddatacatalog.dws.informatik.uni-mannheim.de/dataset/semantic-web-dog-food.

  6. http://data.archiveshub.ac.uk/.

  7. https://www.cs.toronto.edu/oktie/linkedmdb/.

  8. https://www.rdfhdt.org/datasets/.

  9. http://swat.cse.lehigh.edu/projects/lubm/.

Acknowledgments

This work was supported by the Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIP) (No. IITP-2021-2021-0-00859, Development of a distributed graph DBMS for intelligent processing of big graphs).

Author information

Corresponding author

Correspondence to Young-Koo Lee.

About this article

Cite this article

Sultana, T., Lee, YK. Efficient rule mining and compression for RDF style KB based on Horn rules. J Supercomput 78, 16553–16580 (2022). https://doi.org/10.1007/s11227-022-04519-y

  • DOI: https://doi.org/10.1007/s11227-022-04519-y
