Towards Mining Generalized Patterns from RDF Data and a Domain Ontology

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2021)

Abstract

Nowadays, linked data (LD) are ubiquitous, and the case for mining them for knowledge, e.g. frequent patterns, need not be argued.

A domain ontology (DO) on top of an LD dataset enables the discovery of abstract, a.k.a. generalized, patterns that capture conceptual regularities in data rather than identical sub-structures. Yet with the resulting ontologically-generalized graph patterns (OGP), a miner faces the combined challenges of graph topology and a label hierarchy, which amplifies well-known difficulties with graphs such as support counting or non-redundant pattern listing. As OGP mining is yet to be addressed in its generality, we propose a formalization and study two workaround methods that avoid tackling it head-on, i.e. they deal with each aspect separately. Both perform pure graph mining with adapted label sets: gSpan-OF merely strips labels of their hierarchical structure, while Tax-ON first mines frequent graph topologies with only root classes as labels, then successively refines the labels on each topology.
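To make the idea of label refinement on a fixed topology more concrete, here is a minimal Python sketch. It is not the authors' Tax-ON implementation: the toy taxonomy, the single-edge "topology", the example graphs, and the names `PARENT`, `GRAPHS`, `support` and `refine` are all assumptions introduced for illustration. It only shows how a pattern labelled with root classes might be specialized step by step while its support stays above a threshold.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy taxonomy: child class -> parent class (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "Animal": None, "Cow": "Animal", "Goat": "Animal",
    "Feed": None, "Hay": "Feed", "Silage": "Feed",
}

# An edge is (subject class, predicate, object class).
Edge = Tuple[str, str, str]

# Hypothetical dataset: three tiny graphs with one labelled edge each.
GRAPHS: List[Set[Edge]] = [
    {("Cow", "eats", "Hay")},
    {("Cow", "eats", "Silage")},
    {("Goat", "eats", "Hay")},
]


def is_subsumed(label: str, general: str) -> bool:
    """True if `label` equals `general` or is one of its descendants."""
    current: Optional[str] = label
    while current is not None:
        if current == general:
            return True
        current = PARENT[current]
    return False


def children(label: str) -> List[str]:
    """Direct subclasses of `label` in the toy taxonomy."""
    return sorted(c for c, p in PARENT.items() if p == label)


def support(pattern: Edge, graphs: List[Set[Edge]]) -> int:
    """Number of graphs with an edge whose labels are subsumed by the pattern."""
    s, p, o = pattern
    return sum(
        any(ep == p and is_subsumed(es, s) and is_subsumed(eo, o)
            for es, ep, eo in g)
        for g in graphs
    )


def refine(pattern: Edge, graphs: List[Set[Edge]], min_sup: int) -> Set[Edge]:
    """Collect the pattern and all its frequent label specializations.

    Specializing a label can only shrink the set of matching graphs, so an
    infrequent pattern is pruned together with everything below it.
    """
    if support(pattern, graphs) < min_sup:
        return set()
    found = {pattern}
    s, p, o = pattern
    candidates = [(c, p, o) for c in children(s)] + [(s, p, c) for c in children(o)]
    for cand in candidates:
        found |= refine(cand, graphs, min_sup)
    return found


frequent = refine(("Animal", "eats", "Feed"), GRAPHS, min_sup=2)
```

With a minimum support of 2, the root-labelled pattern and two of its specializations survive, while fully specific patterns such as `("Cow", "eats", "Hay")` are pruned. A real miner would work on arbitrary graph topologies and would need to avoid revisiting the same refinement through different paths; both are left out of this sketch.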

Notes

  1. Pattern labels in \({\mathcal L}_p\) will be prefixed by & to differentiate them from ontology entities.

Author information

Corresponding author

Correspondence to Tomas Martin.

Appendix

Fig. 6. gSpan’s flattened exploration of \(\mathcal {L}_p\)

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Martin, T. et al. (2021). Towards Mining Generalized Patterns from RDF Data and a Domain Ontology. In: Kamp, M., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2021. Communications in Computer and Information Science, vol 1524. Springer, Cham. https://doi.org/10.1007/978-3-030-93736-2_21

  • DOI: https://doi.org/10.1007/978-3-030-93736-2_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-93735-5

  • Online ISBN: 978-3-030-93736-2

  • eBook Packages: Computer Science, Computer Science (R0)
