PCSG: Pattern-Coverage Snippet Generation for RDF Datasets

  • Conference paper
The Semantic Web – ISWC 2021 (ISWC 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12922)

Abstract

Understanding its content is a prerequisite for reusing an RDF dataset. To help users comprehend the large and complex structure of such a dataset, existing methods mainly generate an abridged version of it by extracting representative data patterns as a summary. As a complement, recent attempts extract a representative subset of concrete data as a snippet. We extend this line of research by injecting the strengths of summaries into snippets. We propose to generate a pattern-coverage snippet that best exemplifies the patterns of entity descriptions and links in an RDF dataset. Our approach incorporates formulations of the group Steiner tree and set cover problems to generate compact snippets. The approach is extensible and can also model query relevance for use in dataset search. Experiments on thousands of real RDF datasets demonstrate the effectiveness and practicality of our approach.
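The abstract names two classical optimization problems. As a hedged illustration only (this is not the PCSG algorithm, whose exact objective and constraints are not stated here), the Python sketch below shows a greedy set cover that keeps picking entities until every pattern is exemplified at least once, and a simple shortest-path "star" heuristic for the group Steiner tree problem that connects one member of each group in an unweighted graph. The entity names, pattern labels, graph, and the functions `greedy_set_cover` and `group_steiner_star` are all hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, Hashable, List, Optional, Sequence, Set, Tuple

Node = Hashable
Edge = FrozenSet[Node]


def greedy_set_cover(universe: Set[str], subsets: Dict[Node, Set[str]]) -> List[Node]:
    """Classic greedy set cover: repeatedly take the candidate that covers the
    most still-uncovered elements (roughly a ln n approximation)."""
    uncovered = set(universe)
    chosen: List[Node] = []
    while uncovered:
        # max() returns the first maximal key, so ties follow dict insertion order.
        best = max(subsets, key=lambda k: len(subsets[k] & uncovered))
        gain = subsets[best] & uncovered
        if not gain:
            raise ValueError("the candidates cannot cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


def bfs(adj: Dict[Node, List[Node]], root: Node) -> Tuple[Dict[Node, Optional[Node]], Dict[Node, int]]:
    """Unweighted shortest paths from root: BFS parent pointers and hop distances."""
    parent: Dict[Node, Optional[Node]] = {root: None}
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                dist[v] = dist[u] + 1
                queue.append(v)
    return parent, dist


def group_steiner_star(adj: Dict[Node, List[Node]],
                       groups: Sequence[Sequence[Node]]) -> Tuple[Node, Set[Edge]]:
    """Star heuristic for group Steiner tree: for each candidate root, join the
    root to the nearest member of every group along BFS paths, then keep the
    root whose union of paths has the fewest edges."""
    best: Optional[Tuple[Node, Set[Edge]]] = None
    for root in adj:
        parent, dist = bfs(adj, root)
        edges: Set[Edge] = set()
        for group in groups:
            reachable = [v for v in group if v in dist]
            if not reachable:
                break  # this root cannot reach some group
            v = min(reachable, key=dist.__getitem__)
            while parent[v] is not None:  # walk the BFS path back to the root
                edges.add(frozenset((v, parent[v])))
                v = parent[v]
        else:  # runs only if every group was reached
            if best is None or len(edges) < len(best[1]):
                best = (root, edges)
    if best is None:
        raise ValueError("no root reaches every group")
    return best


if __name__ == "__main__":
    # Hypothetical entities and the (made-up) patterns each one exemplifies.
    entity_patterns = {
        "alice": {"Person", "Person-worksFor-Org"},
        "bob": {"Person"},
        "acme": {"Org", "Org-locatedIn-City"},
        "paris": {"City"},
    }
    universe = set().union(*entity_patterns.values())
    print(greedy_set_cover(universe, entity_patterns))  # ['alice', 'acme', 'paris']

    # Undirected entity graph; each group lists entities of one class.
    adj = {"alice": ["acme"], "bob": ["acme"],
           "acme": ["alice", "bob", "paris"], "paris": ["acme"]}
    groups = [["alice", "bob"], ["acme"], ["paris"]]
    root, tree = group_steiner_star(adj, groups)
    print(root, sorted(sorted(e) for e in tree))  # alice [['acme', 'alice'], ['acme', 'paris']]
```

A note on the design: all paths from one root follow the same BFS parent pointers, so their union is itself a tree; trying every root and keeping the smallest such tree is a simple approximation, not an exact solver.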

Acknowledgements

This work was supported by the NSFC (62072224).

Author information

Corresponding author

Correspondence to Gong Cheng.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, X. et al. (2021). PCSG: Pattern-Coverage Snippet Generation for RDF Datasets. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_1

  • DOI: https://doi.org/10.1007/978-3-030-88361-4_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-88360-7

  • Online ISBN: 978-3-030-88361-4

  • eBook Packages: Computer Science, Computer Science (R0)