Abstract
Understanding the content of an RDF dataset is a prerequisite for reusing it. To support the comprehension of its large and complex structure, existing methods mainly generate an abridged version of the dataset by extracting representative data patterns as a summary. As a complement, recent attempts extract a representative subset of concrete data as a snippet. We extend this line of research by bringing the strength of summaries into snippets. We propose to generate a pattern-coverage snippet that best exemplifies the patterns of entity descriptions and links in an RDF dataset. Our approach incorporates formulations of the group Steiner tree and set cover problems to generate compact snippets. This extensible approach can also model query relevance for use in dataset search. Experiments on thousands of real RDF datasets demonstrate the effectiveness and practicability of our approach.
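To make the set cover idea concrete, the sketch below shows the textbook greedy approximation for set cover applied to a toy pattern-coverage task. It is an illustration only and not the paper's algorithm: the function name `greedy_pattern_cover`, the entity identifiers, and the pattern labels are all hypothetical, and the group Steiner tree step that would connect the chosen entities is left out.

```python
from typing import Dict, List, Set


def greedy_pattern_cover(entity_patterns: Dict[str, Set[str]]) -> List[str]:
    """Pick entities until every pattern seen in the input is exemplified.

    Textbook greedy set cover; an assumption for illustration, not the
    method described in the paper.
    """
    # Every pattern that appears in at least one entity description.
    uncovered: Set[str] = set().union(*entity_patterns.values())
    chosen: List[str] = []
    while uncovered:
        # Take the entity that exemplifies the most still-uncovered patterns;
        # iterating in sorted order makes tie-breaking deterministic.
        best = max(
            sorted(entity_patterns),
            key=lambda e: len(entity_patterns[e] & uncovered),
        )
        chosen.append(best)
        uncovered -= entity_patterns[best]
    return chosen


# Hypothetical toy data: each entity maps to the patterns its description exemplifies.
toy = {
    "ex:alice": {"class:Person", "prop:name", "prop:worksFor"},
    "ex:bob": {"class:Person", "prop:name"},
    "ex:acme": {"class:Organization", "prop:name"},
}
snippet_entities = greedy_pattern_cover(toy)
print(snippet_entities)
```

On this toy input two entities suffice to exemplify all four patterns, which is the kind of compactness a pattern-coverage snippet aims for.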
Acknowledgements
This work was supported by the NSFC (62072224).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, X. et al. (2021). PCSG: Pattern-Coverage Snippet Generation for RDF Datasets. In: Hotho, A., et al. The Semantic Web – ISWC 2021. ISWC 2021. Lecture Notes in Computer Science, vol 12922. Springer, Cham. https://doi.org/10.1007/978-3-030-88361-4_1
DOI: https://doi.org/10.1007/978-3-030-88361-4_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88360-7
Online ISBN: 978-3-030-88361-4
eBook Packages: Computer Science, Computer Science (R0)