Abstract
With the development of Linked Data, an increasing number of RDF datasets are being published across many application domains. Capturing emerging pragmatic patterns is critical both for understanding the underlying meaning and characteristics of large RDF data and for reusing popular domain terms when publishing data. In this paper, we propose the notion of a term co-instantiation graph (TIG) and a method to build a TIG for a given RDF dataset. We also describe a clustering-based approach to distill a set of pragmatic patterns from a TIG; these patterns reveal the customary usage of highly correlated terms. Through extensive experiments on a large real-world dataset containing 21 M RDF documents, we analyze the macroscopic structure of the TIG and of the pragmatic patterns from a complex-network point of view, and demonstrate that our approach not only yields a fine-grained ontology partitioning from the pragmatic perspective, which eases ontology reuse, but also provides a new way to explore Linked Data.
Notes
- 5. It is refined from the raw TIM matrix by using heuristic rules, cf. Sect. 3.1.
- 7. http://xmlns.com/foaf/spec/20091215.html. Our experimental dataset was crawled in 2009. At that time, the FOAF version was 0.96.
Acknowledgments
This work is supported by the National Natural Science Foundation of China (NSFC) under Grant 61402426 and partially supported by the Collaborative Innovation Center of Novel Software Technology and Industrialization.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ge, W., Hu, W., He, C., Zong, S. (2015). Emerging Pragmatic Patterns in Large-Scale RDF Data. In: Qiang, W., Zheng, X., Hsu, CH. (eds) Cloud Computing and Big Data. CloudCom-Asia 2015. Lecture Notes in Computer Science, vol 9106. Springer, Cham. https://doi.org/10.1007/978-3-319-28430-9_19
DOI: https://doi.org/10.1007/978-3-319-28430-9_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-28429-3
Online ISBN: 978-3-319-28430-9
eBook Packages: Computer Science (R0)