Abstract
We focus on the problem of predicting missing class memberships and property assertions in Web Ontologies. We start from the assumption that related entities influence each other, and that they may be either similar or dissimilar with respect to a given set of properties: the former case is referred to as homophily, and the latter as heterophily. We present an efficient method for predicting missing class and property assertions for a set of individuals within an ontology by (i) identifying relations that are likely to encode influence relations between individuals (learning phase), and (ii) leveraging such relations to propagate property information across related entities (inference phase). We show that the complexity of both inference and learning is nearly linear in the number of edges in the influence graph, and we provide an empirical evaluation of the proposed method.
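The inference phase can be pictured as propagating known property values over a weighted influence graph, where positive edge weights encode homophily and negative weights encode heterophily. The following is a minimal sketch under our own assumptions, not the paper's exact formulation: it minimizes a quadratic label-propagation criterion (in the spirit of Zhu et al. 2003 and Goldberg et al. 2007, both in the reference list) and solves the resulting symmetric diagonally dominant system with plain conjugate gradient, a generic solver swapped in for the specialized nearly-linear SDD solvers listed in the references (Cohen et al. 2014; Peng and Spielman 2014). Each matrix-vector product makes one pass over nodes and edges, i.e. O(n + m) work per iteration. The edge format and the names `propagate`, `apply_system`, `mu` and `eps` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical edge format (i, j, w): w > 0 encodes homophily (similar values),
# w < 0 encodes heterophily (dissimilar values). Each undirected edge appears once.
Edge = Tuple[int, int, float]


def apply_system(x: List[float], edges: List[Edge], labeled: Dict[int, float],
                 mu: float, eps: float) -> List[float]:
    """Return (S + mu * L + eps * I) x, where L = D - W and D_ii = sum_j |W_ij|.

    One pass over the nodes and one over the edges: O(n + m) time.
    """
    out = [eps * xi for xi in x]
    for i in labeled:                 # S has a 1 on the diagonal of labeled nodes
        out[i] += x[i]
    for i, j, w in edges:             # signed Laplacian, accumulated edge by edge
        a = abs(w)
        out[i] += mu * (a * x[i] - w * x[j])
        out[j] += mu * (a * x[j] - w * x[i])
    return out


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def propagate(n: int, edges: List[Edge], labeled: Dict[int, float],
              mu: float = 1.0, eps: float = 1e-3,
              tol: float = 1e-10, max_iter: int = 1000) -> List[float]:
    """Minimize  sum_{i labeled} (f_i - y_i)^2
                 + mu * sum_{(i,j,w)} |w| (f_i - sign(w) f_j)^2 + eps * ||f||^2
    by solving (S + mu * L + eps * I) f = y with conjugate gradient."""
    b = [0.0] * n
    for i, y in labeled.items():
        b[i] = y
    x = [0.0] * n
    r = b[:]                          # residual b - A x for the initial x = 0
    p = r[:]
    rs = dot(r, r)
    for _ in range(max_iter):
        if rs <= tol * tol:
            break
        ap = apply_system(p, edges, labeled, mu, eps)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


if __name__ == "__main__":
    # Chain 0-1-2-3: the first two edges are homophilous, the last heterophilous.
    chain = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, -1.0)]
    scores = propagate(4, chain, labeled={0: 1.0})
    print([round(v, 2) for v in scores])
```

In the chain example, the value observed at node 0 flows almost unchanged along the two homophilous edges and flips sign across the heterophilous edge, so node 3 receives a negative score. The small `eps` term keeps the system strictly diagonally dominant and therefore positive definite, which is what conjugate gradient requires.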
Notes
OWL 2 W3C Recommendation: http://www.w3.org/TR/owl-overview/.
Please note that class memberships can be regarded as type, or is-a, properties: the assertion “x is American” can be encoded as \(\text {American}(x)\) or as \(\text {nationality}(x, \text {American})\).
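To make the encoding concrete, the sketch below rewrites unary class assertions \(C(x)\) as binary property assertions \(p(x, C)\), so that class memberships and property assertions can be handled uniformly as triples. It assumes class assertions are given as (class, individual) pairs; the function name `as_property_assertions` and the default property name `type` are hypothetical choices for illustration.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def as_property_assertions(class_assertions: Iterable[Tuple[str, str]],
                           prop: str = "type") -> List[Triple]:
    """Rewrite each class membership C(x), given as (C, x), into the triple (x, prop, C)."""
    return sorted((x, prop, c) for c, x in class_assertions)


if __name__ == "__main__":
    # "x is American": American(x) becomes nationality(x, American).
    print(as_property_assertions([("American", "x")], prop="nationality"))
```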
For instance, see the probabilistic interpretation of the penalty term at the end of this section.
A matrix \(\mathbf {A}\) is symmetric diagonally dominant (SDD) iff \(\mathbf {A}\) is symmetric (i.e. \(\mathbf {A}= \mathbf {A}^{T}\)) and \(\forall i : \mathbf {A}_{ii} \ge \sum _{j \ne i} |\mathbf {A}_{ij}|\).
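The SDD definition translates directly into a check. This is a minimal sketch over a dense list-of-lists matrix; the function name `is_sdd` and the optional tolerance `tol` for floating-point comparisons are our own assumptions.

```python
from typing import Sequence


def is_sdd(A: Sequence[Sequence[float]], tol: float = 0.0) -> bool:
    """True iff A is square, symmetric, and A_ii >= sum_{j != i} |A_ij| for every row i."""
    n = len(A)
    if any(len(row) != n for row in A):
        return False                      # the definition applies to square matrices only
    for i in range(n):
        for j in range(i + 1, n):
            if abs(A[i][j] - A[j][i]) > tol:
                return False              # A is not symmetric
        off_diagonal = sum(abs(A[i][j]) for j in range(n) if j != i)
        if A[i][i] < off_diagonal - tol:
            return False                  # row i is not diagonally dominant
    return True


if __name__ == "__main__":
    # Laplacian of a signed chain graph (last edge negative): SDD.
    L = [[1, -1, 0, 0], [-1, 2, -1, 0], [0, -1, 2, 1], [0, 0, 1, 1]]
    print(is_sdd(L), is_sdd([[1, 2], [2, 1]]))
```

Graph Laplacians, including those with negative edge weights once the degree uses absolute weights, satisfy this condition with equality on every row, which is why SDD solvers apply to graph-based propagation.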
The main difference between SPARQL and SPARQL-DL queries is that, in SPARQL-DL queries, one needs to specify whether a variable occurring in place of a role name refers to an object property or a data property.
Pellet v2.3.1—http://clarkparsia.com/pellet/.
Static dump version V2012-02-21 retrieved from http://www.aifb.kit.edu/web/Wissensmanagement/Portal.
http://data.bgs.ac.uk/ as of March 2014.
References
Aggarwal CC (ed) (2011) Social network data analytics. Springer, New York
Auer S, Bizer C, Kobilarov G, Lehmann J, Cyganiak R, Ives ZG (2007) DBpedia: a nucleus for a web of open data. In: Aberer K et al (eds) The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11–15, 2007, LNCS, vol 4825. Springer, Berlin, pp 722–735
Baader F, Calvanese D, McGuinness DL, Nardi D, Patel-Schneider PF (eds) (2007) The description logic handbook. Cambridge University Press, Cambridge
Bengio Y, Delalleau O, Le Roux N (2006) Label propagation and quadratic criterion. In: Chapelle O, Schölkopf B, Zien A (eds) Semi-supervised learning. MIT Press, Cambridge, pp 193–216
Berners-Lee T, Hendler J, Lassila O (2001) The semantic web. Sci Am 284(5):34–43
Bhagat S, Cormode G, Muthukrishnan S (2011) Node classification in social networks. In: Aggarwal CC (ed) Social network data analytics. Springer, New York, pp 115–148
Bishop CM (2006) Pattern recognition and machine learning. Springer, New York
Bizer C, Heath T, Berners-Lee T (2009) Linked data—the story so far. Int J Semant Web Inf Syst 5(3):1–22
Bizer C, Lehmann J, Kobilarov G, Auer S, Becker C, Cyganiak R, Hellmann S (2009) DBpedia: a crystallization point for the web of data. J Web Semant 7(3):154–165
Bloehdorn S, Sure Y (2007) Kernel methods for mining instance data in ontologies. In: Aberer K et al (eds) The Semantic Web, 6th International Semantic Web Conference, 2nd Asian Semantic Web Conference, ISWC 2007 + ASWC 2007, Busan, Korea, November 11–15, 2007, LNCS, vol 4825. Springer, Berlin, pp 58–71
Bollacker KD, Evans C, Paritosh P, Sturge T, Taylor J (2008) Freebase: a collaboratively created graph database for structuring human knowledge. In: Wang JT (ed) Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, BC, Canada, June 10–12, 2008, pp 1247–1250. ACM
Bordes A, Gabrilovich E (2014) Constructing and mining web-scale knowledge graphs: KDD 2014 tutorial. In: Macskassy SA et al (eds) The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA—August 24–27, 2014. ACM
Bordes A, Glorot X, Weston J, Bengio Y (2014) A semantic matching energy function for learning with multi-relational data—application to word-sense disambiguation. Mach Learn 94(2):233–259
Bordes A, Usunier N, García-Durán A, Weston J, Yakhnenko O (2013) Translating embeddings for modeling multi-relational data. In: Burges CJC et al (eds) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5–8, 2013, Lake Tahoe, Nevada, United States, pp 2787–2795
Bordes A, Weston J, Collobert R, Bengio Y (2011) Learning structured embeddings of knowledge bases. In: Burgard W et al (eds) Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2011, San Francisco, California, USA, August 7–11, 2011. AAAI Press
Socher R, Chen D, Manning CD, Ng AY (2013) Reasoning with neural tensor networks for knowledge base completion. In: Burges CJC et al (eds) Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013. Proceedings of a meeting held December 5–8, 2013, Lake Tahoe, Nevada, United States, pp 926–934
Chapelle O, Schölkopf B, Zien A (eds) (2006) Semi-supervised learning. MIT Press, Cambridge
Cohen MB, Kyng R, Miller GL, Pachocki JW, Peng R, Rao A, Xu SC (2014) Solving SDD linear systems in nearly \(m \log ^{1/2} n\) time. In: Shmoys DB (ed) Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31–June 3, 2014. ACM, New York, pp 343–352
d’Amato C, Fanizzi N, Esposito F (2010) Inductive learning for the semantic web: what does it buy? Semantic Web 1(1–2):53–59. doi:10.3233/SW-2010-0007
Davis J, Goadrich M (2006) The relationship between Precision-Recall and ROC curves. In: Cohen W et al (eds) Proceedings of ICML’06. ACM, pp 233–240
de Vries GKD (2013) A Fast Approximation of the Weisfeiler–Lehman Graph Kernel for RDF Data. In: Blockeel H et al (eds) Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23–27, 2013, Proceedings, Part I, LNCS, vol 8188. Springer, pp 606–621
Delalleau O, Bengio Y, Roux NL (2005) Efficient non-parametric function induction in semi-supervised learning. In: Cowell RG et al (eds) Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, AISTATS 2005, Bridgetown, Barbados, January 6–8, 2005. Society for Artificial Intelligence and Statistics
Domingos P, Lowd D, Kok S, Poon H, Richardson M, Singla P (2008) Just Add Weights: Markov Logic for the Semantic Web. In: da Costa PCG et al (eds) Uncertainty Reasoning for the Semantic Web I, LNAI, vol 5327. Springer, Berlin, pp 1–25
Dong X, Gabrilovich E, Heitz G, Horn W, Lao N, Murphy K, Strohmann T, Sun S, Zhang W (2014) Knowledge vault: a web-scale approach to probabilistic knowledge fusion. In: Macskassy SA et al (eds) The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA—August 24–27, 2014. ACM, pp 601–610
Fergus R, Weiss Y, Torralba A (2009) Semi-supervised learning in gigantic image collections. In: Bengio Y et al (eds) Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7–10 December 2009, Vancouver, British Columbia, Canada. Curran Associates, Inc, pp 522–530
Franz T, Schultz A, Sizov S, Staab S (2009) Triplerank: ranking semantic web data by tensor decomposition. In: Bernstein A et al (eds) International Semantic Web Conference, LNCS, vol 5823. Springer, Heidelberg, pp 213–228
Galárraga LA, Teflioudi C, Hose K, Suchanek FM (2013) AMIE: association rule mining under incomplete evidence in ontological knowledge bases. In: Schwabe D et al (eds) 22nd International World Wide Web Conference, WWW ’13, Rio de Janeiro, Brazil, May 13–17, 2013. International World Wide Web Conferences Steering Committee/ACM, pp 413–422
Goldberg AB, Zhu X, Wright SJ (2007) Dissimilarity in graph-based semi-supervised classification. In: Meila M et al (eds) Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, AISTATS 2007, San Juan, Puerto Rico, March 21–24, 2007, JMLR Proceedings, vol 2, pp 155–162. JMLR.org
Harris S, Seaborne A (2013) SPARQL 1.1 Query Language. http://www.w3.org/TR/sparql11-query/
Hastie T, Tibshirani R, Friedman J (2001) The elements of statistical learning. Springer series in statistics. Springer New York Inc., New York
Hastie T, Tibshirani R, Friedman J (2008) The elements of statistical learning: data mining, inference and prediction, 2nd edn. Springer, Berlin
Hellmann S, Lehmann J, Auer S (2009) Learning of OWL class descriptions on very large knowledge bases. Int J Semant Web Inf Syst 5(2):25–48
Hitzler P, Krötzsch M, Rudolph S (2009) Foundations of semantic web technologies. Chapman & Hall/CRC, Boca Raton
Ji M, Sun Y, Danilevsky M, Han J, Gao J (2010) Graph regularized transductive classification on heterogeneous information networks. In: Balcázar JL et al (eds) ECML/PKDD (1), LNCS, vol 6321. Springer, Heidelberg, pp 570–586
Kok S, Domingos PM (2007) Statistical predicate invention. In: Ghahramani Z (ed) Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20–24, 2007, ACM International Conference Proceeding Series, vol 227, pp 433–440. ACM, New York
Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT Press, Cambridge
Koutra D, Ke TY, Kang U, Chau DH, Pao HKK, Faloutsos C (2011) Unifying Guilt-by-Association Approaches: Theorems and Fast Algorithms. In: Gunopulos D et al (eds) Proceedings of ECML/PKDD’11, LNCS, vol 6912, Springer, Berlin, pp 245–260
Krompaß D, Nickel M, Tresp V (2014) Querying factorized probabilistic triple databases. In: Mika P et al (eds) The Semantic Web—ISWC 2014—13th International Semantic Web Conference, Riva del Garda, Italy, October 19–23, 2014. Proceedings, Part II, LNCS, vol 8797. Springer, New York, pp 114–129
LeCun Y, Chopra S, Hadsell R, Ranzato M, Huang F (2006) A tutorial on energy-based learning. In: Bakir G et al (eds) Predicting structured data. MIT Press, Cambridge
Lin HT, Koul N, Honavar V (2011) Learning Relational Bayesian Classifiers from RDF Data. In: Aroyo L et al (eds) International Semantic Web Conference (1), LNCS, vol 7031. Springer, Berlin, pp 389–404
Liu W, He J, Chang S (2010) Large graph construction for scalable semi-supervised learning. In: Fürnkranz J et al (eds) Proceedings of the 27th International Conference on Machine Learning (ICML-10), June 21–24, 2010, Haifa, Israel. Omnipress, Haifa, pp 679–686
Livne OE, Brandt A (2012) Lean algebraic multigrid (LAMG): fast graph laplacian linear solver. SIAM J Sci Comput 34(4):499–522
Lösch U, Bloehdorn S, Rettinger A (2012) Graph kernels for RDF data. In: Simperl E et al (eds) The Semantic Web: Research and Applications—9th Extended Semantic Web Conference, ESWC 2012, Heraklion, Crete, Greece, May 27–31, 2012. Proceedings, LNCS, vol 7295. Springer, Heidelberg, pp 134–148
Luo C, Guan R, Wang Z, Lin C (2014) Hetpathmine: A novel transductive classification algorithm on heterogeneous information networks. In: de Rijke M et al (eds) Advances in Information Retrieval—36th European Conference on IR Research, ECIR 2014, Amsterdam, The Netherlands, April 13–16, 2014. Proceedings, LNCS, vol 8416. Springer, Berlin, pp 210–221
McPherson M, Smith-Lovin L, Cook JM (2001) Birds of a feather: homophily in social networks. Ann Rev Sociol 27(1):415–444
Miller KT, Griffiths TL, Jordan MI (2009) Nonparametric latent feature models for link prediction. In: Bengio Y et al (eds) Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7–10 December 2009, Vancouver, British Columbia, Canada. Curran Associates, Inc, pp 1276–1284
Minervini P, d’Amato C, Fanizzi N, Esposito F (2014) Adaptive knowledge propagation in web ontologies. In: Janowicz K et al (eds) Knowledge Engineering and Knowledge Management—19th International Conference, EKAW 2014, Linköping, Sweden, November 24–28, 2014. Proceedings, LNCS, vol. 8876. Springer, Berlin, pp 304–319
Minervini P, d’Amato C, Fanizzi N, Tresp V (2014) Learning to propagate knowledge in web ontologies. In: Bobillo F et al (eds) Proceedings of the 10th International Workshop on Uncertainty Reasoning for the Semantic Web (URSW 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014., CEUR Workshop Proceedings, vol 1259. CEUR-WS.org, pp 13–24
Nayak R, Senellart P, Suchanek FM, Varde AS (2012) Discovering interesting information with advances in web technology. SIGKDD Explor 14(2):63–81
Nickel M, Murphy K, Tresp V, Gabrilovich E (2016) A review of relational machine learning for knowledge graphs. Proc IEEE 104(1):11–33
Nickel M, Tresp V, Kriegel H (2011) A three-way model for collective learning on multi-relational data. In: Getoor L et al (eds) Proceedings of the 28th International Conference on Machine Learning, ICML 2011, Bellevue, Washington, USA, June 28–July 2, 2011. Omnipress, pp 809–816
Nickel M, Tresp V, Kriegel H (2012) Factorizing YAGO: scalable machine learning for linked data. In: Mille A et al (eds) Proceedings of the 21st World Wide Web Conference 2012, WWW 2012, Lyon, France, April 16–20, 2012. ACM, pp 271–280
Peng R, Spielman DA (2014) An efficient parallel solver for SDD linear systems. In: Shmoys DB (ed) Symposium on Theory of Computing, STOC 2014, New York, NY, USA, May 31–June 3, 2014. ACM, New York, pp 333–342
Rasmussen CE, Williams CKI (2005) Gaussian processes for machine learning (adaptive computation and machine learning). MIT Press, Cambridge
Rettinger A, Lösch U, Tresp V, d’Amato C, Fanizzi N (2012) Mining the Semantic Web: Statistical learning for next generation knowledge bases. Data Min Knowl Discov 24(3):613–662
Rettinger A, Nickles M, Tresp V (2009) Statistical relational learning with formal ontologies. In: Buntine WL et al (eds) Machine Learning and Knowledge Discovery in Databases, European Conference, ECML PKDD 2009, Bled, Slovenia, September 7–11, 2009, Proceedings, Part II, LNCS, vol 5782. Springer, Berlin, pp 286–301
Schmachtenberg M, Bizer C, Paulheim H (2014) Adoption of the linked data best practices in different topical domains. In: Mika P et al (eds) The Semantic Web—ISWC 2014—13th International Semantic Web Conference, Riva del Garda, Italy, October 19–23, 2014. Proceedings, Part I, LNCS, vol 8796. Springer, Heidelberg, pp 245–260
Shadbolt N, Berners-Lee T, Hall W (2006) The semantic web revisited. IEEE Intell Syst 21(3):96–101
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Shor NZ, Kiwiel KC, Ruszczyński A (1985) Minimization Methods for Non-differentiable Functions. Springer-Verlag New York Inc, New York
Sirin E, Parsia B (2007) SPARQL-DL: SPARQL Query for OWL-DL. In: Golbreich C et al (eds) OWLED, CEUR Workshop Proceedings, vol 258. CEUR-WS.org
Sirin E, Parsia B, Grau BC, Kalyanpur A, Katz Y (2007) Pellet: a practical OWL-DL reasoner. J Web Semant 5(2):51–53
Suchanek FM, Kasneci G, Weikum G (2007) Yago: a core of semantic knowledge. In: Williamson CL et al (eds) Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Alberta, Canada, May 8–12, 2007, ACM, pp 697–706
Sun Y, Han J (2012) Mining heterogeneous information networks: principles and methodologies. Synthesis lectures on data mining and knowledge discovery. Morgan & Claypool Publishers, San Rafael
Sun Y, Han J, Zhao P, Yin Z, Cheng H, Wu T (2009) Rankclus: integrating clustering with ranking for heterogeneous information network analysis. In: Kersten ML et al (eds) EDBT, ACM International Conference Proceeding Series, vol 360. ACM, pp 565–576
Tresp V, Huang Y, Bundschus M, Rettinger A (2009) Materializing and querying learned knowledge. In: Proceedings of IRMLeS’09
Vapnik VN (1998) Statistical learning theory, 1st edn. Wiley, New York
Wang Z, Zhang J, Feng J, Chen Z (2014) Knowledge graph embedding by translating on hyperplanes. In: Brodley CE et al (eds) Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27–31, 2014, Québec City, Québec, Canada. AAAI Press, pp 1112–1119
Zhang K, Kwok JT, Parvin B (2009) Prototype vector machine for large scale semi-supervised learning. In: Danyluk AP et al (eds) Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, Montreal, Quebec, Canada, June 14–18, 2009, ACM International Conference Proceeding Series, vol 382, ACM, pp 1233–1240
Zhang Y, Huang K, Liu C (2011) Fast and robust graph-based transductive learning via minimum tree cut. In: Cook DJ et al (eds) 11th IEEE International Conference on Data Mining, ICDM 2011, Vancouver, BC, Canada, December 11–14, 2011. IEEE Computer Society, pp 952–961
Zhu X (2005) Semi-supervised learning literature survey. Tech. Rep. 1530, Computer Science, University of Wisconsin-Madison
Zhu X (2005) Semi-supervised learning with graphs. Ph.D. thesis, Pittsburgh, PA, USA. AAI3179046
Zhu X, Ghahramani Z, Lafferty JD (2003) Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions. In: Fawcett T et al (eds) Proceedings of ICML’03, AAAI Press, pp 912–919
Cite this article
Minervini, P., d’Amato, C., Fanizzi, N. et al. Discovering Similarity and Dissimilarity Relations for Knowledge Propagation in Web Ontologies. J Data Semant 5, 229–248 (2016). https://doi.org/10.1007/s13740-016-0062-7
DOI: https://doi.org/10.1007/s13740-016-0062-7