Link classification with probabilistic graphs

Journal of Intelligent Information Systems

Abstract

The need to deal with the inherent uncertainty in real-world relational or networked data has led to the proposal of new probabilistic models, such as probabilistic graphs. Every edge in a probabilistic graph is associated with a probability whose value represents the likelihood of its existence, or the strength of the relation between the entities it connects. The aim of this paper is to propose two machine learning techniques for the link classification problem in relational data that exploit the probabilistic graph representation. Both proposed methods exploit a language-constrained reachability method to infer the probability of possible hidden relationships that may exist between two nodes in a probabilistic graph. Each hidden relationship between two nodes may be viewed as a feature (or a factor), and its corresponding probability as its weight, while an observed relationship is considered a positive instance for its corresponding link label. Given a training set of observed links, the first learning approach uses a propositionalization technique and adopts an L2-regularized Logistic Regression to learn a model able to predict unobserved link labels. Since in some cases the edges’ probabilities may not be known in advance, or may not be precisely defined for a classification task, the second proposed approach exploits the inference method and uses a mean squared error technique to learn the edges’ probabilities. Both proposed methods have been evaluated on real-world data sets, and the corresponding results support their validity.
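
To make the notion of a hidden relationship's probability concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that edges exist independently of one another and that the language constraint is a fixed sequence of edge labels, which is a simplification of a general language-constrained query. The toy graph, the labels `rates` and `has_genre`, and the function names are all hypothetical. The probability is computed exactly by enumerating every possible world, which is only feasible for very small graphs.

```python
from itertools import product

# Hypothetical toy probabilistic graph: (source, target, label, probability).
EDGES = [
    ("u1", "m1", "rates", 0.9),
    ("m1", "g1", "has_genre", 0.8),
    ("u1", "m2", "rates", 0.5),
    ("m2", "g1", "has_genre", 0.6),
]


def reachable(edges, source, target, pattern):
    """Return True if some directed path from source to target
    follows exactly the label sequence given in `pattern`."""
    frontier = {source}
    for label in pattern:
        # Advance one step along edges carrying the required label.
        frontier = {t for (s, t, l) in edges if s in frontier and l == label}
    return target in frontier


def constrained_reach_probability(prob_edges, source, target, pattern):
    """Exact probability that target is reachable from source under the
    label pattern, assuming independent edges (illustrative assumption).
    Enumerates all 2^|E| possible worlds, so use only on tiny graphs."""
    total = 0.0
    for keep in product([False, True], repeat=len(prob_edges)):
        weight = 1.0
        world = []
        for (s, t, l, p), present in zip(prob_edges, keep):
            # Weight of this world: product over present and absent edges.
            weight *= p if present else 1.0 - p
            if present:
                world.append((s, t, l))
        if reachable(world, source, target, pattern):
            total += weight
    return total


if __name__ == "__main__":
    p = constrained_reach_probability(EDGES, "u1", "g1", ["rates", "has_genre"])
    print(round(p, 3))
```

In this toy graph the two matching paths share no edges, so the result agrees with 1 - (1 - 0.9 × 0.8)(1 - 0.5 × 0.6) = 0.804. A value of this kind is what could serve as the weight of one feature when link classification is cast as a propositional learning problem, as the abstract describes.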

Notes

  1. Sometimes called a certain graph.

  2. In the rest of the paper, if not otherwise specified, \(\mathbb{I}\{C\}\) denotes the indicator function returning 1 if the condition C is true, and 0 otherwise.

  3. http://www.csie.ntu.edu.tw/~cjlin/liblinear.

  4. http://ir.ii.uam.es/hetrec2011/datasets.html

  5. http://www.lastfm.com

  6. http://www.di.uniba.it/~claudiotaranto/eagle.html

References

  • Baccianella, S., Esuli, A., Sebastiani, F. (2009). Evaluation measures for ordinal regression. In Proceedings of the 9th international conference on intelligent systems design and applications. IEEE, (pp. 283–287).

  • Bottou, L. (1998). Online algorithms and stochastic approximations. In D. Saad (Ed.), Online Learning and Neural Networks. Cambridge: Cambridge University Press.

  • Cantador, I., Brusilovsky, P., Kuflik, T. (Eds.) (2011). 2nd Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011), ACM.

  • Colbourn, C.J. (1987). The Combinatorics of Network Reliability. Oxford University Press.

  • Craven, M., & Slattery, S. (2001). Relational learning with statistical predicate invention: better models for hypertext. Machine Learning, 43(1–2), 97–119.

  • Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In Proceedings of the 23rd international conference on machine learning (pp. 233–240).

  • De Raedt, L., Frasconi, P., Kersting, K., Muggleton, S. (Eds.) (2008). Probabilistic Inductive Logic Programming: Theory and Applications, LNCS (Vol. 4911). Springer.

  • Desrosiers, C., & Karypis, G. (2011). A comprehensive survey of neighborhood-based recommendation methods. In F. Ricci, L. Rokach, B. Shapira, P. B. Kantor (Eds.) Recommender Systems Handbook (pp. 107–144). Springer.

  • Domingos, P., & Lowd, D. (2009). Markov Logic: an interface layer for artificial intelligence, 1st edn. Morgan and Claypool Publishers.

  • Duchi, J.C., Hazan, E., Singer, Y. (2010). Adaptive subgradient methods for online learning and stochastic optimization. In A. T. Kalai & M. Mohri (Eds.) The 23rd Conference on Learning Theory, Omnipress, (pp. 257–269).

  • Georgiev, K., & Nakov, P. (2013). A non-iid framework for collaborative filtering with restricted Boltzmann machines. In S. Dasgupta & D. McAllester (Eds.) Proceedings of the 30th international conference on machine learning, JMLR workshop and conference proceedings (Vol. 28, pp. 1148–1156).

  • Getoor, L., & Diehl, C.P. (2005). Link mining: a survey. SIGKDD Explorations, 7(2), 3–12.

  • Getoor, L., & Taskar, B. (2007). Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning). The MIT Press.

  • Goldberg, D.S., & Roth, F.P. (2003). Assessing experimentally derived interactions in a small world. Proceedings of the National Academy of Sciences, 100(8), 4372–4376.

  • Gutmann, B., Kimmig, A., Kersting, K., De Raedt, L. (2008). Parameter learning in probabilistic databases: a least squares approach. In Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I (pp. 473–488). Springer.

  • Gutmann, B., Thon, I., De Raedt, L. (2011). Learning the parameters of probabilistic logic programs from interpretations. In Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Part I (pp. 581–596). Springer.

  • He, J., & Chu, W.W. (2010). A social network-based recommender system (snrs). In N. Memon, J. J. Xu, D. L. Hicks, H. Chen (Eds.) Data Mining for Social Network Data, Annals of Information Systems (Vol. 12, pp. 47–74). Springer.

  • Jin, R., Liu, L., Ding, B., Wang, H. (2011). Distance-constraint reachability computation in uncertain graphs. Proceedings of the VLDB Endowment, 4, 551–562.

  • Kramer, S., Lavrač, N., Flach, P. (2000). Propositionalization approaches to relational data mining. In Relational Data Mining (pp. 262–286). Berlin: Springer-Verlag.

  • Langseth, H., & Nielsen, T.D. (2012). A latent model for collaborative filtering. International Journal of Approximate Reasoning, 53(4), 447–466.

  • Lin, C.J., Weng, R.C., Keerthi, S.S. (2008). Trust region newton method for logistic regression. Journal of Machine Learning Research, 9, 627–650.

  • Macskassy, S.A. (2011). Relational classifiers in a non-relational world: using homophily to create relations. In X. Chen, T. S. Dillon, H. Ishibuchi, J. Pei, H. Wang, M. A. Wani (Eds.) 10th International Conference on Machine Learning and Applications and Workshops, IEEE, (pp. 406–411).

  • Newman, M.E.J. (2001a). Clustering and preferential attachment in growing networks. Physical Review E, 64.

  • Newman, M.E.J. (2001b). The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences of the United States of America, 98(2), 404–409.

  • Pfeiffer III, J.J., & Neville, J. (2011). Methods to determine node centrality and clustering in graphs with uncertain structure. In Proceedings of the Fifth International Conference on Weblogs and Social Media, The AAAI Press.

  • Popescul, A., & Ungar, L.H. (2003). Statistical relational learning for link prediction. In IJCAI03 Workshop on Learning Statistical Models from Relational Data.

  • Potamias, M., Bonchi, F., Gionis, A., Kollios, G. (2010). K-nearest neighbors in uncertain graphs. Proceedings of the VLDB Endowment, 3, 997–1008.

  • Robbins, H., & Monro, S. (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22(3), 400–407.

  • Sato, T. (1995). A statistical learning method for logic programs with distribution semantics. In Proceedings of the 12th International Conference on Logic Programming. MIT Press (pp. 715–729).

  • Taranto, C., Di Mauro, N., Esposito, F. (2011). Probabilistic inference over image networks. In 7th Italian Research Conference on Digital Libraries 2011 (Vol. 249, pp. 1–13). CCIS.

  • Taranto, C., Di Mauro, N., Esposito, F. (2012a). Uncertain graphs meet collaborative filtering. In 3rd Italian Information Retrieval Workshop.

  • Taranto, C., Di Mauro, N., Esposito, F. (2012b). Uncertain (multi)graphs for personalization services in digital libraries. In M. Agosti, F. Esposito, S. Ferilli, N. Ferro (Eds.) 8th Italian Research Conference on Digital Libraries, Vol. 354. Berlin: Springer, CCIS.

  • Taranto, C., Di Mauro, N., Esposito, F. (2013). Learning in probabilistic graphs exploiting language-constrained patterns. In A. Appice, M. Ceci, C. Loglisci, G. Manco, E. Masciari, Z. W. Ras (Eds.) New Frontiers in Mining Complex Patterns, LNCS (Vol. 7765, pp. 155–169). Berlin: Springer.

  • Taskar, B., Wong, M.F., Abbeel, P., Koller, D. (2003). Link prediction in relational data. In S. Thrun, L. K. Saul, B. Schölkopf (Eds.) Advances in Neural Information Processing Systems (p. 16).

  • von Luxburg, U., Radl, A., Hein, M. (2011). Hitting and commute times in large graphs are often misleading. CoRR.

  • Vozalis, M.G., Markos, A., Margaritis, K.G. (2010). Collaborative filtering through svd-based and hierarchical nonlinear pca. In Proceedings of the 20th international conference on Artificial neural networks. Part I, (pp. 395–400). Berlin: Springer.

  • Witsenburg, T., & Blockeel, H. (2011). Improving the accuracy of similarity measures by using link information. In M. Kryszkiewicz, H. Rybinski, A. Skowron, Z. W. Ras (Eds.) Proceedings of the 19th International conference on Foundations of Intelligent Systems, LNCS (Vol. 6804, pp. 501–512). Springer.

  • Zan, H., Xin, L., Hsinchun, C. (2005). Link prediction approach to collaborative filtering. In Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries (pp. 141–142). ACM Press.

  • Zhu, J. (2003). Mining web site link structures for adaptive web site navigation and search. PhD thesis.

  • Zou, Z., Gao, H., Li, J. (2010a). Discovering frequent subgraphs over uncertain graph databases under probabilistic semantics. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM (pp. 633–642).

  • Zou, Z., Li, J., Gao, H., Zhang, S. (2010b). Finding top-k maximal cliques in an uncertain graph. International Conference on Data Engineering, 649–652.

Acknowledgments

This work fulfills the research objectives of the PON02_00563_3489339 project “PUGLIA@SERVICE - L’Ingegneria dei Servizi Internet-Based per lo sviluppo strutturale di un territorio intelligente” funded by the Italian Ministry of University and Research (MIUR).

Author information

Corresponding author

Correspondence to Nicola Di Mauro.

About this article

Cite this article

Di Mauro, N., Taranto, C. & Esposito, F. Link classification with probabilistic graphs. J Intell Inf Syst 42, 181–206 (2014). https://doi.org/10.1007/s10844-013-0293-0

  • DOI: https://doi.org/10.1007/s10844-013-0293-0
