ABSTRACT
Procedures for collective inference make simultaneous statistical judgments about the same variables for a set of related data instances. For example, collective inference could be used to simultaneously classify a set of hyperlinked documents or infer the legitimacy of a set of related financial transactions. Several recent studies indicate that collective inference can significantly reduce classification error when compared with traditional inference techniques. We investigate the underlying mechanisms for this error reduction by reviewing past work on collective inference and characterizing the different types of statistical models used for inference in relational data. We show important differences among these models, and we characterize the necessary and sufficient conditions for reduced classification error based on experiments with real and simulated data.
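To make the idea concrete, the kind of procedure the abstract describes can be sketched as a simple iterative relational-neighbor classifier in the spirit of Macskassy and Provost's relational classifier: unknown nodes repeatedly take the majority label of their labeled neighbors until the labeling stabilizes. The toy graph, seed labels, and synchronous update rule below are illustrative assumptions, not the paper's experimental setup.

```python
def collective_classify(edges, known, iterations=10):
    """Iteratively label unknown nodes by the majority label of their neighbors."""
    # Build an undirected adjacency list from the edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    labels = dict(known)
    unknown = [n for n in neighbors if n not in known]

    for _ in range(iterations):
        updated = {}
        for n in unknown:
            # Count labels among currently labeled neighbors.
            counts = {}
            for m in neighbors[n]:
                if m in labels:
                    counts[labels[m]] = counts.get(labels[m], 0) + 1
            if counts:
                updated[n] = max(counts, key=counts.get)
        # Apply all updates at once; stop when the labeling is stable.
        changed = any(labels.get(n) != lab for n, lab in updated.items())
        labels.update(updated)
        if not changed:
            break
    return labels

# Toy "hyperlink" graph: two triangles joined by a single bridge edge,
# with one seed label per cluster. Collective inference propagates the
# seeds so that each cluster ends up consistently labeled.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
known = {0: "A", 5: "B"}
result = collective_classify(edges, known)
```

Note the difference from traditional inference: each node's label depends on inferred labels of its neighbors, not only on the seed evidence, which is what makes the judgments simultaneous rather than independent.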
Index Terms
- Why collective inference improves relational classification