Abstract:
Many information tasks involve objects that are explicitly or implicitly connected in a network, such as webpages connected by hyperlinks or people linked by "friendships...Show MoreMetadata
Abstract:
Many information tasks involve objects that are explicitly or implicitly connected in a network, such as webpages connected by hyperlinks or people linked by "friendships" in a social network. Research on link-based classification (LBC) has studied how to leverage these connections to improve classification accuracy. This research broadly falls into two groups. First, there are methods that use the original attributes and/or links of the network, via a link-aware supervised classifier or via a non-learning method based on label propagation or random walks. Second, there are recent methods that first compute a set of latent features or links that summarize the network, then use a (hopefully simpler) supervised classifier or label propagation method. Some work has claimed that the latent methods can improve accuracy, but has not adequately compared with the best non-latent methods. In response, this paper provides the first substantial comparison between these two groups. We find that certain non-latent methods typically provide the best overall accuracy, but that latent methods can be competitive when a network is densely-labeled or when the attributes are not very informative. Moreover, we introduce two novel combinations of these methods that in some cases substantially increase accuracy.
Published in: Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014)
Date of Conference: 13-15 August 2014
Date Added to IEEE Xplore: 02 March 2015
Electronic ISBN:978-1-4799-5880-1