Elsevier

Knowledge-Based Systems

Volume 110, 15 October 2016, Pages 30-48
SCIFNET: Stance community identification of topic persons using friendship network analysis

https://doi.org/10.1016/j.knosys.2016.07.015

Abstract

A topic that involves communities with different competing viewpoints or stances is usually reported by a large number of documents. Knowing the association between the persons mentioned in the documents can help readers construct the background knowledge of the topic and comprehend the numerous topic documents more easily. In this paper, we investigate the stance community identification problem where the goal is to cluster important persons mentioned in a set of topic documents into stance-coherent communities. We propose a stance community identification method called SCIFNET, which constructs a friendship network of topic persons from topic documents automatically. Stance community expansion and stance community refinement techniques are designed to identify stance-coherent communities of topic persons in the friendship network and to detect persons who are stance-irrelevant about the topic. The results of experiments based on real-world datasets demonstrate the effectiveness of SCIFNET and show that it outperforms many well-known community detection approaches and clustering algorithms.

Introduction

With the prevalence of telecommunication technologies and the explosive growth in media digitization, there are now enormous amounts of information on the Internet. As a result, people worldwide can easily obtain information about the latest topics, such as global economic trends, political events, and sports tournament results, via the Internet. Usually, people are interested in topics that involve communities with different competing viewpoints or stances. However, they are often overwhelmed by the large number of topic documents that cover every detail of the different stance communities. For example, for the topic of the selection of a new International Monetary Fund (IMF) president in 2011, Google News collected hundreds of topic documents that reported the development of the campaign. Although the documents covered all perspectives on the topic (i.e., from the interactions between the candidates to the viewpoints of the general public), readers generally had difficulty assimilating the enormous amount of information in the documents. To ease the burden of reading so many topic documents, several topic mining techniques have been developed. For instance, Nallapati et al. [35] grouped topic documents into clusters, each of which presents a theme of a topic; Feng and Allan [22] extracted informative sentences from themes to summarize a topic; and Chen and Chen [5], [6] further organized themes and summaries chronologically to depict the storyline of a topic. These techniques successfully condense the content of a topic. However, readers still need to invest a lot of time in digesting the generated summaries if they are not familiar with the topic.

A topic is basically associated with persons, times, and places [35]. Learning the associations between the persons mentioned in a set of topic documents (called topic persons hereafter) can help readers construct the background knowledge of the topic and digest the information quickly. For instance, in the above-mentioned topic about the new IMF president selection, if readers had known that Angela Merkel supported Christine Lagarde (i.e., the two are detected in the same community), they would have understood why Merkel said “Christine Lagarde is an ideal embodiment of economics.”

In this paper, we investigate the stance community identification problem, which involves clustering topic persons into stance-coherent communities. For instance, given the documents about the selection of the new IMF president in 2011, the stance community identification method discovers communities of persons, which represent the camps of the different candidates running for election, as shown in Fig. 1. Identifying stance communities of topic persons is a new research area, and to the best of our knowledge, only Chen et al. [7], [8] have addressed the stance community identification problem. They proposed a method based on Principal Component Analysis (PCA) [4]. Specifically, the method examines the signs of the entries in the eigenvector associated with the largest eigenvalue to recognize stance communities of topic persons. The method can only handle two-stance topics; however, in practice, many topics involve more than two stances. Here, we present a novel stance community identification method called SCIFNET (Stance Community Identification based on Friendship NETwork), which analyzes a set of topic documents to identify the stance communities in a topic and the persons belonging to each of them. First, SCIFNET constructs a friendship network in which the nodes represent topic persons. The co-occurrence of the persons in the topic documents, the documents’ stance orientation, and the co-neighboring level between nodes are leveraged to define the friendship strength between persons (i.e., the edge weights). We model stance community identification as a community detection task and design an objective function to evaluate the results. Stance community expansion and stance community refinement techniques, which are based on the objective function, are designed to iteratively cluster topic persons into stance-coherent communities and detect persons that are stance-irrelevant about the topic of interest. We present convergence proofs for both techniques, showing that the identification result converges to a local optimum. Evaluations based on real-world topics demonstrate the effectiveness of SCIFNET, and show that it outperforms well-known clustering and community detection approaches.
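The sign-based idea attributed to Chen et al. [7], [8] above can be illustrated with a small sketch. It is not their implementation: the person-by-document occurrence table, the names P1 to P4, the function name, and the use of power iteration to approximate the dominant eigenvector are all assumptions made for illustration. The sketch computes the covariance between persons across documents and splits the persons by the sign of their entry in the eigenvector with the largest eigenvalue, which also shows why the approach yields exactly two groups.

```python
import math


def principal_sign_split(occurrence, iterations=200):
    """Split persons into two groups by the sign of their entry in the
    dominant eigenvector of the person-by-person covariance matrix.

    occurrence: dict mapping a person name to a list of per-document
    occurrence values (hypothetical input format).
    """
    names = sorted(occurrence)
    n_docs = len(next(iter(occurrence.values())))

    # Center each person's occurrence vector across documents.
    centered = []
    for name in names:
        row = occurrence[name]
        mean = sum(row) / n_docs
        centered.append([x - mean for x in row])

    # Sample covariance between every pair of persons.
    cov = [
        [sum(a * b for a, b in zip(ri, rj)) / (n_docs - 1) for rj in centered]
        for ri in centered
    ]

    # Power iteration from an arbitrary non-uniform start vector
    # (assumed not to be orthogonal to the dominant eigenvector).
    v = [1.0 / (i + 1) for i in range(len(names))]
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, v)) for row in cov]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]

    positive = {name for name, x in zip(names, v) if x >= 0}
    negative = set(names) - positive
    return positive, negative


# Hypothetical toy corpus: 4 persons, 6 documents (1 = person is mentioned).
toy_occurrence = {
    "P1": [1, 1, 1, 0, 0, 0],
    "P2": [1, 1, 0, 0, 0, 0],
    "P3": [0, 0, 0, 1, 1, 1],
    "P4": [0, 0, 0, 1, 1, 0],
}
group_a, group_b = principal_sign_split(toy_occurrence)
print(sorted(group_a), sorted(group_b))
```

In this toy corpus the persons who are mentioned in the same documents end up in the same group. Because an eigenvector is only defined up to its sign, which group is called "positive" is arbitrary, and a single sign can never separate more than two stances.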

The proposed method has the following advantages over existing community detection research. First, most iterative clustering-based community detection methods, such as those in [20], [31], [48], suffer from the early merging problem, in which a node tends to be merged (clustered) with a community simply because it is close to the community's seed. To address this problem, we design a stance community refinement technique that iteratively refines the detected communities. Second, nodes in a social (friendship) network can play different roles. In contrast to the overlapping, bridge, and hub nodes investigated in [13], [14], [21], the proposed method is able to identify stance-irrelevant nodes, which represent persons who are neutral to the stances of a topic. Finally, since topic persons may have opposing orientations, the constructed friendship network can have negative edges. While several community detection methods, such as [13], [21], [32], analyze network structures to infer communities, our method further examines edge signs to correctly detect stance communities of topic persons.
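To make the role of edge signs concrete, the following sketch scores a partition of a small signed network. The score is an illustrative stand-in and not the SCIFNET objective function, which is not given in this excerpt. The edge weights, the node names, and the function name `signed_agreement` are hypothetical. A positive edge inside a community or a negative edge between communities raises the score, and the opposite cases lower it.

```python
def signed_agreement(edges, community):
    """Score how well a partition agrees with the edge signs.

    edges: dict mapping a (u, v) pair to a signed weight.
    community: dict mapping each node to a community label.
    Positive edges within a community and negative edges across
    communities add |w|; the opposite cases subtract |w|.
    """
    score = 0.0
    for (u, v), w in edges.items():
        same = community[u] == community[v]
        # +w inside a community, -w across communities.
        score += w if same else -w
    return score


# Hypothetical signed friendship network over four persons.
toy_edges = {
    ("P1", "P2"): 2.0,
    ("P3", "P4"): 1.5,
    ("P1", "P3"): -1.0,
    ("P2", "P4"): -0.5,
    ("P2", "P3"): 0.5,
}
friends_together = {"P1": 0, "P2": 0, "P3": 1, "P4": 1}
opponents_together = {"P1": 0, "P3": 0, "P2": 1, "P4": 1}

print(signed_agreement(toy_edges, friends_together))    # 4.5
print(signed_agreement(toy_edges, opponents_together))  # -5.5
```

Both partitions place two nodes in each community, so the difference in score comes entirely from which signed edges fall inside and across the communities. A method that treats every edge as a positive tie does not see that difference.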

The remainder of this paper is organized as follows. In the next section, we review related work. Then, we describe SCIFNET in detail and demonstrate its effectiveness in the experiment section. The final section contains our conclusions.

Related work

Our research is related to community detection [41]. Given a network of interest, the community detection task involves identifying sub-networks, each of which represents a coherent community [12], [24], [36], [39]. For instance, given a social network, community detection methods identify groups of people with similar preferences [41]. The identified communities are useful for understanding various social phenomena, such as epidemic spreading [43] and human interactions [14], [15], [40], [42].
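As a concrete example of how a candidate set of communities can be scored, the sketch below computes modularity, a widely used quality measure for partitions of unsigned, undirected networks. It is included only to illustrate the general community detection task. It is not claimed to be the measure used by SCIFNET, whose friendship network may contain negative edges. The toy graph and the function name are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in a community
    degree_sum = defaultdict(int)  # total degree of a community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Fraction of internal edges minus the fraction expected at random.
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical graph: two triangles joined by a single bridge edge.
toy_graph = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}

print(modularity(toy_graph, two_groups))  # 5/14, about 0.357
print(modularity(toy_graph, one_group))   # 0.0
```

Splitting the graph at the bridge scores higher than keeping all nodes in one community, which matches the intuition that a community is a densely connected sub-network.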

Methodology

We propose a stance community identification method, SCIFNET, which clusters the persons mentioned in topic documents into stance-coherent communities. Fig. 2 shows SCIFNET's system architecture, which comprises three components: friendship network construction, stance community expansion, and stance community refinement. Specifically, given a set of documents reporting a topic with K stance communities, SCIFNET first extracts the topic persons mentioned in the documents. Then, it

Experiment

In this section, we introduce the data corpus used in the experiments; demonstrate the effectiveness of each system component; and compare our method's performance with those of other well-known community detection methods and clustering algorithms. Then, we present a stance community identification result and discuss the stance-irrelevant persons detected by our method.

Concluding remarks

The Internet has become a crucial medium for disseminating and acquiring the latest information about topics. However, users are often overwhelmed by the enormous number of topic documents. Basically, times, places, and persons are the key elements of topics. Knowing the associations of topic persons can help readers construct the background knowledge of a topic and comprehend numerous topic documents quickly. In this paper, we defined the problem of stance community identification, which

Acknowledgement

This research was supported in part by MOST 103-2221-E-002-106-MY2 from the Ministry of Science and Technology, Republic of China.

References (56)

  • X. Wang et al.

    Detecting communities by the core-vertex and intimate degree in complex networks

    Physica A

    (2013)
  • J. Allan et al.

    Topic detection and tracking pilot study final report

  • P. Anchuri et al.

    Communities and balance in signed networks: a spectral approach

  • I. Antonellis et al.

    Simrank++: query rewriting through link analysis of the click graph

  • D. Barber

    Bayesian Reasoning and Machine Learning

    (2012)
  • C.C. Chen et al.

    TSCAN: a novel method for topic summarization and content anatomy

  • C.C. Chen et al.

    TSCAN: a content anatomy approach to temporal topic summarization

    IEEE Trans. Knowl. Data Eng.

    (2012)
  • C.C. Chen et al.

    An unsupervised approach for person name bipolarization using principal component analysis

    IEEE Trans. Knowl. Data Eng.

    (2012)
  • C.C. Chen et al.

    Bipolar person name identification of topic documents using principal component analysis

  • J. Chen et al.

    Detecting communities in social networks using max-min modularity

  • J. Chen et al.

    Local Community Identification in Social Networks

  • A. Chin et al.

    A social hypertext model for finding community in blogs

  • A. Clauset et al.

    Finding community structure in very large networks

    Phys. Rev. E

    (2004)
  • C.H.Q. Ding et al.

    A min-max cut algorithm for graph partitioning and data clustering

  • Z. Ding et al.

    Overlapping community detection based on network decomposition

    Sci. Rep.

    (2016)
  • W.E. Donath et al.

    Lower bounds for the partitioning of graphs

    IBM J. Res. Dev.

    (1973)
  • A. Feng et al.

    Finding and linking incidents in news

  • J. Gao et al.

    On community outliers and their efficient detection in information networks
