SCIFNET: Stance community identification of topic persons using friendship network analysis

doi:10.1016/j.knosys.2016.07.015

Knowledge-Based Systems

Volume 110, 15 October 2016, Pages 30-48

https://doi.org/10.1016/j.knosys.2016.07.015

Abstract

A topic that involves communities with competing viewpoints or stances is usually covered by a large number of documents. Knowing the associations between the persons mentioned in the documents can help readers construct the background knowledge of the topic and comprehend the numerous topic documents more easily. In this paper, we investigate the stance community identification problem, where the goal is to cluster important persons mentioned in a set of topic documents into stance-coherent communities. We propose a stance community identification method called SCIFNET, which automatically constructs a friendship network of topic persons from topic documents. Stance community expansion and stance community refinement techniques are designed to identify stance-coherent communities of topic persons in the friendship network and to detect persons who are stance-irrelevant to the topic. The results of experiments based on real-world datasets demonstrate the effectiveness of SCIFNET and show that it outperforms many well-known community detection approaches and clustering algorithms.

Introduction

With the prevalence of telecommunication technologies and the explosive growth in media digitization, there are now enormous amounts of information on the Internet. As a result, people worldwide can easily obtain information about the latest topics, such as global economic trends, political events, and sports tournament results, via the Internet. Usually, people are interested in topics that involve communities with competing viewpoints or stances. However, they are often overwhelmed by the large number of topic documents that cover every detail of the different stance communities. For example, for the topic of the selection of a new International Monetary Fund (IMF) president in 2011, Google News collected hundreds of topic documents that reported the development of the campaign. Although the documents covered all perspectives on the topic (i.e., from the interactions between the candidates to the viewpoints of the general public), readers generally had difficulty assimilating the enormous amount of information in the documents. To ease the burden of reading so many topic documents, several topic mining techniques have been developed. For instance, Nallapati et al. [35] grouped topic documents into clusters, each of which presents a theme of a topic; Feng and Allan [22] extracted informative sentences from themes to summarize a topic; and Chen and Chen [5], [6] further organized themes and summaries chronologically to depict the storyline of a topic. These techniques successfully condense the content of a topic. However, readers who are not familiar with the topic still need to invest a lot of time in digesting the generated summaries.

A topic is basically associated with persons, times, and places [35]. Learning the associations between the persons mentioned in a set of topic documents (called topic persons hereafter) can help readers construct the background knowledge of the topic and digest the information quickly. For instance, in the above-mentioned topic about the new IMF president selection, if readers had known that Angela Merkel supported Christine Lagarde (i.e., they are detected in the same community), they would have understood why she said “Christine Lagarde is an ideal embodiment of economics.”

In this paper, we investigate the stance community identification problem, which involves clustering topic persons into stance-coherent communities. For instance, given the documents about the selection of the new IMF president in 2011, a stance community identification method discovers communities of persons that represent the camps of the different candidates running for election, as shown in Fig. 1. Identifying stance communities of topic persons is a new research area, and to the best of our knowledge, only Chen et al. [7], [8] have addressed the stance community identification problem. They proposed a method based on Principal Component Analysis (PCA) [4] that examines the signs of the entries in the eigenvector associated with the largest eigenvalue to recognize stance communities of topic persons. The method can only handle two-stance topics; however, in practice, many topics involve more than two stances. Here, we present a novel stance community identification method called SCIFNET (Stance Community Identification based on Friendship NETwork), which analyzes a set of topic documents to identify the stance communities of a topic and the persons in each community. First, SCIFNET constructs a friendship network in which the nodes represent topic persons. The co-occurrence of the persons in the topic documents, the documents’ stance orientation, and the co-neighboring level between nodes are leveraged to define the friendship strength between persons (i.e., the edge weights). We model stance community identification as a community detection task and design an objective function to evaluate the results. Stance community expansion and stance community refinement techniques, which are based on the objective function, are designed to iteratively cluster topic persons into stance-coherent communities and to detect persons who are stance-irrelevant to the topic of interest. We present convergence proofs for both techniques, showing that the identification result converges to a local optimum. Evaluations based on real-world topics demonstrate the effectiveness of SCIFNET and show that it outperforms well-known clustering and community detection approaches.
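The sign-based idea attributed to Chen et al. above can be illustrated with a minimal sketch. It is not their implementation: the persons-by-documents occurrence matrix, the row centering, and the function name `pca_bipolarize` are assumptions made here for illustration, and the toy counts are hypothetical.

```python
import numpy as np

def pca_bipolarize(occurrences):
    """Split persons into two groups by the sign of the principal eigenvector.

    `occurrences` is an assumed (persons x documents) matrix of mention counts.
    """
    X = np.asarray(occurrences, dtype=float)
    # Center each person's row so the covariance captures co-variation across documents.
    centered = X - X.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / (X.shape[1] - 1)
    # For a symmetric matrix, eigh returns eigenvalues in ascending order,
    # so the last column is the eigenvector of the largest eigenvalue.
    _, eigenvectors = np.linalg.eigh(cov)
    principal = eigenvectors[:, -1]
    # The overall sign of an eigenvector is arbitrary; only the grouping matters.
    return principal >= 0

# Hypothetical toy data: persons 0 and 1 tend to co-occur, as do persons 2 and 3.
occurrences = [
    [3, 2, 0, 0, 1],
    [2, 3, 0, 1, 0],
    [0, 0, 3, 2, 0],
    [0, 1, 2, 3, 0],
]
labels = pca_bipolarize(occurrences)
```

Because a single eigenvector yields only one sign per person, a split of this kind can separate at most two groups, which is the two-stance limitation noted above.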

The proposed method has the following advantages over current community detection research. First, most iterative clustering-based community detection methods, such as those in [20], [31], [48], suffer from the early merging problem, in which a node in a network tends to be merged (clustered) with a community simply because it is close to the community's seed. To address this problem, we design a stance community refinement technique that iteratively refines the detected communities. Second, nodes in a social (friendship) network can play different roles. In contrast to the overlapping, bridge, and hub nodes investigated in [13], [14], [21], the proposed method is able to identify stance-irrelevant nodes, which stand for persons who are neutral to the stances of a topic. Finally, since topic persons may have opposing orientations, the constructed friendship network can have negative edges. While several community detection methods, such as [13], [21], [32], analyze only network structures to infer communities, our method further examines edge signs to correctly detect stance communities of topic persons.
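To make the role of edge signs concrete, the following is a minimal sketch of a signed co-occurrence network. It does not reproduce SCIFNET's friendship strength, which also uses the co-neighboring level between nodes. The assumptions here are that each document carries an orientation of +1 or -1, that an edge weight is the sum of the orientations of the documents in which two persons co-occur, and that the person labels `p1` to `p3` and both function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_signed_network(documents):
    """Build signed edge weights from (persons, orientation) pairs.

    Assumption: orientation is +1 or -1 per document, and an edge weight
    is the sum of orientations over the documents where both persons appear.
    """
    weights = defaultdict(int)
    for persons, orientation in documents:
        for a, b in combinations(sorted(set(persons)), 2):
            weights[(a, b)] += orientation
    return dict(weights)

def frustrated_edges(weights, community_of):
    """Return edges whose sign disagrees with a given partition:
    negative edges inside a community or positive edges between communities."""
    bad = []
    for (a, b), w in weights.items():
        same = community_of[a] == community_of[b]
        if (same and w < 0) or (not same and w > 0):
            bad.append((a, b))
    return bad

# Hypothetical documents: p1 and p2 are mentioned together favorably,
# while p3 is mentioned in opposition to each of them.
docs = [
    (["p1", "p2"], +1),
    (["p1", "p2"], +1),
    (["p1", "p3"], -1),
    (["p2", "p3"], -1),
]
weights = build_signed_network(docs)
good_partition = {"p1": 0, "p2": 0, "p3": 1}
poor_partition = {"p1": 0, "p2": 1, "p3": 1}
no_conflicts = frustrated_edges(weights, good_partition)
conflicts = frustrated_edges(weights, poor_partition)
```

A partition that ignores signs and looks only at which nodes are connected cannot tell these two groupings apart, since all three persons are linked; counting sign disagreements distinguishes them.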

The remainder of this paper is organized as follows. In the next section, we review related work. Then, we describe SCIFNET in detail and demonstrate its effectiveness in the experiment section. The final section contains our conclusions.

Section snippets

Related work

Our research is related to community detection [41]. Given a network of interest, the community detection task involves identifying sub-networks, each of which represents a coherent community [12], [24], [36], [39]. For instance, given a social network, community detection methods identify groups of people with similar preferences [41]. The identified communities are useful for understanding various social phenomena, such as epidemic spreading [43] and human interactions [14], [15], [40], [42].

Methodology

We propose a stance community identification method, SCIFNET, which clusters the persons mentioned in topic documents into stance-coherent communities. Fig. 2 shows SCIFNET's system architecture, which comprises three components: friendship network construction, stance community expansion, and stance community refinement. Specifically, given a set of documents reporting a topic with K stance communities, SCIFNET first extracts the topic persons mentioned in the documents. Then, it

Experiment

In this section, we introduce the data corpus used in the experiments; demonstrate the effectiveness of each system component; and compare our method's performance with those of other well-known community detection methods and clustering algorithms. Then, we present a stance community identification result and discuss the stance-irrelevant persons detected by our method.

Concluding remarks

The Internet has become a crucial medium for disseminating and acquiring the latest information about topics. However, users are often overwhelmed by the enormous number of topic documents. Basically, times, places, and persons are the key elements of topics. Knowing the associations of topic persons can help readers construct the background knowledge of a topic and comprehend numerous topic documents quickly. In this paper, we defined the problem of stance community identification, which

Acknowledgement

This research was supported in part by MOST 103-2221-E-002-106-MY2 from the Ministry of Science and Technology, Republic of China.

Reference (56)

Y. Cui et al.
Uncovering overlapping community structures by the key bi-community and intimate degree in bipartite networks
Physica A
(2014)

Y. Cui et al.
Detecting overlapping communities in networks using the maximal sub-graph and the clustering coefficient
Physica A
(2014)

Y. Cui et al.
Detecting community structure via the maximal sub-graphs and belonging degrees in complex networks
Physica A
(2014)

J. Eustace et al.
Approximating web communities using sub-space decomposition
Knowl. Based Syst.
(2014)

J. Eustace et al.
Community detection using local neighborhood in complex networks
Physica A
(2015)

J. Eustace et al.
Overlapping community detection using neighborhood ratio matrix
Physica A
(2015)

X. Liu et al.
Co-authorship networks in the digital library research community
Inf. Process. Manag.
(2005)

J. Li et al.
Detecting overlapping communities by seed community in weighted complex networks
Physica A
(2013)

J. Li et al.
Uncovering the overlapping community structure of complex networks by maximal cliques
Physica A
(2014)

M. Oussalah et al.
An automated system for grammatical analysis of Twitter messages: A learning task application
Knowl. Based Syst.
(2016)

X. Wang et al.
Detecting communities by the core-vertex and intimate degree in complex networks
Physica A
(2013)

J. Allan et al.
Topic detection and tracking pilot study final report

P. Anchuri et al.
Communities and balance in signed networks: a spectral approach

I. Antonellis et al.
Simrank++: query rewriting through link analysis of the click graph

D. Barber
Bayesian Reasoning and Machine Learning
(2012)

C.C. Chen et al.
TSCAN: a novel method for topic summarization and content anatomy

C.C. Chen et al.
TSCAN: a content anatomy approach to temporal topic summarization
IEEE Trans. Knowl. Data Eng.
(2012)

C.C. Chen et al.
An unsupervised approach for person name bipolarization using principal component analysis
IEEE Trans. Knowl. Data Eng.
(2012)

C.C. Chen et al.
Bipolar person name identification of topic documents using principal component analysis

J. Chen et al.
Detecting communities in social networks using max-min modularity

J. Chen et al.
Local Community Identification in Social Networks

A. Chin et al.
A social hypertext model for finding community in blogs

A. Clauset et al.
Finding community structure in very large networks
Phys. Rev. E
(2004)

C.H.Q. Ding et al.
A min-max cut algorithm for graph partitioning and data clustering

Z. Ding et al.
Overlapping community detection based on network decomposition
Sci. Rep.
(2016)

W.E. Donath et al.
Lower bounds for the partitioning of graphs
IBM J. Res. Dev.
(1973)

A. Feng et al.
Finding and linking incidents in news

J. Gao et al.
On community outliers and their efficient detection in information networks

