Multilingual opinion holder identification using author and authority viewpoints

https://doi.org/10.1016/j.ipm.2008.11.004

Abstract

Opinion holder identification research is important for discriminating between opinions that are viewed from different perspectives. We propose a new opinion holder identification method that is based on differentiating between the author and authority viewpoints in opinionated sentences. In our method, author- and authority-opinionated sentences are extracted separately using different features, because their writing styles differ. Although previous researchers have not focused on it, this differentiation is important for correctly identifying opinion holders. We describe our participation in the NTCIR-6 Opinion Analysis Pilot Task, focusing on the opinion holder identification results in Japanese and English. The evaluation results showed that our system performed fairly well with respect to Japanese documents, and postsubmission analysis revealed that improvements could be made with respect to English documents as well.

Introduction

Opinion extraction research is a promising field for public opinion and reputation analysis using mass media or word-of-mouth sources on the web. The relevant research is divided into three subcategories. The first subcategory is opinion detection, which involves detection of opinionated documents and sentence/phrase extraction (Wiebe, Wilson, Bruce, Bell, & Martin, 2004). The second subcategory is polarity detection, which involves positive or negative document classification and positive, neutral, or negative sentence/phrase detection (Wilson, Wiebe, & Hoffmann, 2005). The third subcategory is opinion holder identification (Choi et al., 2005, Kim and Hovy, 2006a, Kim and Hovy, 2006b, Stoyanov and Cardie, 2006a, Stoyanov and Cardie, 2006b). Opinion holder identification research is important because news articles contain many opinions from different opinion holders. By grouping opinion holders, we can discriminate between the opinions that are viewed from different perspectives and bridge cultural gaps between groups, organizations, and countries.

Much research (Choi et al., 2005, Kim and Hovy, 2006a, Kim and Hovy, 2006b, Stoyanov and Cardie, 2006a, Stoyanov and Cardie, 2006b) on opinion holder identification has recently been conducted. Choi et al. (2005) proposed an opinion holder identification method that utilizes conditional random fields (CRFs) with features from part-of-speech information, such as nouns (for opinion holders), and from syntactic dependency information on the semantic classes (for opinionated phrases). For opinion holder annotation, Choi et al. (2005) used only explicit opinion holders in the MPQA Corpus (Wiebe et al., 2006) for their correct answer set. Consequently, even their simple baseline, which used the noun phrases directly related to an expression of opinion, attained high F-values, between 0.4 and 0.5. However, these researchers paid little attention to inexplicit opinion holder elements, such as the anaphoric elements within a document or the exophoric elements, such as the authors.

The first Opinion Analysis Pilot Task was conducted (Seki et al., 2007) at the sixth NTCIR (NII Test Collection for IR Systems) Workshop in 2006–2007. In the NTCIR-6 corpus, the opinion holder identification task was slightly more challenging because it was defined from the viewpoint of opinionated sentences; i.e., not only the explicit noun phrases in the sentences, but also the inexplicit noun phrases, such as the anaphoric elements within a document, or the exophoric elements, such as the authors, were used for the evaluation. This difficulty is evident from the results of the Cornell team, which included the authors of Choi et al. (2005): they attained an F-value of only 0.222 under the lenient standard, as shown in Table 6 in Section 3.3.

We propose a new method for the identification of opinion holders based on the differentiation between the author and authority viewpoints. Here, “author” means the writer of the document, and “authority” means third parties (not only authoritative institutions such as government agencies, but also any people or organizations other than the author). This differentiation is important for gathering opinions from texts, partly because some sentiment analysis applications, such as public opinion surveys, seek to find the correct number of “sentiments” expressed by the texts to estimate the number of people who have the same opinions. Other opinion mining applications, such as reputation analysis from blogs or review sites, also seek to find all the opinion units.

Previous research studies have tackled opinion holder identification using explicit clues and sequences related to opinion phrases or holders. They have focused very little on the discrimination between the author and authority viewpoints, because of a lack of genre-independent author clues. To differentiate between the author and authority viewpoints, we focused on the writing style differences between the author- and authority-opinionated sentences. Here, we define writing style as the difference in syntactic constructs or term usages. For example, several term pairs (“may be”, “can say”) or terms (“should”, “must”) tend to be used in author-opinionated sentences, but not in authority-opinionated sentences. This approach is applicable to a wide variety of genres and across languages. We implemented two types of opinionated sentence classifiers, for Japanese and English. This paper is organized as follows. In Section 2, we provide an overview of the NTCIR-6 Opinion Analysis Pilot Task. Section 3 describes our approach in NTCIR-6 to Japanese and English opinion extraction, the evaluation results, and the postsubmission analysis of the opinion types. In Sections 4 and 5, we give the revised approach and the comparison experiments. Finally, we present our conclusion in Section 6.
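The cue terms quoted above for author-opinionated sentences can be illustrated with a minimal sketch. This is not the classifier used in the paper: the function names are hypothetical, the term pairs are treated as adjacent word sequences (an assumption), and the "any cue present" decision rule is a simplification chosen only for illustration.

```python
import re

# Cue terms and term pairs quoted in the text as typical of
# author-opinionated sentences. Treating the pairs as adjacent
# word sequences is an assumption made for this sketch.
AUTHOR_CUE_TERMS = ("should", "must")
AUTHOR_CUE_PAIRS = ("may be", "can say")


def author_cue_features(sentence: str) -> dict:
    """Return binary features marking which author-style cues occur."""
    # Lowercase and keep only word tokens so punctuation does not block matches.
    text = " ".join(re.findall(r"[a-z']+", sentence.lower()))
    padded = f" {text} "
    features = {}
    for cue in AUTHOR_CUE_TERMS + AUTHOR_CUE_PAIRS:
        # Space padding enforces whole-word matching ("mustard" is not "must").
        features[cue] = f" {cue} " in padded
    return features


def looks_author_opinionated(sentence: str) -> bool:
    """Hypothetical decision rule: any cue present counts as author style."""
    return any(author_cue_features(sentence).values())
```

In practice such binary features would be passed to a trained classifier together with other evidence rather than used as a hard rule, since a reported statement can also contain words like "should".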

Task and annotation overview

The opinion extraction task was conducted in Japanese, English, and Chinese. For opinion extraction, the participants submitted two mandatory results (opinionated sentence extraction and opinion holder identification) and two optional results (relevant sentence judgment and polarity judgment). Six teams in English, three in Japanese, and five in Chinese (14 teams in total) submitted 21 runs. The sizes of the Japanese and English test collections, which are used in this paper, are shown in Table 1.

Our approach: opinion holder extraction based on author’s opinions and authority opinion extraction

To identify the author and authority opinion holders, we constructed the three-step system shown in Fig. 1. In this section, we describe the author- and authority-opinionated sentence classification, the holder identification rules, and the evaluation results in the NTCIR-6 Opinion Analysis Pilot Task. We also describe the postsubmission analysis of the opinion holder identification, which we conducted to investigate the effectiveness of our approach.

Revised approach based on postsubmission discussion

To investigate the effectiveness of our approach, we revised it, and conducted comparison experiments with other systems.

Comparison experiments

To investigate the improvements based on the revision in Section 4, we evaluated our revised approach for opinion holder extraction. For comparison, we implemented a baseline system based on opinionated sentences that did not differentiate between the author and authority opinion and the holder extraction algorithm from authority opinion, which was explained in Section 4.1. The features for the opinionated sentence extraction were selected based on χ-square tests on the opinionated sentences in
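Feature selection with a chi-square test, as mentioned above, can be sketched as follows. This is a generic illustration under stated assumptions and not the authors' implementation: the function names, the 2x2 table layout, and the default threshold of 3.841 (the 5% critical value for one degree of freedom) are choices made here.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for a 2x2 contingency table.

    a: opinionated sentences containing the feature
    b: non-opinionated sentences containing the feature
    c: opinionated sentences without the feature
    d: non-opinionated sentences without the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A zero marginal means the feature carries no usable signal.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_features(counts: dict, threshold: float = 3.841) -> list:
    """Keep features whose statistic exceeds the threshold, highest first."""
    scored = {f: chi_square_2x2(*cells) for f, cells in counts.items()}
    kept = [f for f, s in scored.items() if s > threshold]
    return sorted(kept, key=lambda f: scored[f], reverse=True)
```

A feature distributed evenly across both classes scores 0 and is dropped, while a feature concentrated in one class scores high and is kept. The counts passed to `select_features` are hypothetical inputs that would come from an annotated training set.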

Conclusions

We have proposed an opinion holder identification system using the author and authority viewpoints in both Japanese and English. We participated in the NTCIR-6 Opinion Analysis Pilot Task and evaluated the effectiveness of our system. The results show that our system performed fairly well with respect to Japanese documents. We also found that our system performed less well with respect to the English documents in NTCIR-6, but that it could be improved by revising opinion holder extraction rules

Acknowledgements

This work was partially supported by Grants-in-Aid for Young Scientists (B) (#18700241) from the Ministry of Education, Culture, Sports, Science, and Technology, Japan. We appreciate the valuable efforts of all the participants involved in the NTCIR-6 Opinion Analysis Task.

References (23)

  • Bautin, M., Vijayarenu, L., & Skiena, S. (2008). International sentiment analysis for news and blogs. In Proceedings of...
  • Bloom, K., Garg, N., & Argamon, S. (2006). Extracting appraisal expressions. In Proceedings of the human language...
  • Bloom, K., Stein, S., & Argamon, S. (2007). Appraisal extraction for news opinion analysis at ntcir-6. In Kando and...
  • Breck, E., Choi, Y., Stoyanov, V., & Cardie, C. (2007). Cornell system description for the ntcir-6 opinion task. In...
  • Choi, Y., Cardie, C., Riloff, E., & Patwardhan, S. (2005). Identifying sources of opinions with conditional random...
  • Hatzivassiloglou, V., & Wiebe, J. M. (2000). Lists of manually and automatically identified gradable, polar, and...
  • Kando, N., & Evans, D. K. (Eds.) (2007). In Proceedings of the sixth NTCIR workshop meeting on evaluation of...
  • Kim, S.-M., & Hovy, E. (2006a). Extracting opinions, opinion holders, and topics expressed in online news media text....
  • Kim, S.-M., & Hovy, E. (2006b). Identifying and analyzing judgment opinions. In Proceedings of the human language...
  • Kim, Y., & Myaeng, S.-H. (2007). Opinion analysis based on lexical clues and their expansion. In Kando and Evans, 2007...
  • Kudo, T., & Matsumoto, Y. (2002). Japanese dependency analysis using cascaded chunking. In Proceedings of the sixth...