Abstract
Comparative news summarization aims to highlight the commonalities and differences between two comparable news topics in human-readable sentences. The summary should focus on the salient comparative aspects of both topics while also describing the representative properties of each topic. In this study, we propose a novel approach for generating comparative news summaries. We treat cross-topic pairs of semantically related concepts as evidence of comparativeness and topic-related concepts as evidence of representativeness. The score of a summary is estimated by summing the weights of the evidence it contains. We formalize the summarization task as an optimization problem of selecting sentences that maximize this score, and we solve the problem with a mixed integer programming model. The experimental results demonstrate the effectiveness of our proposed model.
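The scoring idea in the abstract can be illustrated with a small sketch. The code below is not the paper's mixed integer programming formulation: it enumerates sentence subsets by brute force on toy data, and the sentences, weights, length budget, and function names (`summary_score`, `best_summary`) are all hypothetical. It assumes that each concept or concept pair counts once, however many selected sentences contain it.

```python
from itertools import combinations

# Hypothetical toy data: each sentence belongs to topic A or B,
# has a length in words, and carries a set of concepts.
SENTENCES = [
    {"topic": "A", "length": 8, "concepts": {"earthquake", "casualties"}},
    {"topic": "A", "length": 6, "concepts": {"rescue"}},
    {"topic": "B", "length": 7, "concepts": {"flood", "victims"}},
    {"topic": "B", "length": 5, "concepts": {"aid"}},
]

# Assumed weights of topic-related concepts (evidence of representativeness).
CONCEPT_WEIGHT = {
    "earthquake": 3.0, "casualties": 2.0, "rescue": 1.5,
    "flood": 3.0, "victims": 2.0, "aid": 1.0,
}

# Assumed weights of cross-topic concept pairs (evidence of comparativeness),
# keyed as (concept from topic A, concept from topic B).
PAIR_WEIGHT = {("casualties", "victims"): 4.0, ("rescue", "aid"): 2.5}


def summary_score(selected):
    """Sum the weights of all evidence covered by the selected sentences."""
    concepts = {"A": set(), "B": set()}
    for i in selected:
        concepts[SENTENCES[i]["topic"]] |= SENTENCES[i]["concepts"]
    representativeness = sum(
        CONCEPT_WEIGHT[c] for c in concepts["A"] | concepts["B"]
    )
    # A pair counts only if both of its concepts appear, one per topic.
    comparativeness = sum(
        w for (a, b), w in PAIR_WEIGHT.items()
        if a in concepts["A"] and b in concepts["B"]
    )
    return representativeness + comparativeness


def best_summary(max_length):
    """Exhaustively search for the highest-scoring subset within the budget."""
    best = (0.0, ())
    for r in range(len(SENTENCES) + 1):
        for subset in combinations(range(len(SENTENCES)), r):
            if sum(SENTENCES[i]["length"] for i in subset) > max_length:
                continue
            score = summary_score(subset)
            if score > best[0]:
                best = (score, subset)
    return best


score, chosen = best_summary(max_length=15)
print(score, chosen)
```

Exhaustive search is exponential in the number of sentences, so it is only workable for toy inputs like this one; an integer programming solver is the practical choice at realistic scale.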
Notes
The modification only alters the word filter to allow Chinese words.
Acknowledgments
This work is supported by NSFC (61170166), Beijing Nova Program (2008B03), and National High Technology Research and Development Program of China (2012AA011101). We thank IBM for making the CPLEX optimizer freely accessible for academic use.
Cite this article
Huang, X., Wan, X. & Xiao, J. Comparative news summarization using concept-based optimization. Knowl Inf Syst 38, 691–716 (2014). https://doi.org/10.1007/s10115-012-0604-8
DOI: https://doi.org/10.1007/s10115-012-0604-8