Comparative news summarization using concept-based optimization

  • Regular Paper
  • Knowledge and Information Systems

Abstract

Comparative news summarization aims to highlight the commonalities and differences between two comparable news topics using human-readable sentences. The summary should focus on the salient comparative aspects of both topics while also describing the representative properties of each topic. In this study, we propose a novel approach for generating comparative news summaries. We treat cross-topic pairs of semantically related concepts as evidence of comparativeness and topic-related concepts as evidence of representativeness. The score of a summary is estimated by summing the weights of the evidence it contains. We formalize the summarization task as an optimization problem of selecting sentences to maximize this score, and we solve it with a mixed integer programming model. The experimental results demonstrate the effectiveness of the proposed model.
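
The scoring idea in the abstract can be illustrated with a small sketch. It is not the authors' model: the preview does not show the actual formulation, so the length budget, the concept names, and all weights below are hypothetical, and an exhaustive search over sentence subsets stands in for the mixed integer programming solver. Each concept and each cross-topic concept pair is counted once, however many selected sentences mention it.

```python
from itertools import combinations

# Hypothetical toy data: (topic, concepts in the sentence, sentence length).
SENTENCES = [
    ("A", {"quake", "deaths"}, 10),
    ("A", {"quake", "aid"}, 10),
    ("B", {"flood", "victims"}, 10),
    ("B", {"flood", "relief"}, 10),
]
# Assumed weights of topic-related concepts (evidence of representativeness).
CONCEPT_WEIGHTS = {
    "A": {"quake": 1.0, "deaths": 0.5, "aid": 0.5},
    "B": {"flood": 1.0, "victims": 0.5, "relief": 0.5},
}
# Assumed weights of cross-topic concept pairs (evidence of comparativeness).
PAIR_WEIGHTS = {("deaths", "victims"): 2.0, ("aid", "relief"): 1.0}


def summary_score(chosen, concept_weights, pair_weights):
    """Sum the weights of all evidence covered by the chosen sentences."""
    covered = {"A": set(), "B": set()}
    for topic, concepts, _length in chosen:
        covered[topic] |= concepts
    # A concept contributes once per topic, no matter how often it appears.
    representativeness = sum(
        concept_weights[topic].get(concept, 0.0)
        for topic, concepts in covered.items()
        for concept in concepts
    )
    # A pair contributes only when both of its concepts are covered.
    comparativeness = sum(
        weight
        for (concept_a, concept_b), weight in pair_weights.items()
        if concept_a in covered["A"] and concept_b in covered["B"]
    )
    return representativeness + comparativeness


def best_summary(sentences, concept_weights, pair_weights, max_length):
    """Brute-force stand-in for the MIP solver: try every feasible subset."""
    best_indices, best_score = (), 0.0
    for size in range(1, len(sentences) + 1):
        for indices in combinations(range(len(sentences)), size):
            chosen = [sentences[i] for i in indices]
            if sum(length for _t, _c, length in chosen) > max_length:
                continue  # violates the assumed summary length budget
            score = summary_score(chosen, concept_weights, pair_weights)
            if score > best_score:
                best_indices, best_score = indices, score
    return best_indices, best_score


print(best_summary(SENTENCES, CONCEPT_WEIGHTS, PAIR_WEIGHTS, max_length=20))
```

With this toy data and a budget of 20, the search selects sentences 0 and 2 with a score of 5.0: together they cover four concepts (3.0) and the highest-weighted cross-topic pair (2.0). Exhaustive search is only practical for a handful of sentences, which is presumably why a solver-backed integer program is used at realistic scale.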

Notes

  1. http://www.amazon.com.

  2. http://www.newegg.com.

  3. http://news.google.com.

  4. http://www.bing.com/news.

  5. http://news.yahoo.com.

  6. http://nlp.stanford.edu/software/corenlp.shtml.

  7. http://gate.ac.uk/.

  8. http://alias-i.com/lingpipe/.

  9. http://www-01.ibm.com/software/integration/optimization/cplex-optimizer/.

  10. http://www.gurobi.com/.

  11. http://www.gnu.org/software/glpk/.

  12. http://news.google.com.

  13. http://www.berouge.com/.

  14. The modification only alters the word filter to allow Chinese words.

Acknowledgments

This work is supported by NSFC (61170166), Beijing Nova Program (2008B03), and National High Technology Research and Development Program of China (2012AA011101). We thank IBM for making the CPLEX optimizer freely accessible for academic use.

Author information

Corresponding author

Correspondence to Xiaojun Wan.

About this article

Cite this article

Huang, X., Wan, X. & Xiao, J. Comparative news summarization using concept-based optimization. Knowl Inf Syst 38, 691–716 (2014). https://doi.org/10.1007/s10115-012-0604-8
