Optimizing word set coverage for multi-event summarization

Journal of Combinatorial Optimization

Abstract

The Internet has proliferated over the past few decades, and a large amount of textual information is now generated on the Web. It is impossible for individuals to locate and digest all the latest updates available there. Text summarization provides an efficient way to generate short, concise abstracts from massive document collections. These collections involve many events, which are hard for a summarization procedure to identify directly. We propose a novel methodology that identifies events in these text corpora and creates a summary for each event. We employ a probabilistic topic model to learn the latent topics of the documents and then discover events in terms of the documents' topic distributions. To target the summarization, we define the word set coverage problem (WSCP), which captures the most representative sentences for summarizing an event. To solve the WSCP, we propose an approximate algorithm for the optimization problem. We conduct a set of experiments to evaluate the proposed approach on two real datasets: Sina news and Johnson & Johnson medical news. On both datasets, the proposed method outperforms competitive baselines as measured by the harmonic mean of coverage and conciseness.
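The abstract does not spell out the WSCP formulation or the approximate algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each sentence is already tokenized into a set of words, that an event is represented by a target word set, and that a standard greedy set-cover heuristic is an acceptable stand-in for the approximate algorithm. The function names (`greedy_word_cover`, `score`) and the definitions of coverage (fraction of target words covered) and conciseness (one minus the fraction of sentences selected) are assumptions made for this example.

```python
from typing import List, Set


def greedy_word_cover(sentences: List[Set[str]], target: Set[str],
                      max_sentences: int) -> List[int]:
    """Greedy set-cover heuristic (an assumed stand-in, not the paper's algorithm).

    Repeatedly picks the sentence that covers the most still-uncovered
    target words, until the target is covered or the budget is spent.
    """
    uncovered = set(target)
    chosen: List[int] = []
    while uncovered and len(chosen) < max_sentences:
        best = max(
            (i for i in range(len(sentences)) if i not in chosen),
            key=lambda i: len(sentences[i] & uncovered),
            default=None,
        )
        if best is None or not (sentences[best] & uncovered):
            break  # no remaining sentence adds new target words
        chosen.append(best)
        uncovered -= sentences[best]
    return chosen


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative scores (0 if both are 0)."""
    return 0.0 if a + b == 0 else 2 * a * b / (a + b)


def score(sentences: List[Set[str]], target: Set[str],
          chosen: List[int]) -> float:
    """Harmonic mean of coverage and conciseness (both definitions assumed)."""
    covered: Set[str] = set()
    for i in chosen:
        covered |= sentences[i] & target
    coverage = len(covered) / len(target) if target else 0.0
    conciseness = 1.0 - len(chosen) / len(sentences) if sentences else 0.0
    return harmonic_mean(coverage, conciseness)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    sentences = [
        {"quake", "city"},
        {"quake", "rescue", "aid"},
        {"city", "aid"},
        {"weather"},
    ]
    target = {"quake", "city", "rescue", "aid"}
    picked = greedy_word_cover(sentences, target, max_sentences=3)
    print(picked, score(sentences, target, picked))
```

The harmonic mean rewards summaries that balance the two criteria: selecting every sentence maximizes coverage but drives conciseness, and therefore the score, to zero.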

Notes

  1. http://www.sina.com.cn/.

Acknowledgments

This work is partially supported by the National Basic Research Program (973) of China (No. 2012CB316203) and NSFC under Grant Nos. 61402177, 61170838 and 61272036. The authors would also like to thank Key Disciplines of Software Engineering of Shanghai Second Polytechnic University under Grant No. XXKZD1301 and Project of Shanghai Shen-kang Hospital Development Centre (No. 2014SKMR-04).

Author information

Corresponding author

Correspondence to Ming Gao.

Cite this article

Yan, J., Cheng, W., Wang, C. et al. Optimizing word set coverage for multi-event summarization. J Comb Optim 30, 996–1015 (2015). https://doi.org/10.1007/s10878-015-9855-0

  • DOI: https://doi.org/10.1007/s10878-015-9855-0
