Multi-document Text Summarization Using Topic Model and Fuzzy Logic

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2013)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7988)

Abstract

Automating document summarization plays a major role in many applications. Automatic text summarization focuses on retaining the essential information without affecting document quality. This paper proposes a new multi-document summarization method that combines a topic model with a fuzzy logic model. The proposed method extracts relevant topic words from the source documents and uses them as elements of fuzzy sets. Each sentence in the source documents is then used to generate a fuzzy relevance rule that measures the importance of that sentence, and a fuzzy inference system generates the final summary. The summaries are evaluated against several well-known summarization systems and perform well on divergence and similarity measures.
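The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the topic model settings, the membership functions, or the inference rules, so everything below is an assumption made for illustration. Relative word frequency stands in for the topic model, the membership degree of a topic word is its count divided by the largest count, and a sentence's relevance is the fuzzy OR (algebraic sum) of the memberships of the topic words it contains. The names `topic_word_memberships`, `sentence_relevance`, and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def tokenize(text):
    """Return lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def topic_word_memberships(documents, top_k=5):
    """Stand-in for a topic model: keep the top_k most frequent words and
    map each to a membership degree in (0, 1], relative to the top count."""
    counts = Counter(w for doc in documents for w in tokenize(doc))
    top = counts.most_common(top_k)
    if not top:
        return {}
    max_count = top[0][1]
    return {word: count / max_count for word, count in top}


def sentence_relevance(sentence, memberships):
    """Hypothetical relevance rule: fuzzy OR (algebraic sum) over the
    memberships of the distinct topic words present in the sentence."""
    miss = 1.0
    for word in set(tokenize(sentence)):
        miss *= 1.0 - memberships.get(word, 0.0)
    return 1.0 - miss


def summarize(sentences, memberships, n=2):
    """Pick the n highest-scoring sentences and return them in source order."""
    scores = [sentence_relevance(s, memberships) for s in sentences]
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```

A typical call would build `memberships` from all source documents, split those documents into sentences, and pass both to `summarize`. Replacing the frequency stand-in with a real topic model, or the algebraic sum with a rule-based inference system, would change only the first two functions.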



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, S., Belkasim, S., Zhang, Y. (2013). Multi-document Text Summarization Using Topic Model and Fuzzy Logic. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2013. Lecture Notes in Computer Science, vol 7988. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39712-7_12

  • DOI: https://doi.org/10.1007/978-3-642-39712-7_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-39711-0

  • Online ISBN: 978-3-642-39712-7

  • eBook Packages: Computer Science, Computer Science (R0)
