
Multiple Document Summarization Using Text-Based Keyword Extraction

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 436)

Abstract

This paper compares the proposed methodology, keyword-based text extraction that uses threading and synchronization to process multiple input files as a batch, with previously used techniques for extracting text from research papers. A keyword-based summary is formed by selecting important sentences from the original text. Text summarization produces a condensed form of a document of any type, whether PDF, DOC, or TXT, and this condensed form should preserve the complete information and remain meaningful for both single-file and multiple-file input. It is not easy for a person to maintain summaries of a large number of documents. The paper explains various text summarization and text extraction techniques. The proposed technique creates a summary by extracting sentences from the original document using the font type, the PDF font, or a keyword extractor.
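As a rough illustration of the general idea described in the abstract, the following Python sketch picks the most frequent non-stopword terms as keywords, scores each sentence by how many keyword occurrences it contains, and processes several documents as a batch on a thread pool. It is not the authors' implementation: the frequency-based keyword choice, the tiny stopword list, the naive sentence splitter, and the names `summarize` and `summarize_batch` are all assumptions made for this example, and the font-based extraction mentioned in the abstract is not covered.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stopword list, kept tiny for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for", "on",
             "it", "this", "that", "with", "by", "from", "are", "as"}


def split_sentences(text):
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    # Lowercase alphabetic tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text, num_keywords=5, num_sentences=2):
    # Assumption: keywords are simply the most frequent non-stopword terms.
    keywords = {w for w, _ in Counter(tokenize(text)).most_common(num_keywords)}
    sentences = split_sentences(text)
    # Score each sentence by the number of keyword occurrences it contains.
    scored = [(sum(1 for w in tokenize(s) if w in keywords), i)
              for i, s in enumerate(sentences)]
    # Take the best-scoring sentences; ties go to the earlier sentence.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:num_sentences]
    # Emit the selected sentences in their original document order.
    return " ".join(sentences[i] for _, i in sorted(top, key=lambda p: p[1]))


def summarize_batch(documents, max_workers=4, **kwargs):
    # documents maps a document name to its already-extracted plain text.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(summarize, text, **kwargs)
                   for name, text in documents.items()}
        # result() blocks until each worker finishes, which synchronizes the batch.
        return {name: future.result() for name, future in futures.items()}
```

In CPython, threads mainly help when per-document work is I/O-bound, such as reading files from disk; the scoring itself is CPU-bound, so a process pool may be the better fit for large batches.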



Author information

Corresponding author

Correspondence to Deepak Motwani.

Copyright information

© 2016 Springer Science+Business Media Singapore

About this paper

Cite this paper

Motwani, D., Saxena, A.S. (2016). Multiple Document Summarization Using Text-Based Keyword Extraction. In: Pant, M., Deep, K., Bansal, J., Nagar, A., Das, K. (eds) Proceedings of Fifth International Conference on Soft Computing for Problem Solving. Advances in Intelligent Systems and Computing, vol 436. Springer, Singapore. https://doi.org/10.1007/978-981-10-0448-3_15

  • DOI: https://doi.org/10.1007/978-981-10-0448-3_15

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-0447-6

  • Online ISBN: 978-981-10-0448-3

  • eBook Packages: Engineering, Engineering (R0)
