
Graph Summarization Using Word Correlation Analysis on Large Set of Documents

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)

Abstract

Because so many documents are available on the Internet, it is impossible to extract their important information manually. In this paper, we propose a system that automatically extracts important information from a large volume of documents using word correlation analysis. Our system analyzes word occurrence and co-occurrence frequencies at three levels: sentence, paragraph, and document. It then performs three analysis steps (occurrence frequency, adjacent correlation, and importance score analysis) to calculate the importance score of each word. Finally, it extracts keywords and stores them in a graph structure. Using a graph structure has two benefits: it lets us manage the keywords and their connections effectively, and it assists the retrieval of relevant documents. Our preliminary experiment shows that the technique is effective for analyzing large sets of documents.
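The abstract describes the pipeline only at a high level. The following Python sketch shows one way the described steps could fit together: counting word occurrences and co-occurrences at sentence, paragraph, and document level, combining them into pairwise correlation weights, scoring each word, and linking the top-scoring keywords in a graph that is then used to look up related documents. It is not the authors' implementation. The stop-word list, the per-level weights in `LEVEL_WEIGHTS`, the importance formula (frequency plus total correlation weight), the weighted combination standing in for the adjacent-correlation step, and all function names are hypothetical assumptions.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical weights: sharing a sentence is treated as stronger evidence
# of correlation than sharing only a paragraph or a document.
LEVEL_WEIGHTS = {"sentence": 3.0, "paragraph": 2.0, "document": 1.0}
# Assumed minimal stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on", "at"}


def words_in(text):
    """Lowercase alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def count_levels(docs):
    """Count occurrences and co-occurring word pairs per level.

    docs maps a document id to a list of paragraph strings.
    """
    occurrence = Counter()
    cooccurrence = {level: Counter() for level in LEVEL_WEIGHTS}
    postings = {}  # word -> set of ids of documents containing it
    for doc_id, paragraphs in docs.items():
        doc_vocab = set()
        for paragraph in paragraphs:
            para_vocab = set()
            for sentence in re.split(r"[.!?]+", paragraph):
                tokens = words_in(sentence)
                occurrence.update(tokens)
                sent_vocab = set(tokens)
                cooccurrence["sentence"].update(combinations(sorted(sent_vocab), 2))
                para_vocab |= sent_vocab
            cooccurrence["paragraph"].update(combinations(sorted(para_vocab), 2))
            doc_vocab |= para_vocab
        cooccurrence["document"].update(combinations(sorted(doc_vocab), 2))
        for word in doc_vocab:
            postings.setdefault(word, set()).add(doc_id)
    return occurrence, cooccurrence, postings


def correlations(cooccurrence):
    """Combine per-level pair counts into one weighted score per word pair."""
    corr = Counter()
    for level, counts in cooccurrence.items():
        for pair, n in counts.items():
            corr[pair] += LEVEL_WEIGHTS[level] * n
    return corr


def build_keyword_graph(docs, top_k=5):
    """Score words, keep the top_k as keywords, and link correlated keywords."""
    occurrence, cooccurrence, postings = count_levels(docs)
    corr = correlations(cooccurrence)
    # Hypothetical importance score: frequency plus total correlation weight.
    importance = Counter(occurrence)
    for (a, b), weight in corr.items():
        importance[a] += weight
        importance[b] += weight
    keywords = {w for w, _ in importance.most_common(top_k)}
    graph = {w: {} for w in keywords}  # keyword -> {neighbour: edge weight}
    for (a, b), weight in corr.items():
        if a in keywords and b in keywords:
            graph[a][b] = weight
            graph[b][a] = weight
    return graph, postings


def related_documents(keyword, graph, postings):
    """Return ids of documents containing the keyword or its graph neighbours."""
    terms = {keyword} | set(graph.get(keyword, {}))
    return sorted(set().union(*(postings.get(t, set()) for t in terms)))


# Usage on a tiny, made-up corpus.
docs = {
    "d1": ["Steve Jobs founded Apple. Apple designed the iPhone."],
    "d2": ["Jobs returned to Apple in 1997.", "The iPhone changed phones."],
    "d3": ["Pixar made Toy Story."],
}
graph, postings = build_keyword_graph(docs, top_k=3)
print(sorted(graph))                                   # ['apple', 'iphone', 'jobs']
print(related_documents("iphone", graph, postings))    # ['d1', 'd2']
```

Because each pair is counted again at every enclosing level, two words that share a sentence also contribute at paragraph and document level, so closer co-occurrence is rewarded cumulatively. The graph keeps only keyword-to-keyword edges, which keeps it small while still letting a query expand to correlated keywords during retrieval.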

Acknowledgement

This research was supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-(H0301-13-1012)) supervised by the NIPA (National IT Industry Promotion Agency).

Author information

Corresponding author

Correspondence to Putu Y. Kusmawan.

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kusmawan, P.Y., Kwon, J. (2014). Graph Summarization Using Word Correlation Analysis on Large Set of Documents. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_5

  • DOI: https://doi.org/10.1007/978-3-662-43984-5_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43983-8

  • Online ISBN: 978-3-662-43984-5

  • eBook Packages: Computer Science, Computer Science (R0)
