Graph Summarization Using Word Correlation Analysis on Large Set of Documents

Kusmawan, Putu Y.; Kwon, Joonho

doi:10.1007/978-3-662-43984-5_5

Conference paper
First Online: 01 January 2014

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8505)

Abstract

Because so many documents are available on the Internet, it is impossible to extract their important information manually. In this paper, we propose a system that automatically extracts important information from a huge volume of documents using word correlation analysis. Our system analyzes the occurrence and co-occurrence frequencies of words at several levels: sentence, paragraph, and document. It then performs three analysis steps (occurrence frequency, adjacent correlation, and importance score analysis) to calculate the importance score of each word. Finally, it extracts keywords and stores them in a graph structure. The benefits of using a graph structure are twofold: we can effectively manage the keywords and their connections, and the graph assists with the retrieval of relevant documents. Our preliminary experiment shows that our technique can analyze a large set of documents effectively.
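
The pipeline described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything specific here is an assumption made for illustration: the function names, the per-level weights (sentence 3, paragraph 2, document 1), and the additive importance score (occurrence count plus weighted co-occurrence) are hypothetical and are not the authors' method.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed weights: words sharing a sentence are treated as more strongly
# correlated than words that merely share a paragraph or a document.
LEVEL_WEIGHTS = {"sentence": 3.0, "paragraph": 2.0, "document": 1.0}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def split_levels(document):
    """Return (level, tokens) units for the document, its paragraphs, and its sentences."""
    units = [("document", tokenize(document))]
    for paragraph in re.split(r"\n\s*\n", document):
        if not paragraph.strip():
            continue
        units.append(("paragraph", tokenize(paragraph)))
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if sentence.strip():
                units.append(("sentence", tokenize(sentence)))
    return units


def analyze(documents, stopwords=frozenset()):
    """Count word occurrences and weighted co-occurrences over all documents."""
    occurrence = Counter()
    cooccurrence = Counter()  # keyed by alphabetically sorted word pair
    for document in documents:
        for level, tokens in split_levels(document):
            words = [t for t in tokens if t not in stopwords]
            if level == "document":
                occurrence.update(words)  # count each token exactly once
            for pair in combinations(sorted(set(words)), 2):
                cooccurrence[pair] += LEVEL_WEIGHTS[level]
    return occurrence, cooccurrence


def importance_scores(occurrence, cooccurrence):
    """Assumed score: occurrence count plus the sum of weighted co-occurrences."""
    scores = {word: float(count) for word, count in occurrence.items()}
    for (a, b), weight in cooccurrence.items():
        scores[a] += weight
        scores[b] += weight
    return scores


def build_keyword_graph(documents, top_k=5, stopwords=frozenset()):
    """Pick the top_k words by score and link them by co-occurrence weight."""
    occurrence, cooccurrence = analyze(documents, stopwords)
    scores = importance_scores(occurrence, cooccurrence)
    keywords = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
    keep = set(keywords)
    graph = {word: {} for word in keywords}  # adjacency map: word -> {neighbor: weight}
    for (a, b), weight in cooccurrence.items():
        if a in keep and b in keep:
            graph[a][b] = weight
            graph[b][a] = weight
    return keywords, graph
```

Calling `build_keyword_graph(documents, top_k=10)` on a list of plain-text strings returns the selected keywords and an undirected weighted adjacency map. Storing keywords this way keeps their connections explicit, so documents related to a keyword can be reached through its neighbors.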

Acknowledgement

This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-(H0301-13-1012)) supervised by the NIPA (National IT Industry Promotion Agency).

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Pusan National University, Busan, South Korea
Putu Y. Kusmawan
Institute of Logistic Information and Technology, Pusan National University, Busan, South Korea
Joonho Kwon

Corresponding author

Correspondence to Putu Y. Kusmawan.

Editor information

Editors and Affiliations

Pohang University of Science and Technology (POSTECH), Pohang, South Korea
Wook-Shin Han
National University of Singapore, Singapore, Singapore
Mong Li Lee
Udayana University, Badung, Indonesia
Agus Muliantara
Udayana University, Badung, Indonesia
Ngurah Agus Sanjaya
Christian-Albrechts-Universität zu Kiel Institut für Informatik, Kiel, Germany
Bernhard Thalheim
Fudan University, Shanghai, China
Shuigeng Zhou

About this paper

Cite this paper

Kusmawan, P.Y., Kwon, J. (2014). Graph Summarization Using Word Correlation Analysis on Large Set of Documents. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_5

DOI: https://doi.org/10.1007/978-3-662-43984-5_5
Published: 11 July 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science, Computer Science (R0)