Abstract
Because so many documents are available on the Internet, it is infeasible to extract their important information manually. In this paper, we propose a system that automatically extracts important information from a huge volume of documents using word correlation analysis. Our system analyzes the occurrence and co-occurrence frequencies of words at several levels: sentence, paragraph, and document. It then performs three analysis steps (occurrence frequency, adjacent correlation, and importance score analysis) to calculate the importance score of each word. Finally, it extracts keywords and stores them in a graph structure. The graph structure has two benefits: it lets us manage the keywords and their connections effectively, and it assists with the retrieval of relevant documents. Our preliminary experiment shows that the technique works well for analyzing a large set of documents.
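The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual scoring formulas, so everything concrete here is an assumption made for illustration: the function name `build_keyword_graph`, the tiny stopword list, the regex tokenizer, the use of sentence-level co-occurrence only, and the importance score (occurrence count plus the number of sentence-level co-occurrences). It is not the authors' method, only one plausible instance of "count, score, keep the top keywords, store them as a graph".

```python
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to"}


def split_sentences(text):
    """Naive sentence splitter on ., ! and ? (an assumption, not the paper's)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def build_keyword_graph(documents, top_k=3):
    occurrence = Counter()      # word -> total occurrences
    cooccurrence = Counter()    # (word_a, word_b) -> sentences containing both
    doc_index = defaultdict(set)  # word -> ids of documents containing it

    for doc_id, doc in enumerate(documents):
        for sentence in split_sentences(doc):
            words = tokenize(sentence)
            occurrence.update(words)
            for w in words:
                doc_index[w].add(doc_id)
            # Count each unordered pair once per sentence.
            for a, b in combinations(sorted(set(words)), 2):
                cooccurrence[(a, b)] += 1

    # Assumed importance score: occurrences plus co-occurrence counts.
    score = Counter(occurrence)
    for (a, b), n in cooccurrence.items():
        score[a] += n
        score[b] += n

    # Deterministic top-k: highest score first, ties broken alphabetically.
    keywords = sorted(score, key=lambda w: (-score[w], w))[:top_k]
    keep = set(keywords)

    # Graph as adjacency dict: keyword -> {neighbouring keyword: edge weight}.
    graph = {k: {} for k in keywords}
    for (a, b), n in cooccurrence.items():
        if a in keep and b in keep:
            graph[a][b] = n
            graph[b][a] = n

    return graph, {k: doc_index[k] for k in keywords}


docs = [
    "Apple released the iPhone. Jobs founded Apple.",
    "Jobs presented the iPhone.",
]
graph, keyword_docs = build_keyword_graph(docs, top_k=3)
print(graph)
print(keyword_docs)
```

Keeping a keyword-to-document index next to the graph is one simple way to support the retrieval use mentioned in the abstract: looking up a keyword, or one of its graph neighbours, yields the documents that contain it.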
Acknowledgement
This research was supported by the MKE (The Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program (NIPA-2013-(H0301-13-1012)) supervised by the NIPA (National IT Industry Promotion Agency).
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kusmawan, P.Y., Kwon, J. (2014). Graph Summarization Using Word Correlation Analysis on Large Set of Documents. In: Han, WS., Lee, M., Muliantara, A., Sanjaya, N., Thalheim, B., Zhou, S. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8505. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43984-5_5
DOI: https://doi.org/10.1007/978-3-662-43984-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-43983-8
Online ISBN: 978-3-662-43984-5
eBook Packages: Computer Science, Computer Science (R0)