DOI: 10.1145/2034691.2034731
research-article

Automatic text summarization and small-world networks

Published: 19 September 2011

Abstract

Automatic text summarization is an important and challenging problem. Over the years, the amount of text available electronically has grown exponentially. This growth has created a huge demand for automatic methods and tools for text summarization. We can think of automatic summarization as a type of information compression. To achieve such compression, better modelling and understanding of document structures and internal relations are required. In this article, we develop a novel approach to extractive text summarization by modelling texts and documents as small-world networks. Based on our recent work on the detection of unusual behavior in text, we model a document as a one-parameter family of graphs, with its sentences or paragraphs defining the vertex set and with edges defined by Helmholtz's principle. We demonstrate that for some range of the parameter, the resulting graph becomes a small-world network. Such a remarkable structure opens the possibility of applying many measures and tools from social network theory to the problem of extracting the most important sentences and structures from text documents. We hope that documents will also be a new and rich source of examples of complex networks.
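
The construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact edge rule, so the code assumes that a word occurring m times in a sentence and K times in a document of N sentences has a "number of false alarms" NFA = C(K, m) / N^(m-1), that the word is meaningful in that sentence when NFA < eps, and that two sentences are joined when they share a word meaningful in both. All function names and the toy data are hypothetical. The two statistics usually inspected for small-world structure, mean clustering coefficient and mean shortest-path length, are then computed for one value of the parameter eps.

```python
from collections import Counter, deque
from itertools import combinations
from math import comb

def meaningful_words(sentences, eps):
    """Per sentence, the set of words with NFA < eps (assumed NFA form)."""
    n = len(sentences)
    total = Counter(w for s in sentences for w in s)
    result = []
    for s in sentences:
        keep = set()
        for w, m in Counter(s).items():
            # Assumption: NFA = C(K, m) / N^(m-1), K = document-wide count.
            nfa = comb(total[w], m) / n ** (m - 1)
            if nfa < eps:
                keep.add(w)
        result.append(keep)
    return result

def build_graph(sentences, eps):
    """Adjacency sets: sentences i and j are linked if they share a meaningful word."""
    keys = meaningful_words(sentences, eps)
    adj = {i: set() for i in range(len(sentences))}
    for i, j in combinations(range(len(sentences)), 2):
        if keys[i] & keys[j]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def mean_clustering(adj):
    """Average local clustering coefficient (0 for vertices of degree < 2)."""
    values = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            values.append(0.0)
            continue
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        values.append(2 * links / (k * (k - 1)))
    return sum(values) / len(values) if values else 0.0

def mean_path_length(adj):
    """Mean BFS distance over ordered pairs of mutually reachable vertices."""
    total = pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

# Toy, pre-tokenized "document" (hypothetical data).
SENTENCES = [
    ["graph", "network", "text"],
    ["network", "summary", "text"],
    ["summary", "sentence", "graph"],
    ["sentence", "weather"],
]

if __name__ == "__main__":
    # Sweep the single parameter and watch how the graph statistics change.
    for eps in (1.5, 3.0):
        g = build_graph(SENTENCES, eps)
        print(eps, mean_clustering(g), mean_path_length(g))
```

Under these assumptions, sweeping eps yields the one-parameter family of graphs; a small-world regime would be one where clustering stays high while the mean path length stays short relative to a random graph of the same size and density.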


Published In

DocEng '11: Proceedings of the 11th ACM symposium on Document engineering
September 2011
296 pages
ISBN:9781450308632
DOI:10.1145/2034691

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Helmholtz principle
  2. small world network
  3. text summarization
  4. unusual behavior detection

Conference

DocEng '11: ACM Symposium on Document Engineering
September 19 - 22, 2011
Mountain View, California, USA

Acceptance Rates

Overall Acceptance Rate 194 of 564 submissions, 34%

Cited By

  • (2023) Survey on the Biomedical Text Summarization Techniques with an Emphasis on Databases, Techniques, Semantic Approaches, Classification Techniques, and Similarity Measures. Sustainability 15:5 (4216). DOI: 10.3390/su15054216. Online publication date: 26-Feb-2023
  • (2021) Text structuring methods based on complex network: a systematic review. Scientometrics 126:2 (1471-1493). DOI: 10.1007/s11192-020-03785-y. Online publication date: 3-Jan-2021
  • (2020) Big Data Processing. Applications and Approaches to Object-Oriented Software Design (111-132). DOI: 10.4018/978-1-7998-2142-7.ch005. Online publication date: 2020
  • (2019) Graph-Based Semantic Learning, Representation and Growth from Text: A Systematic Review. 2019 IEEE 13th International Conference on Semantic Computing (ICSC) (118-123). DOI: 10.1109/ICOSC.2019.8665592. Online publication date: Jan-2019
  • (2018) Frequent Itemsets as Meaningful Events in Graphs for Summarizing Biomedical Texts. 2018 8th International Conference on Computer and Knowledge Engineering (ICCKE) (135-140). DOI: 10.1109/ICCKE.2018.8566651. Online publication date: Oct-2018
  • (2018) Different approaches for identifying important concepts in probabilistic biomedical text summarization. Artificial Intelligence in Medicine 84 (101-116). DOI: 10.1016/j.artmed.2017.11.004. Online publication date: Jan-2018
  • (2016) Analytics, challenges and applications in big data environment: a survey. Journal of Management Analytics 3:3 (206-239). DOI: 10.1080/23270012.2016.1186578. Online publication date: Jul-2016
  • (2015) A novel classifier based on meaning for text classification. 2015 International Symposium on Innovations in Intelligent SysTems and Applications (INISTA) (1-5). DOI: 10.1109/INISTA.2015.7276788. Online publication date: Sep-2015
  • (2015) Networking Big Data: Definition, Key Technologies and Challenging Issues of Transmission. Big Data Computing and Communications (103-112). DOI: 10.1007/978-3-319-22047-5_9. Online publication date: 24-Jul-2015
  • (2014) On automatic text segmentation. Proceedings of the 2014 ACM symposium on Document engineering (73-80). DOI: 10.1145/2644866.2644874. Online publication date: 16-Sep-2014
