Abstract
We believe that, to be useful, a text summarisation technique must be domain dependent: the resulting summary must cover the important aspects and concepts specific to the domain of the subject matter. The main problem with a typical domain-dependent text summarisation technique is the cost of acquiring the required domain-specific knowledge and hand-coding it into the system, e.g., in the form of phrase-structure templates. To solve this problem, we propose a technique that uses automatically retrieved sample documents as the source of the domain-specific knowledge and extracts that knowledge in the form of keyterms. These keyterms represent the key aspects and concepts (terminology) relevant to the input document. The sample documents are retrieved from a collection, called the base collection, which contains documents on various topics; they are selected on the basis of their similarity to the input document. The input document is then summarised by extracting a number of sentences containing the keyterms.
Our text summarisation technique is based on the statistical distribution of words among documents in the base collection, within individual documents, and among sentences in the input document. In particular, statistically based formulae are employed for scoring each of the candidate sample documents, keyterms, and key sentences. Our technique makes use of standard word or term distribution parameters that are commonly provided by, or can easily be obtained from, modern text retrieval systems.
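The three scoring stages described above (sample documents, keyterms, key sentences) can be illustrated with a minimal sketch. The abstract does not state the actual formulae, so everything here is an assumption made for illustration: TF-IDF weights with cosine similarity stand in for the document score, summed TF-IDF over the retrieved samples stands in for the keyterm score, and a count of distinct keyterms stands in for the sentence score. The function names, parameters, and toy data are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def idf_table(collection):
    """Inverse document frequency log(N / df) for every word in the collection."""
    n_docs = len(collection)
    df = Counter()
    for doc in collection:
        df.update(set(tokenize(doc)))
    return {word: math.log(n_docs / count) for word, count in df.items()}


def tfidf(text, idf):
    """TF-IDF vector of a text; words unseen in the collection are dropped."""
    tf = Counter(tokenize(text))
    return {word: count * idf[word] for word, count in tf.items() if word in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summarise(input_doc, base_collection, n_samples=2, n_keyterms=3, n_sentences=1):
    """Return (keyterms, summary sentences) for input_doc.

    Assumed stand-ins, not the paper's formulae: cosine/TF-IDF for sample
    retrieval, summed TF-IDF for keyterms, keyterm counts for sentences.
    """
    idf = idf_table(base_collection)
    query = tfidf(input_doc, idf)

    # Stage 1: retrieve the sample documents most similar to the input.
    ranked = sorted(base_collection,
                    key=lambda doc: cosine(query, tfidf(doc, idf)),
                    reverse=True)
    samples = ranked[:n_samples]

    # Stage 2: keyterms are the words with the highest summed weight in the samples.
    weights = Counter()
    for doc in samples:
        weights.update(tfidf(doc, idf))
    keyterms = {word for word, _ in weights.most_common(n_keyterms)}

    # Stage 3: keep the sentences with the most distinct keyterms, in original order.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", input_doc) if s.strip()]
    scored = [(len(keyterms & set(tokenize(s))), i) for i, s in enumerate(sentences)]
    best = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:n_sentences]
    return keyterms, [sentences[i] for i in sorted(i for _, i in best)]


# Toy base collection covering three unrelated topics (illustrative data only).
base_collection = [
    "The court held that the contract was void and awarded damages.",
    "The court found a breach of contract, so it ordered damages.",
    "The striker scored a late goal in the football match.",
    "The team won the football league after a late goal.",
    "The recipe needs flour, sugar, butter and two eggs.",
]
input_doc = ("It rained on the morning of the hearing. "
             "The court ruled that the contract was breached and set damages.")

keyterms, summary = summarise(input_doc, base_collection)
print(sorted(keyterms))
print(summary)
```

Because the keyterms come from the retrieved samples rather than from the input document alone, domain vocabulary shared across the samples outweighs words that merely happen to be rare in the input. A real base collection would be far larger, and the term statistics would normally come from a text retrieval system's index rather than being recomputed per query.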
Copyright information
© 1998 Springer-Verlag Berlin Heidelberg
Cite this paper
Yuwono, B., Adriani, M. (1998). Statistical Identification of Domain-Specific Keyterms for Text Summarisation. In: Nikolaou, C., Stephanidis, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 1998. Lecture Notes in Computer Science, vol 1513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49653-X_39
DOI: https://doi.org/10.1007/3-540-49653-X_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-65101-7
Online ISBN: 978-3-540-49653-3
eBook Packages: Springer Book Archive