A Method for Semantic Relatedness Based Query Focused Text Summarization

Rahman, Nazreena; Borah, Bhogeswar

doi:10.1007/978-3-319-69900-4_49

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10597)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

This paper introduces a query focused text summarization technique, based on semantic relatedness, for finding relevant information in a single text document. The semantic relatedness measure is used to extract the sentences that are related to the query. The approach also works for short queries that do not contain enough information. By including a larger number of query related sentences, the method produces better summaries. Experiments and evaluation are carried out on the DUC 2005 and 2006 datasets, and the results show that the method performs well in comparison with existing systems.

1 Introduction

Text summarization finds information rich sentences for readers. The research area is becoming increasingly popular because of the huge amount of information now available. A summary presents the significant content of a text and so reduces the time and cost of reading it. Automatic summarization differs considerably from human summarization: a human summary can include rich content and themes that are very difficult to capture in an automatic summary. Semantic measures are applied to find the linguistic meaning of words and their relations with other words. Text summarization can be generic or query focused: a generic summary covers the important content of the text, whereas a query focused summary is produced specifically for the user’s interest. A summary can be produced by extractive or abstractive methods. Abstractive methods reformulate sentences, while extractive methods select sentences that are present in the input text documents [1]. Here, we propose a semantic relatedness based text summarization method that extracts the sentences semantically related to the query.

Luhn [2] first introduced text summarization in 1958 by finding the significant words of a text. Significant words are identified by counting the occurrences of each word in the text file. Sentences are then ranked by the significant words they contain and extracted for the summary. Among recent approaches, Abdi et al. [3] (2015) used linguistic knowledge and expansion of content words. Content words include nouns, verbs, adjectives and adverbs. Their method measures the semantic similarity between content words together with the word-order similarity. Finally, they used a combination model to select the sentences relevant to the input query as well as the sentences that are semantically very similar to other high scoring sentences. We introduce a semantic relatedness based query focused text summarization (SRQ) method to obtain a well-defined summary according to the user’s need. The SRQ method works even when the query words are not present in the input text, and when the query is short or does not contain enough information.

2 Proposed Semantic Relatedness Based Query Focused Text Summarization (SRQ Method)

Semantic relatedness measure: Important sentences are selected for the summary on the basis of a semantic relatedness measure. In linguistics, semantics is the study of meaning, and semantic relatedness measures how strongly two words are related to each other. It differs from semantic similarity: semantic similarity measures how alike two words or concepts are, whereas semantic relatedness is the more general concept. For example, hand and finger are not semantically similar, but they are semantically related. WordNet is used to find the semantic relatedness between content words. WordNet is a database of semantic relations for English words (Miller et al. 1990) [4]. It contains a semantic network that defines different relations for content words. Table 1 gives the semantic relations available for each kind of content word in the WordNet database.

Table 1. Different semantic relations in WordNet

Hirst and St-Onge (HSO) [5] proposed a path based semantic relatedness measure using WordNet. Two words can be related in many ways, for example through ‘is-a’, ‘part-of’ and ‘member-of’ relations. In WordNet, hand and finger are semantically related by a ‘part-of’ relation. The HSO measure considers all types of relations present in WordNet and finds the shortest path between the two words across the various semantic networks. The semantic relation between two content words is measured from the length of this shortest path together with the number of changes of direction along it. Fig. 1 shows the ‘is-a’ relations in WordNet that connect two words (Hemorrhagic_fever and Respiratory_tract_infection), from which the shortest path and the number of changes of direction between them are obtained:

Semantic relatedness between two words: The content words are first pre-processed by stemming. The semantic relatedness between two words is then computed with Eq. 1.

$$\begin{aligned} \mathrm{score}(w1, w2) = {} & 2c - \text{path length between } w1 \text{ and } w2 \\ & - k \cdot \text{number of direction changes between } w1 \text{ and } w2 \end{aligned}$$

(1)

Here, $c=8$ and $k=1$ are constants. If the two words are the same, the path length and the number of direction changes are both 0, so the HSO semantic relatedness takes its maximum value of $2c=16$; the minimum value is 0 [6]. We tested the semantic relatedness score with different threshold values. Based on performance, the method keeps scores that are average or higher by setting the threshold value to 8.
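As a minimal sketch of Eq. 1 and the threshold test, the score can be computed from a path length and a count of direction changes that are assumed to have been obtained from WordNet already. The function names `hso_score` and `is_related` are hypothetical, and clamping negative values to 0 is an assumption made here to match the stated minimum value.

```python
def hso_score(path_length: int, direction_changes: int, c: int = 8, k: int = 1) -> int:
    """Eq. 1: 2*c - path length - k * number of direction changes."""
    raw = 2 * c - path_length - k * direction_changes
    # Assumption: scores below 0 are clamped to the stated minimum of 0.
    return max(raw, 0)


def is_related(path_length: int, direction_changes: int, threshold: int = 8) -> bool:
    """A word pair counts as related when its score reaches the threshold."""
    return hso_score(path_length, direction_changes) >= threshold


# Identical words: path length 0 and no direction changes give the maximum, 2*c.
print(hso_score(0, 0))   # 16
# A path of length 5 with one change of direction.
print(hso_score(5, 1))   # 10
print(is_related(7, 2))  # False, because 16 - 7 - 2 = 7 is below 8
```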

Semantic relatedness between two sentences: To decide whether two sentences are semantically related, the semantic relatedness of each content word of the first sentence $S_{1}$ is calculated with every content word of the second sentence $S_{2}$, and the maximum score is kept for that word. After finding this score for every word of $S_{1}$, we take the maximum over all of them as the score of $S_{1}$. The semantic relatedness of the sentence $S_{1}$ with respect to $S_{2}$ is given in Eq. 2:

$$\begin{aligned} \mathrm{score}(S_{1}, S_{2}) = \max _{w1\in S_{1},\, w2\in S_{2}} \mathrm{score}(w1, w2) \end{aligned}$$

(2)
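Eq. 2 can be sketched as a maximum over all word pairs. The word-level scorer is passed in as a function; the table `TOY_SCORES` and the scorer `toy_word_score` below are hypothetical stand-ins with made-up values, not real WordNet or HSO output. Returning 0 for an empty sentence is also an assumption.

```python
from typing import Callable, Sequence

# Hypothetical word-pair scores for illustration only.
TOY_SCORES = {("hand", "finger"): 13, ("doctor", "finger"): 4}


def toy_word_score(w1: str, w2: str) -> int:
    """Stand-in for Eq. 1: identical words score 16, unknown pairs score 0."""
    if w1 == w2:
        return 16
    return TOY_SCORES.get((w1, w2), TOY_SCORES.get((w2, w1), 0))


def sentence_score(
    s1_words: Sequence[str],
    s2_words: Sequence[str],
    word_score: Callable[[str, str], int],
) -> int:
    """Eq. 2: the maximum word-pair score over content words of S1 and S2."""
    return max(
        (word_score(w1, w2) for w1 in s1_words for w2 in s2_words),
        default=0,  # assumption: an empty sentence is unrelated to anything
    )


print(sentence_score(["hand", "doctor"], ["finger", "ring"], toy_word_score))  # 13
```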

Important sentence selection: In query focused text summarization, a query is given together with the input text documents. Before applying semantic relatedness in the SRQ method, we prioritize sentences on the basis of the following nine criteria, each of which marks a sentence as important for summarization. Semantic relatedness is calculated only for the important sentences.

Title Word Matching: If words present in a sentence also occur in the title or heading of the text document, that sentence can be considered important.

Proper Noun: A proper noun or entity name makes a sentence more important. Hence, we select the sentences that contain proper nouns.

Numerical Data: A sentence that contains numerical data is considered to carry rich information.

Thematic Word: Thematic words are the words that occur most frequently in a text file. The presence of a thematic word makes a sentence important. We find the ten most frequent words in the text file and select the sentences in which any thematic word is present.

Noun Phrase: The presence of noun phrases makes a sentence important. The method uses a chunk parser to find noun phrases [7].

Font-based Word: Sentences containing words in uppercase, bold, italic or underlined fonts are normally considered more meaningful.

Cue Phrase: Sentences containing a cue phrase such as in conclusion, this letter, this report, summary, argue, purpose or development are most likely to belong in the summary.

Sentence Length: A longer sentence is considered to contain more information.

Sentence Position: Important sentences usually appear at the beginning and the end of a paragraph. We select the first and the last sentence of each paragraph.
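Four of the criteria above (title word matching, numerical data, thematic words and sentence position) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the authors' implementation: the tokenizer, the small `STOPWORDS` set and all function names are hypothetical, and a sentence is treated as important if any one criterion fires, which the text does not specify.

```python
import re
from collections import Counter

# Hypothetical stopword list so that function words are not counted as thematic.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are", "for"}


def tokens(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def thematic_words(document, top_n=10):
    """The top_n most frequent words of the document."""
    return {word for word, _ in Counter(tokens(document)).most_common(top_n)}


def has_title_word(sentence, title):
    return bool(set(tokens(sentence)) & set(tokens(title)))


def has_numerical_data(sentence):
    return re.search(r"\d", sentence) is not None


def is_paragraph_boundary(index, paragraph_length):
    """True for the first and the last sentence of a paragraph."""
    return index == 0 or index == paragraph_length - 1


def important_sentences(title, paragraphs, top_n=10):
    """paragraphs is a list of paragraphs, each a list of sentence strings."""
    document = " ".join(s for para in paragraphs for s in para)
    themes = thematic_words(document, top_n)
    selected = []
    for para in paragraphs:
        for i, sentence in enumerate(para):
            # Assumption: any single criterion is enough to keep the sentence.
            if (
                has_title_word(sentence, title)
                or has_numerical_data(sentence)
                or themes & set(tokens(sentence))
                or is_paragraph_boundary(i, len(para))
            ):
                selected.append(sentence)
    return selected
```

The remaining criteria (proper nouns, noun phrases, font information, cue phrases and sentence length) would need a part-of-speech tagger, a chunk parser or document markup, so they are left out of this sketch.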

Semantic relatedness is calculated between the title of the input text $(S_{t})$ and each important sentence $(S_{i})$ of the input text document using Eqs. 1 and 2. It is likewise measured between the query $(S_{q})$ and each important sentence $(S_{i})$ using the same equations. We keep those sentences whose score is equal to or above the defined threshold value.

Extracting Summary: The summary is formed from the sentences common to two sets: the important sentences related to the text title, as measured by $\mathrm{score}(S_{t},S_{i})$, and the important sentences related to the query, as measured by $\mathrm{score}(S_{q},S_{i})$. The set of sentences related to the title is given by Eq. 3.

$$\begin{aligned} T = \{ S_{i} \mid \mathrm{score}(S_{t},S_{i}) \ge 8 \} \end{aligned}$$

(3)

Similarly, the set of sentences related to the query is given by Eq. 4. In both equations, $S_{i}$ ranges over the important sentences.

$$\begin{aligned} Q = \{ S_{i} \mid \mathrm{score}(S_{q},S_{i}) \ge 8 \} \end{aligned}$$

(4)

Finally, the summary is the intersection of the two sets:

$$\begin{aligned} \mathrm{Summary}_{\mathrm{sentences}} = T \cap Q \end{aligned}$$

(5)
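Eqs. 3 to 5 amount to two threshold filters followed by a set intersection. The sketch below assumes the sentence-level scores against the title and the query have already been computed and are supplied as mappings; the name `extract_summary` is hypothetical, and returning the selected sentences in document order is an assumption, since the equations define only a set.

```python
from typing import List, Mapping, Sequence


def extract_summary(
    important: Sequence[str],
    title_scores: Mapping[str, float],
    query_scores: Mapping[str, float],
    threshold: float = 8,
) -> List[str]:
    """Return the important sentences related to both the title and the query."""
    # Eq. 3: sentences whose relatedness to the title reaches the threshold.
    t_set = {s for s in important if title_scores.get(s, 0) >= threshold}
    # Eq. 4: sentences whose relatedness to the query reaches the threshold.
    q_set = {s for s in important if query_scores.get(s, 0) >= threshold}
    # Eq. 5: the summary is the intersection of the two sets.
    common = t_set & q_set
    # Assumption: keep the original document order of the selected sentences.
    return [s for s in important if s in common]


sentences = ["s1", "s2", "s3", "s4"]
title_scores = {"s1": 10, "s2": 8, "s3": 3, "s4": 12}
query_scores = {"s1": 9, "s2": 5, "s3": 14, "s4": 8}
print(extract_summary(sentences, title_scores, query_scores))  # ['s1', 's4']
```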

3 Experiments

We use the DUC 2005 and DUC 2006 datasets (http://duc.nist.gov), in which each topic contains a query and a set of input text documents. Each text document contains newspaper or newswire text in English. These datasets are designed for query-based text summarization. The queries are real world complex questions whose answers cannot be given by a date, name or quantity alone. Here, each dataset contains 50 documents, and the length of each summary is restricted to 250 words.

To compare the performance of the SRQ method with other existing methods, the ROUGE toolkit [8] is used. ROUGE measures the similarity between a candidate summary and a reference summary. A candidate summary is a summary produced by one of the methods, and the reference summaries come from the DUC datasets. ROUGE consists of a set of metrics: ROUGE-N (n-gram co-occurrence statistics), ROUGE-L (longest common subsequence), ROUGE-W (weighted longest common subsequence), ROUGE-S (skip-bigram co-occurrence statistics) and ROUGE-SU4 (skip-bigram with a maximum skip distance of 4, plus unigram). We compare our results with the top-performing DUC 2005 and 2006 systems that were evaluated specifically on query-based text summarization. The recall values of ROUGE-1 (unigram-based), ROUGE-2 (bigram-based) and ROUGE-SU4 are used in our experiments. Figs. 2 and 3 compare the ROUGE values of the existing systems with those of the SRQ method and show that SRQ performs well in comparison with these systems.
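To make the recall values concrete, the following is a simplified sketch of ROUGE-N recall against a single reference: the number of reference n-grams that also appear in the candidate (with clipped counts) divided by the total number of reference n-grams. It is not the ROUGE toolkit itself; whitespace tokenization, lowercasing and the function names are assumptions, and features of the official package such as stemming and multiple references are omitted.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


candidate = "the cat sat on the mat"
reference = "the cat lay on the mat"
print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 reference unigrams match
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 reference bigrams match
```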

4 Conclusion and Future Work

This paper has presented a query focused text summarization method based on semantic relatedness. The SRQ method performs well for short queries. It was compared with the participating systems of DUC 2005 and DUC 2006 and gives better results. In future, an effective redundancy removal technique can be incorporated to obtain summaries that are more relevant to the query and richer in information.

References

1. Damova, M., Koychev, I.: Query-based summarization: a survey (2010)
2. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)
3. Abdi, A., Idris, N., Alguliyev, R.M., Aliguliyev, R.M.: Query-based multi-documents summarization using linguistic knowledge and content word expansion. Soft Comput. 21(7), 1–17 (2015)
4. Miller, G.A., Beckwith, R., Fellbaum, C., Gross, D., Miller, K.J.: Introduction to WordNet: an on-line lexical database. Int. J. Lexicography 3(4), 235–244 (1990)
5. Hirst, G., St-Onge, D., et al.: Lexical chains as representations of context for the detection and correction of malapropisms. In: WordNet: An Electronic Lexical Database, vol. 305, pp. 305–332 (1998)
6. Patwardhan, S., Banerjee, S., Pedersen, T.: Using measures of semantic relatedness for word sense disambiguation. In: Gelbukh, A. (ed.) CICLing 2003. LNCS, vol. 2588, pp. 241–257. Springer, Heidelberg (2003). doi:10.1007/3-540-36456-0_24
7. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit. O’Reilly Media Inc., Sebastopol (2009)
8. Lin, C.-Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-2004 Workshop, vol. 8, Barcelona, Spain (2004)

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Assam Kaziranga University, Jorhat, 785006, Assam, India
Nazreena Rahman
Department of Computer Science and Engineering, Tezpur University, Sonitpur, 784028, Assam, India
Bhogeswar Borah

Authors

Nazreena Rahman
Bhogeswar Borah

Corresponding author

Correspondence to Nazreena Rahman.

Editor information

Editors and Affiliations

Indian Statistical Institute, Kolkata, India
B. Uma Shankar
Indian Statistical Institute, Kolkata, India
Kuntal Ghosh
Indian Statistical Institute, Kolkata, India
Deba Prasad Mandal
Indian Statistical Institute, Kolkata, India
Shubhra Sankar Ray
The Hong Kong Polytechnic University, Hong Kong, China
David Zhang
Indian Statistical Institute, Kolkata, India
Sankar K. Pal

Cite this paper

Rahman, N., Borah, B. (2017). A Method for Semantic Relatedness Based Query Focused Text Summarization. In: Shankar, B., Ghosh, K., Mandal, D., Ray, S., Zhang, D., Pal, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2017. Lecture Notes in Computer Science, vol 10597. Springer, Cham. https://doi.org/10.1007/978-3-319-69900-4_49

DOI: https://doi.org/10.1007/978-3-319-69900-4_49
Published: 01 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69899-1
Online ISBN: 978-3-319-69900-4
eBook Packages: Computer Science, Computer Science (R0)
