
Domain-Specific Keyword Extraction Using Joint Modeling of Local and Global Contextual Semantics

Authors: Muhammad Abulaish, Mohd Fazil, Mohammed J. Zaki

ACM Transactions on Knowledge Discovery from Data (TKDD), Volume 16, Issue 4

Article No.: 70, Pages 1 - 30

https://doi.org/10.1145/3494560

Published: 08 January 2022


Abstract

Domain-specific keyword extraction is a vital task in text mining. In many research tasks, such as spam e-mail classification, abusive language detection, sentiment analysis, and emotion mining, a set of domain-specific keywords (also known as a lexicon) is highly effective. Existing keyword extraction methods return all keywords in a document corpus rather than only the domain-specific ones. Moreover, most existing approaches perform well on formal document corpora but fail on the noisy, informal user-generated content of online social media. In this article, we present a hybrid approach to domain-specific keyword extraction that jointly models the local and global contextual semantics of words, drawing on the strengths of distributional word representations and a contrasting-domain corpus. Starting with a small seed set of domain-specific keywords, we model the text corpus as a weighted word-graph. In this graph, the initial weight of a node (word) represents its semantic association with the target domain, calculated as a linear combination of three semantic association metrics, and the weight of an edge connecting a pair of nodes represents the co-occurrence count of the respective words. A modified PageRank method is then applied to the word-graph to identify the most relevant words for expanding the initial set of domain-specific keywords. We evaluate our method on both formal and informal text corpora (comprising six datasets) and show that it performs significantly better than state-of-the-art methods. Furthermore, we generalize our approach to the language-agnostic case and show that it outperforms existing language-agnostic approaches.
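
The pipeline described in the abstract (a co-occurrence word-graph, node weights that combine three association scores, and a PageRank-style ranking) can be illustrated with a minimal Python sketch. The abstract does not name the three metrics or say how PageRank is modified, so everything below is an assumption made for illustration: `node_weight` takes three placeholder scores with hypothetical equal coefficients, `biased_pagerank` is an ordinary weighted PageRank whose teleport distribution is biased by the node weights (a stand-in, not the article's method), and the window size, damping factor, and function names are likewise invented here.

```python
from collections import defaultdict


def node_weight(metrics, coeffs=(1 / 3, 1 / 3, 1 / 3)):
    """Linear combination of three association scores.

    Both the scores and the coefficients are placeholders; the abstract
    does not specify which metrics or weights the article uses.
    """
    return sum(c * m for c, m in zip(coeffs, metrics))


def build_word_graph(docs, window=3):
    """Undirected graph whose edge weight counts how often two distinct
    words occur within `window` consecutive tokens of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, w in enumerate(tokens):
            for v in tokens[i + 1 : i + window]:
                if v != w:
                    graph[w][v] += 1.0
                    graph[v][w] += 1.0
    return graph


def biased_pagerank(graph, prior, damping=0.85, iters=50):
    """Weighted PageRank with a teleport distribution proportional to
    `prior` (assumed stand-in for the article's modified PageRank)."""
    nodes = list(graph)
    if not nodes:
        return {}
    total = sum(prior.get(n, 0.0) for n in nodes)
    if total > 0:
        p = {n: prior.get(n, 0.0) / total for n in nodes}
    else:
        p = {n: 1.0 / len(nodes) for n in nodes}  # uniform fallback
    score = dict(p)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            # The graph is undirected, so the neighbours of n are exactly
            # the nodes that pass score to n.
            incoming = sum(score[m] * graph[m][n] / out_weight[m] for m in graph[n])
            new[n] = (1 - damping) * p[n] + damping * incoming
        score = new
    return score


def expand_keywords(docs, prior, seeds, top_k=5):
    """Return the top_k highest-scoring words that are not already seeds."""
    scores = biased_pagerank(build_word_graph(docs), prior)
    candidates = sorted((w for w in scores if w not in seeds), key=lambda w: -scores[w])
    return candidates[:top_k]
```

Calling `expand_keywords(docs, prior, seeds)` with tokenized documents, a dictionary of node weights, and the seed set returns candidate words for extending the lexicon. Putting the node weights into the teleport term is one common way to bias a random walk toward a target domain; the article's actual modification may differ.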

