A Keyphrase-Based Tag Cloud Generation Framework to Conceptualize Textual Data

Muhammad Abulaish, Tarique Anwar
Copyright: © 2013 | Volume: 4 | Issue: 2 | Pages: 22
ISSN: 1947-9220 | EISSN: 1947-9239 | EISBN13: 9781466632905 | DOI: 10.4018/jaras.2013040104
MLA

Abulaish, Muhammad, and Tarique Anwar. "A Keyphrase-Based Tag Cloud Generation Framework to Conceptualize Textual Data." IJARAS vol.4, no.2 2013: pp.72-93. http://doi.org/10.4018/jaras.2013040104

APA

Abulaish, M. & Anwar, T. (2013). A Keyphrase-Based Tag Cloud Generation Framework to Conceptualize Textual Data. International Journal of Adaptive, Resilient and Autonomic Systems (IJARAS), 4(2), 72-93. http://doi.org/10.4018/jaras.2013040104

Chicago

Abulaish, Muhammad, and Tarique Anwar. "A Keyphrase-Based Tag Cloud Generation Framework to Conceptualize Textual Data," International Journal of Adaptive, Resilient and Autonomic Systems (IJARAS) 4, no.2: 72-93. http://doi.org/10.4018/jaras.2013040104

Abstract

Tag clouds have become an effective tool for quickly perceiving the most prominent terms in textual data. They help a reader grasp the main theme of a corpus without reading through the whole pile of documents. However, how well a tag cloud conceptualizes a text corpus depends directly on the quality of its tags. In this paper, the authors propose a keyphrase-based tag cloud generation framework. Existing tag cloud generation systems use single words as tags and use their frequency counts to determine font size. In contrast, the proposed framework identifies feasible keyphrases and uses them as tags, and it determines the font size of each keyphrase as a function of its relevance weight. Partial or full parsing is inefficient for lengthy sentences and inaccurate for sentences that do not follow proper grammatical structure, so the proposed method instead applies n-gram techniques followed by various heuristics-based refinements to identify candidate phrases in text documents. A rich set of lexical and semantic features is identified to characterize the candidate phrases and to determine their keyphraseness and relevance weights. The authors also propose a font-size determination function, which uses the relevance weights of the keyphrases to determine their relative font sizes for tag cloud visualization. The efficacy of the proposed framework is established through experimentation and comparison with existing state-of-the-art tag cloud generation methods.
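
The abstract does not spell out the heuristics, the feature set, or the font-size function, so the following Python sketch is only an illustration of the general idea, not the authors' method. It generates n-gram candidates, applies one assumed refinement (dropping n-grams that begin or end with a stopword), and maps a weight to a font size linearly. The names `STOPWORDS`, `candidate_phrases`, and `font_size`, the stopword list, and the linear mapping are all hypothetical. Raw frequency stands in for the relevance weight, which the paper derives from lexical and semantic features.

```python
import re
from collections import Counter

# Hypothetical stopword list; the paper's actual refinement heuristics
# are not given in the abstract.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "is", "for", "on", "with"}


def candidate_phrases(text, max_n=3):
    """Count 1..max_n-grams that neither start nor end with a stopword."""
    counts = Counter()
    # Split on punctuation so that n-grams do not span clause boundaries.
    for sentence in re.split(r"[.!?;:,]", text.lower()):
        tokens = re.findall(r"[a-z][a-z\-]*", sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Assumed refinement: reject phrases with a stopword at either edge.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return counts


def font_size(weight, w_min, w_max, size_min=10, size_max=36):
    """Linearly map a weight in [w_min, w_max] to a font size (assumed form)."""
    if w_max == w_min:
        # All weights equal: fall back to the midpoint size.
        return (size_min + size_max) / 2
    scale = (weight - w_min) / (w_max - w_min)
    return size_min + scale * (size_max - size_min)


if __name__ == "__main__":
    counts = candidate_phrases("Tag clouds show the tag clouds of a corpus.")
    lo, hi = min(counts.values()), max(counts.values())
    for phrase, freq in counts.most_common(5):
        print(phrase, font_size(freq, lo, hi))
```

A linear mapping is the simplest choice here; a logarithmic or rank-based mapping would be an equally plausible stand-in when weights are heavily skewed.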
