GeTFIRST: ontology-based keyword search towards semantic disambiguation

Hoang-Minh Nguyen (School of Computer Science and Engineering, International University - VNUHCM, Ho Chi Minh City, Viet Nam)
Hong-Quang Nguyen (School of Computer Science and Engineering, International University - VNUHCM, Ho Chi Minh City, Viet Nam)
Khoi-Nguyen Tran (School of Computer Science and Engineering, International University - VNUHCM, Ho Chi Minh City, Viet Nam)
Xuan-Vinh Vo (School of Computer Science and Engineering, International University - VNUHCM, Ho Chi Minh City, Viet Nam)

International Journal of Web Information Systems

ISSN: 1744-0084

Article publication date: 16 November 2015

Abstract

Purpose

This paper aims to improve the semantic-disambiguation capability of an information-retrieval system by taking advantage of a well-crafted classification tree. The unstructured nature and sheer volume of information accessible over networks have made it considerably harder for users to find relevant information. Many information-retrieval methods have been developed to address this problem, and the keyword-based approach is among the most common. Such an approach is often inadequate to cope with the conceptualization associated with user needs and content. This gives rise to the problem of semantic ambiguity: because of polysemy, the parties to a communication disagree on the meaning of terms, which leads to increased complexity and lower accuracy in information integration, migration, retrieval and other related activities.

Design/methodology/approach

A novel ontology-based search approach, named GeTFIRST (short for Graph-embedded Tree Fostering Information Retrieval SysTem), is proposed to disambiguate keywords semantically. The contribution is twofold. First, a search strategy is proposed that uses our Graph-embedded Tree (GeT)-based ontology to prune irrelevant concepts and thereby improve accuracy. Second, a path-based ranking algorithm is proposed that takes content specificity into account and rewards it.
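
The abstract does not spell out the GeT ontology or the ranking formula, so the following is only a minimal illustrative sketch of the two general ideas named above, not the authors' algorithm. It assumes a plain classification tree, prunes keyword matches that fall outside a user-chosen context concept, and treats path depth as a stand-in for content specificity. The names `Concept`, `TREE`, `paths_matching`, `prune` and `rank`, as well as the toy tree itself, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Concept:
    """A node in a toy classification tree (hypothetical structure)."""
    name: str
    children: List["Concept"] = field(default_factory=list)


# Toy tree in which the keyword "cell" is ambiguous across three branches.
TREE = Concept("Patents", [
    Concept("Biology", [Concept("Cell", [Concept("Stem cell")])]),
    Concept("Electricity", [Concept("Battery", [Concept("Fuel cell")])]),
    Concept("Telecommunications", [Concept("Cell phone")]),
])


def paths_matching(root: Concept, keyword: str) -> List[List[str]]:
    """Return every root-to-node path whose last concept name contains the keyword."""
    found: List[List[str]] = []

    def walk(node: Concept, path: List[str]) -> None:
        path = path + [node.name]
        if keyword.lower() in node.name.lower():
            found.append(path)
        for child in node.children:
            walk(child, path)

    walk(root, [])
    return found


def prune(paths: List[List[str]], context: str) -> List[List[str]]:
    """Assumed pruning rule: keep only paths that pass through the context concept."""
    return [p for p in paths if context in p]


def rank(paths: List[List[str]]) -> List[List[str]]:
    """Assumed ranking rule: deeper (more specific) paths come first."""
    return sorted(paths, key=len, reverse=True)


if __name__ == "__main__":
    for path in rank(prune(paths_matching(TREE, "cell"), "Biology")):
        print(" > ".join(path))
```

In this sketch the keyword "cell" matches four concepts, and choosing the context "Biology" discards the battery and telephone senses before ranking. A real system would derive the context and the specificity score from the ontology rather than from a user-supplied string and raw path length.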

Findings

An empirical evaluation was performed on United States Patent and Trademark Office (USPTO) patent datasets to compare our approach with full-text patent search approaches. The results showed that GeTFIRST handled ambiguous keywords with higher keyword-disambiguation accuracy than traditional search approaches.

Originality/value

The search approach presented in this paper copes with semantic ambiguity by using our proposed GeT-based ontology and a path-based ranking algorithm.

Acknowledgements

This research is funded by International University - VNUHCM under grant number SV2014-IT-01.

Citation

Nguyen, H.-M., Nguyen, H.-Q., Tran, K.-N. and Vo, X.-V. (2015), "GeTFIRST: ontology-based keyword search towards semantic disambiguation", International Journal of Web Information Systems, Vol. 11 No. 4, pp. 442-467. https://doi.org/10.1108/IJWIS-06-2015-0019

Publisher

Emerald Group Publishing Limited

Copyright © 2015, Emerald Group Publishing Limited
