Monash University

Restricted Access

Reason: Access restricted by the author. A copy can be requested for private research and study by contacting your institution's library service. This copy cannot be republished.

Reconciling linguistic, statistical and external knowledge-based processing for automated query expansion

thesis
posted on 2017-02-15, 23:46 authored by Selvaretnam, Bhawani
Users’ information needs are expressed in natural language, and successful retrieval depends heavily on effectively communicating the intended purpose. The varied ways in which users phrase their queries, together with the diverse vocabularies found in documents, give rise to the query-document vocabulary mismatch problem, a major issue affecting the effectiveness of information retrieval systems. Query expansion techniques have been widely employed to resolve this mismatch. Central to this thesis is the identification of the significant constituents that characterize the query intent, and their enrichment through the addition of meaningful terms while preserving the original query intent. This thesis explored three influential linguistic characteristics, from the morphological, syntactic and semantic views, towards understanding and representing a search goal during query expansion. Linguistic characteristics of queries relating to non-compositional phrase detection, term dependency modeling, concept-role mapping and semantic disambiguation were identified as crucial factors for accurately perceiving and representing search goals. Role-types were defined to represent the distinct roles of query terms, and grammatical relations, which depict semantic associations between adjacent and non-adjacent query terms, were used to determine meaningful base terms and base pairs. These significant base terms and base pairs play an important role in pooling candidate expansion terms, ensuring that only meaningful terms are extracted. The thesis proposes a linguistic-based automated query expansion framework that incorporates specific term relationships in order to resolve the query-document vocabulary mismatch problem.
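The abstract does not say which grammatical relations count as meaningful or exactly how base pairs are selected, so the Python below is only a sketch of the general idea under stated assumptions. It takes dependency-style relation triples (assumed to come from a parser upstream), keeps those whose label is in a hypothetical set `MEANINGFUL_RELATIONS`, drops stopwords, and returns the surviving word pairs as base pairs and their words as base terms. The relation labels, the stopword list, `Relation` and `base_terms_and_pairs` are illustrative assumptions, not the thesis's definitions.

```python
from typing import List, NamedTuple, Tuple


class Relation(NamedTuple):
    """One grammatical relation from a dependency parse: rel(head, dependent)."""
    rel: str
    head: str
    dependent: str


# Hypothetical choice of (Stanford-dependency-style) labels treated as carrying
# a meaningful semantic association between two query terms.
MEANINGFUL_RELATIONS = {"amod", "nn", "compound", "dobj", "nsubj",
                        "prep_of", "prep_in", "prep_on"}

# Hypothetical stopword list: function words never become base terms.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "for", "to", "and",
             "what", "how", "is", "are"}


def base_terms_and_pairs(relations: List[Relation]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return (base_terms, base_pairs) from dependency relation triples.

    A pair is kept when its label is in MEANINGFUL_RELATIONS and neither word
    is a stopword. The two words need not be adjacent in the query. Base terms
    are the words taking part in at least one kept pair, in first-seen order.
    """
    pairs: List[Tuple[str, str]] = []
    for r in relations:
        head, dep = r.head.lower(), r.dependent.lower()
        if r.rel in MEANINGFUL_RELATIONS and head not in STOPWORDS and dep not in STOPWORDS:
            if (head, dep) not in pairs:
                pairs.append((head, dep))
    terms: List[str] = []
    for head, dep in pairs:
        for word in (head, dep):
            if word not in terms:
                terms.append(word)
    return terms, pairs
```

For the query "the effects of acid rain on forest ecosystems", hand-written triples such as `prep_of(effects, rain)`, `amod(rain, acid)`, `prep_on(effects, ecosystems)` and `nn(ecosystems, forest)` yield the non-adjacent pair `(effects, ecosystems)` alongside the adjacent ones, while `det(effects, the)` is discarded.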
For this purpose, four query expansion patterns that emulate user search behaviour were formulated, and expansion terms that are either statistically collocated or related through lexical-semantic relationships were adopted in the expansion process. Terms in the final expanded query were weighted using an optimized weighting scheme that places emphasis on query terms according to their role-type. The observed improvements in retrieval effectiveness validated both the importance of the linguistic characteristics of queries in establishing role-types that represent key and complementary terms, and the benefit of incorporating statistically-collocated and lexical-semantic term relationships into the query expansion process. In addition, the varying structures of queries (i.e. sentence or bag-of-words) were highlighted through an automated query structure classification model. The retrieval performance of baseline systems across varying query lengths was analyzed to understand the impact of query length on retrieval effectiveness. Baseline systems perform better on short queries than on medium-length and long queries, whereas the proposed linguistic-based query expansion framework proves effective regardless of query length.
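Neither the collocation measure nor the optimized weights are given in the abstract. As a hedged sketch, the Python below uses pointwise mutual information (PMI) over document-level co-occurrence as one common stand-in for "statistically collocated" terms, adds the top-scoring candidates for each key term, and weights every term by its role-type. The role names `key`, `complementary` and `expansion`, the values in `ROLE_WEIGHTS`, and the functions `pmi_scores` and `expand_query` are hypothetical; the thesis's optimized weighting scheme, its four expansion patterns and its lexical-semantic expansion are not reproduced.

```python
import math
from collections import Counter

# Hypothetical role-type weights (not the thesis's optimized values).
ROLE_WEIGHTS = {"key": 1.0, "complementary": 0.6, "expansion": 0.3}


def pmi_scores(docs, base_term, min_count=2):
    """Return {term: PMI(base_term, term)} using document-level co-occurrence.

    PMI(b, t) = log(P(b, t) / (P(b) * P(t))), where each probability is a
    document frequency divided by the number of documents. Terms that
    co-occur with base_term in fewer than min_count documents are skipped.
    """
    n = len(docs)
    df = Counter()  # number of documents containing each term
    co = Counter()  # number of documents containing both base_term and the term
    for doc in docs:
        vocab = set(doc)
        df.update(vocab)
        if base_term in vocab:
            co.update(vocab - {base_term})
    if df[base_term] == 0:
        return {}
    p_base = df[base_term] / n
    return {
        term: math.log((count / n) / (p_base * (df[term] / n)))
        for term, count in co.items()
        if count >= min_count
    }


def expand_query(roles, docs, top_k=2, min_count=2):
    """Add up to top_k positively collocated terms per key term; return {term: weight}."""
    expanded = dict(roles)
    for term, role in roles.items():
        if role != "key":
            continue
        # Highest PMI first; ties broken alphabetically for determinism.
        ranked = sorted(pmi_scores(docs, term, min_count).items(),
                        key=lambda kv: (-kv[1], kv[0]))
        added = 0
        for candidate, score in ranked:
            if added == top_k or score <= 0:
                break
            if candidate not in expanded:
                expanded[candidate] = "expansion"
                added += 1
    return {t: ROLE_WEIGHTS[r] for t, r in expanded.items()}
```

On a toy corpus in which `sulfur` co-occurs with `acid` more selectively than `rain` does, `sulfur` receives the higher PMI and is added first. PMI is known to favour rare terms, which is why the sketch discards candidates seen with the base term in fewer than `min_count` documents.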

History

Campus location

Australia

Principal supervisor

Christopher Hugh Messom

Year of Award

2013

Department, School or Centre

Information Technology (Monash University Malaysia)

Course

Doctor of Philosophy

Degree Type

DOCTORATE

Faculty

Faculty of Information Technology
