Abstract:
This work briefly describes an approach for automatically extracting structured information from semi-structured documents in order to match document creators and users, find the closest similarities between them, and connect them for further collaboration. The general idea is to combine a semantic annotation technique with an ontology-based similarity measurement to find the best matches between web documents. The proposed approach uses ontologies both to annotate the extracted information and to measure the similarity between each pair of documents. GATE (General Architecture for Text Engineering), one of the best-known annotation tools, is used to annotate the semi-structured documents. A novel algorithm is proposed that uses a training data set to update the ontology GATE relies on for extraction. Furthermore, domain-specific metrics, implemented in an ontology-based approach, are used to measure semantic similarity between documents on the basis of their semantic annotations. These metrics can be used to find the most similar web documents in a document corpus.
Date of Conference: 29 April 2012 - 02 May 2012
Date Added to IEEE Xplore: 22 October 2012