Automatic extraction of non-textual information in web document and their classification | IEEE Conference Publication | IEEE Xplore

Abstract:

This paper addresses the automatic extraction and classification of textual and non-textual information in web documents. The main idea is to create a robust method for extracting image and text segments in order to obtain a short web document. The developed method therefore consists of two kinds of data extraction, for images and for text, both of which use the Document Object Model (DOM) tree. Extracted objects are saved in separate databases, followed by image analysis that defines and describes the image objects from a semantic point of view. The semantic descriptions of all modal objects are then used to create the short web document. For accurate object classification, the paper also describes a fast and powerful hybrid segmentation algorithm based on Mean Shift and Belief Propagation principles. The image segmentation algorithm was integrated with the SIFT descriptor. Finally, SVM classification is used to obtain a semantic description of objects in a static image. The developed method was tested on real images, both unsegmented and segmented.
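
The abstract does not give implementation details for the extraction step, so the following is only a minimal sketch of what separating image and text segments from a web document might look like. It uses Python's standard `html.parser`, which is event-based and walks the markup rather than building a full DOM tree; the names `SegmentExtractor` and `extract_segments` are hypothetical and are not taken from the paper. The two returned lists loosely stand in for the separate image and text stores the abstract mentions.

```python
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Hypothetical helper: collects image sources and visible text segments."""

    # Elements whose character data is not visible page text.
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.images = []      # values of <img src="..."> in document order
        self.texts = []       # non-empty text nodes in document order
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.texts.append(text)


def extract_segments(html):
    """Return (image_sources, text_segments) found in an HTML string."""
    parser = SegmentExtractor()
    parser.feed(html)
    parser.close()
    return parser.images, parser.texts
```

In this sketch each text node becomes its own segment; grouping nodes into larger blocks, and the later segmentation, SIFT, and SVM stages, are outside its scope.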
Date of Conference: 03-04 July 2012
Date Added to IEEE Xplore: 02 August 2012
Conference Location: Prague, Czech Republic