Abstract:
Document processing and understanding are important for a variety of applications, such as office automation, creation of electronic manuals, online documentation, and annotation. The first step often involves extracting relevant keywords and phrases from the documents so that they can be automatically hyperlinked within and outside the document, producing an electronic document. This paper describes a novel method for extracting anchorable information units (AIUs), also known as hotspots, from any type of Portable Document Format (PDF) file, whether created with an editor or by scanning documents. The AIUs are used to make these documents more intelligent for content cross-referencing to and from related multimedia documents within an electronic document publishing environment. Domain-specific knowledge about the documents is used to aid the extraction process. Once the location and extent of the text are found, the content is extracted using optical character recognition (OCR) software if necessary. For object extraction for highlighting, the images are extracted first, and then a variety of image processing algorithms are applied.
Published in: 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)
Date of Conference: 06-09 July 2003
Date Added to IEEE Xplore: 18 August 2003
Print ISBN: 0-7803-7965-9