ABSTRACT
Thumbnails are common in image representation. A thumbnail is a highly compressed version of an image that gives the human eye a small yet complete visual representation. We propose adapting the concept of thumbnails to the domain of documents: a thumbnail of any document can be generated from its semantic content, providing an adequate amount of information about the document. Unlike image thumbnails, however, document thumbnails are intended mainly for consumption by software such as search engines and other content-processing systems. With the advent of the semantic web, machine processing of documents has become extremely important. We give particular attention to electronic documents in XML and RDF/XML, with a view towards processing documents in the semantic web.
Index Terms
- Semantic thumbnails: a novel method for summarizing document collections