DOI: 10.1145/1026533.1026547

Semantic thumbnails: a novel method for summarizing document collections

Published: 10 October 2004

ABSTRACT

The concept of thumbnails is common in image representation. A thumbnail is a highly compressed version of an image that provides a small yet complete visual representation to the human eye. We propose adapting the concept of thumbnails to the domain of documents, whereby a thumbnail of any document can be generated from its semantic content, providing an adequate amount of information about the document. Unlike image thumbnails, however, document thumbnails are intended mainly for consumption by software such as search engines and other content-processing systems. With the advent of the semantic web, machine processing of documents has become extremely important. We give particular attention to electronic documents in XML and RDF/XML, with a view towards processing documents in the semantic web.

Published in

SIGDOC '04: Proceedings of the 22nd annual international conference on Design of communication: The engineering of quality documentation
October 2004
160 pages
ISBN: 1581138091
DOI: 10.1145/1026533

Copyright © 2004 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 355 of 582 submissions, 61%
