The Semantic GrowBag Algorithm: Automatically Deriving Categorization Systems

  • Conference paper
Research and Advanced Technology for Digital Libraries (ECDL 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4675)

Abstract

Keyword search in digital libraries often returns result sets that are far too large. Based on the metadata associated with such objects, the faceted search paradigm allows users to structure and filter the result set, for example, using a publication type facet to show only books or videos. These facets usually focus on clear-cut characteristics of digital items; organizing the actual semantic content information into such a facet is much more difficult. The Semantic GrowBag approach, presented in this paper, uses the keywords provided by many authors of digital objects to automatically create lightweight topic categorization systems as a basis for a meaningful and dynamically adaptable topic facet. Using such emergent semantics enables an alternative way to filter large result sets according to the objects' content without the need to manually classify all objects with respect to a pre-specified vocabulary. We present the details of our algorithm using the DBLP collection of computer science documents and show some experimental evidence about the quality of the achieved results.
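To make the idea of a keyword-derived topic facet more concrete, the following is a minimal sketch and not the algorithm from the paper, whose details the abstract does not give. It assumes that each object carries a list of author-supplied keywords, counts how often two keywords are assigned to the same object, and uses a chosen keyword to filter a result set. The sample records and the function names (`cooccurrence_counts`, `related_keywords`, `filter_by_topic`) are hypothetical and serve only as an illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample records; real data would come from a collection such as DBLP.
docs = [
    {"title": "A", "keywords": ["xml", "query processing"]},
    {"title": "B", "keywords": ["xml", "query processing", "indexing"]},
    {"title": "C", "keywords": ["xml", "indexing"]},
    {"title": "D", "keywords": ["faceted search"]},
]


def cooccurrence_counts(keyword_lists):
    """Count how often each unordered keyword pair is assigned to the same object."""
    counts = Counter()
    for keywords in keyword_lists:
        # Sorting gives every pair a canonical (alphabetical) order.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def related_keywords(counts, keyword, min_count=2):
    """Keywords that co-occur with `keyword` at least `min_count` times.

    Results are ordered by descending count, then alphabetically.
    """
    related = {}
    for (a, b), n in counts.items():
        if n < min_count:
            continue
        if a == keyword:
            related[b] = n
        elif b == keyword:
            related[a] = n
    return sorted(related, key=lambda k: (-related[k], k))


def filter_by_topic(objects, topic):
    """Keep only the objects whose keyword list contains `topic`."""
    return [o for o in objects if topic in o["keywords"]]


counts = cooccurrence_counts(d["keywords"] for d in docs)
print(related_keywords(counts, "xml"))
print([d["title"] for d in filter_by_topic(docs, "indexing")])
```

Plain pair counts like these only indicate which keywords tend to appear together. Arranging keywords into a categorization system requires further steps that this sketch does not attempt.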

Editor information

László Kovács Norbert Fuhr Carlo Meghini

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Diederich, J., Balke, WT. (2007). The Semantic GrowBag Algorithm: Automatically Deriving Categorization Systems. In: Kovács, L., Fuhr, N., Meghini, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2007. Lecture Notes in Computer Science, vol 4675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74851-9_1

  • DOI: https://doi.org/10.1007/978-3-540-74851-9_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74850-2

  • Online ISBN: 978-3-540-74851-9

  • eBook Packages: Computer Science, Computer Science (R0)
