Abstract
Keyword search for relevant objects in digital libraries often returns result sets that are far too large. Based on the metadata associated with such objects, the faceted search paradigm allows users to structure and filter the result set, for example, using a publication type facet to show only books or videos. These facets usually focus on clear-cut characteristics of digital items; organizing the actual semantic content into such a facet is much more difficult. The Semantic GrowBag approach, presented in this paper, uses the keywords provided by many authors of digital objects to automatically create lightweight topic categorization systems as a basis for a meaningful and dynamically adaptable topic facet. Using such emergent semantics offers an alternative way to filter large result sets according to the objects’ content without the need to manually classify all objects with respect to a pre-specified vocabulary. We present the details of our algorithm using the DBLP collection of computer science documents and show experimental evidence about the quality of the achieved results.
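To make the idea of a keyword-derived topic facet concrete, here is a minimal sketch that counts how often author-supplied keywords co-occur and uses the counts to suggest narrower keywords under a seed topic. It is only an illustration under stated assumptions: the abstract does not describe the GrowBag algorithm's internals, so the first-order co-occurrence measure, the `min_conf` threshold, the toy `DOCS` records, and all function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records: document id -> author-supplied keywords.
DOCS = {
    "d1": {"semantic web", "ontology", "rdf"},
    "d2": {"semantic web", "ontology"},
    "d3": {"semantic web", "rdf"},
    "d4": {"databases", "query optimization"},
    "d5": {"semantic web", "databases"},
}

def cooccurrence(docs):
    """Count keyword frequencies and ordered co-occurrence pairs."""
    freq = Counter()
    pair = Counter()
    for keywords in docs.values():
        freq.update(keywords)
        for a, b in combinations(sorted(keywords), 2):
            pair[(a, b)] += 1
            pair[(b, a)] += 1
    return freq, pair

def related_keywords(seed, freq, pair, min_conf=0.6):
    """Keywords k whose documents mostly also carry `seed`.

    Uses the share of k's documents that also contain seed, an assumed
    stand-in measure rather than the measure used in the paper.
    """
    related = {}
    for (a, b), n in pair.items():
        if b == seed:
            conf = n / freq[a]
            if conf >= min_conf:
                related[a] = conf
    return related

def filter_by_topic(docs, topic):
    """Apply the topic facet: keep documents tagged with `topic`."""
    return sorted(d for d, keywords in docs.items() if topic in keywords)

freq, pair = cooccurrence(DOCS)
print(related_keywords("semantic web", freq, pair))
print(filter_by_topic(DOCS, "ontology"))
```

In this toy data, "ontology" and "rdf" always appear together with "semantic web" and are therefore suggested as narrower entries, while "databases" falls below the assumed threshold. Selecting a facet entry then reduces the result set to the documents carrying that keyword, with no manual classification against a fixed vocabulary.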
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Diederich, J., Balke, WT. (2007). The Semantic GrowBag Algorithm: Automatically Deriving Categorization Systems. In: Kovács, L., Fuhr, N., Meghini, C. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2007. Lecture Notes in Computer Science, vol 4675. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74851-9_1
DOI: https://doi.org/10.1007/978-3-540-74851-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74850-2
Online ISBN: 978-3-540-74851-9
eBook Packages: Computer Science (R0)