Abstract.
The efficient retrieval of data items on set-valued attributes is an important research topic that has attracted little attention so far. We studied and modified four index structures (sequential signature files, signature trees, extendible signature hashing, and inverted files) for fast retrieval of sets with low cardinality. We compared the index structures by implementing them and subjecting them to extensive experiments that investigated the influence of query set size, database size, domain size, and data distribution (synthetic and real). The results of the experiments clearly indicate that inverted files exhibit the best overall behavior of all tested index structures.
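To make two of the compared structures concrete, the following is a minimal sketch of an inverted file and a sequential signature file answering a containment query (find all stored sets that contain a given query set). It is an illustration only, not the implementation evaluated in the paper: the class names, the signature width `bits`, the number of bits set per element `k`, and the choice of query type are all assumptions made for this example.

```python
from collections import defaultdict


class InvertedFile:
    """Hypothetical sketch: maps each element to the ids of the sets containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # element -> ids of sets containing it
        self.items = {}                   # id -> stored set

    def insert(self, item_id, values):
        self.items[item_id] = frozenset(values)
        for v in values:
            self.postings[v].add(item_id)

    def containing(self, query):
        """Return the ids of all stored sets that are supersets of `query`."""
        query = set(query)
        if not query:
            return set(self.items)
        # Intersect the posting lists, shortest first, to keep intermediates small.
        lists = sorted((self.postings.get(v, set()) for v in query), key=len)
        result = set(lists[0])
        for plist in lists[1:]:
            result &= plist
            if not result:
                break
        return result


class SequentialSignatureFile:
    """Hypothetical sketch: superimposed-coding signatures scanned sequentially."""

    def __init__(self, bits=64, k=3):
        self.bits = bits    # signature width (assumed value)
        self.k = k          # bits set per element (assumed value)
        self.entries = []   # list of (signature, id, stored set)

    def _signature(self, values):
        sig = 0
        for v in values:
            for i in range(self.k):
                # Python's built-in hash is stable within one process only.
                sig |= 1 << (hash((v, i)) % self.bits)
        return sig

    def insert(self, item_id, values):
        values = frozenset(values)
        self.entries.append((self._signature(values), item_id, values))

    def containing(self, query):
        """Filter by signature, then verify candidates to discard false drops."""
        query = set(query)
        qsig = self._signature(query)
        return {
            item_id
            for sig, item_id, values in self.entries
            if sig & qsig == qsig and query <= values
        }
```

The inverted file touches only the posting lists of the query elements, while the sequential signature file scans every signature and must verify each candidate against the stored set, because superimposed signatures can produce false drops.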
Additional information
Received: 0 May 2000, Accepted: 18 October 2000, Published online: 17 September 2003
Edited by E. Bertino
Cite this article
Helmer, S., Moerkotte, G. A performance study of four index structures for set-valued attributes of low cardinality. VLDB 12, 244–261 (2003). https://doi.org/10.1007/s00778-003-0106-0
DOI: https://doi.org/10.1007/s00778-003-0106-0