Abstract
We design and analyze integrated ways of applying the signature file approach for text and attributes simultaneously. In traditional signature file methods, the records are stored sequentially in the “main file”; for every record, a hash-coded abstraction of it (“record signature”) is created and stored in the signature file (usually, sequentially). To resolve a query, the signature file is scanned; the signatures retrieved correspond to all the qualifying records, plus some “false drops”.
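The scan-and-filter idea described above can be sketched in a few lines. This is only an illustration of plain superimposed coding on text, not the integrated text-and-attribute method the paper proposes; the signature length `F`, the number of bits per word `M`, the SHA-256-based hash, and all function names are assumptions chosen for the example.

```python
import hashlib

F = 64  # signature length in bits (illustrative value, not from the paper)
M = 3   # bit positions hashed per word (illustrative value)


def word_signature(word, f=F, m=M):
    """Hash a word to an f-bit pattern with at most m bits set."""
    sig = 0
    for i in range(m):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % f)
    return sig


def record_signature(text):
    """Superimpose (bitwise OR) the signatures of all words in a record."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def search(word, records, signatures):
    """Scan the signature file, then verify candidates against the main file."""
    q = word_signature(word)
    # A record is a candidate if its signature covers every bit of the query.
    candidates = [i for i, s in enumerate(signatures) if (s & q) == q]
    # Checking the actual record separates true hits from false drops.
    hits = [i for i in candidates if word.lower() in records[i].lower().split()]
    false_drops = [i for i in candidates if i not in hits]
    return hits, false_drops


records = [
    "optical disk storage",
    "text retrieval methods",
    "signature file access",
]
signatures = [record_signature(r) for r in records]
hits, false_drops = search("disk", records, signatures)
print(hits, false_drops)
```

A qualifying record is never missed, because its signature contains every bit of the query signature by construction; the only errors are false drops, which become more likely as records grow longer relative to `F`.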
Here, we extend two signature file methods, namely superimposed coding and disjoint coding, to handle text and attributes. We develop a mathematical model and derive formulas for the optimal choice of parameters. The proposed methods achieve significant performance improvements because they can take advantage of the skewed distribution of the queries. Depending on the query frequencies, the false drop probability can be reduced by a factor of 40–45 (≈ 97% savings) for the same space overhead.
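For orientation, the standard uniform-case estimate for superimposed coding (the textbook baseline, not the paper's own model for skewed queries) can be written as follows. The symbols are assumptions introduced here: $F$ is the signature length in bits, $m$ the number of bits set per word, and $D$ the number of distinct words per record.

$$
F_d \approx \left(1 - e^{-mD/F}\right)^{m},
\qquad
F_d = 2^{-m} \quad \text{when } F \ln 2 = mD .
$$

The quoted savings follow from the reduction factor $k$ by simple arithmetic:

$$
1 - \frac{1}{k}: \qquad 1 - \frac{1}{40} = 0.975, \qquad 1 - \frac{1}{45} \approx 0.978 .
$$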
Additional information
Also with the University of Maryland Institute for Advanced Computer Studies (U.M.I.A.C.S.). This research was sponsored partially by the National Science Foundation under the grant DCR-86-16833.
About this article
Cite this article
Faloutsos, C. Signature files: An integrated access method for text and attributes, suitable for optical disk storage. BIT 28, 736–754 (1988). https://doi.org/10.1007/BF01954894
DOI: https://doi.org/10.1007/BF01954894