Signature files: An integrated access method for text and attributes, suitable for optical disk storage

  • Part I Computer Science
  • BIT Numerical Mathematics

Abstract

We design and analyze integrated ways of applying the signature file approach to text and attributes simultaneously. In traditional signature file methods, the records are stored sequentially in the “main file”; for every record, a hash-coded abstraction of it (the “record signature”) is created and stored in the signature file (usually, sequentially). To resolve a query, the signature file is scanned; the signatures that match correspond to all the qualifying records, plus some non-qualifying ones (“false drops”).
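The scan-then-verify procedure described above can be illustrated with a minimal sketch of superimposed coding. This is not the paper's implementation: the signature length `F`, the number of bits per term `M`, the SHA-256 based hashing, the `name=value` encoding of attributes, and all function names are assumptions chosen only for illustration.

```python
import hashlib

F = 64  # signature length in bits (assumed value)
M = 3   # bit positions chosen per term (assumed value)


def term_signature(term, f=F, m=M):
    """Hash one term to an f-bit pattern with at most m bits set."""
    sig = 0
    for i in range(m):
        digest = hashlib.sha256(f"{i}:{term.lower()}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:8], "big") % f)
    return sig


def terms_of(record):
    """Terms of a record: its text words plus 'name=value' attribute terms."""
    text, attributes = record
    terms = {word.lower() for word in text.split()}
    terms |= {f"{name}={value}".lower() for name, value in attributes.items()}
    return terms


def record_signature(record):
    """Superimpose (bitwise OR) the signatures of all terms in the record."""
    sig = 0
    for term in terms_of(record):
        sig |= term_signature(term)
    return sig


def search(signature_file, main_file, query_terms):
    """Scan the signature file, then verify candidates against the main file.

    Returns (hits, false_drops) as lists of record indices.
    """
    query_sig = 0
    for term in query_terms:
        query_sig |= term_signature(term)
    wanted = {term.lower() for term in query_terms}
    hits, false_drops = [], []
    for i, sig in enumerate(signature_file):
        if sig & query_sig == query_sig:       # signature qualifies
            if wanted <= terms_of(main_file[i]):
                hits.append(i)                 # record really qualifies
            else:
                false_drops.append(i)          # signature matched by chance
    return hits, false_drops
```

A query such as `search(signature_file, main_file, ["optical", "author=faloutsos"])` never misses a qualifying record, because every bit set by a query term is also set in the signature of any record containing that term; the only cost of the hashing is the occasional false drop, which the verification step filters out.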

Here, we extend two signature file methods, namely superimposed coding and disjoint coding, to handle text and attributes. We develop a mathematical model and derive formulas for the optimal choice of parameters. The proposed methods achieve significant performance improvements because they can take advantage of the skewed distribution of the queries. Depending on the query frequencies, the false drop probability can be reduced by a factor of 40–45 (≈ 97% savings) for the same space overhead.
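The paper's own skew-aware formulas are not reproduced in this abstract. As a point of reference only, the sketch below evaluates the classical uniform-frequency estimate for superimposed coding, in which each of `d` distinct terms sets `m` random bits in an `f`-bit signature and a single-term query is posed. The function names and the example parameter values are assumptions, and the independence of bit positions is a modelling assumption of that classical analysis.

```python
import math


def false_drop_probability(f, m, d):
    """Classical estimate for a single-term query under superimposed coding.

    f: signature length in bits, m: bits set per term,
    d: distinct terms per record. Assumes independent, uniformly
    random bit positions (a textbook simplification).
    """
    fraction_of_ones = 1.0 - (1.0 - 1.0 / f) ** (m * d)
    return fraction_of_ones ** m


def optimal_bits_per_term(f, d):
    """Value of m that minimises the estimate: about half the bits end up set."""
    return f * math.log(2) / d


if __name__ == "__main__":
    f, d = 1024, 40  # assumed example values
    print("optimal m is about", round(optimal_bits_per_term(f, d), 2))
    for m in (5, 18, 40):
        print(m, false_drop_probability(f, m, d))
```

In this uniform model the minimum is reached when roughly half of the signature bits are set; choosing `m` well below or well above `f ln 2 / d` raises the false drop probability for the same `f`, which is the kind of parameter trade-off the paper's formulas address for non-uniform query frequencies.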

Additional information

Also with the University of Maryland Institute for Advanced Computer Studies (U.M.I.A.C.S.). This research was sponsored partially by the National Science Foundation under the grant DCR-86-16833.

About this article

Cite this article

Faloutsos, C. Signature files: An integrated access method for text and attributes, suitable for optical disk storage. BIT 28, 736–754 (1988). https://doi.org/10.1007/BF01954894

  • DOI: https://doi.org/10.1007/BF01954894
