Abstract
The inverted file index common to many full-text information retrieval systems presents unusual and challenging data management requirements. These requirements are usually met with custom data management software. Rather than build this custom software, we would prefer to use an existing database management system. Attempts to do this with traditional (e.g., relational) database management systems have produced discouraging results. Instead, we have used a persistent object store, Mneme, to support the inverted file of a full-text information retrieval system, INQUERY. The result is an improvement in performance along with opportunities for INQUERY to take advantage of the standard data management services provided by Mneme. We describe our implementation, present performance results on a variety of document collections, and discuss the advantages of using a persistent object store to support information retrieval.
This work is supported by the NSF Center for Intelligent Information Retrieval at the University of Massachusetts.
Copyright information
© 1994 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Brown, E.W., Callan, J.P., Croft, W.B., Moss, J.E.B. (1994). Supporting full-text information retrieval with a persistent object store. In: Jarke, M., Bubenko, J., Jeffery, K. (eds) Advances in Database Technology — EDBT '94. EDBT 1994. Lecture Notes in Computer Science, vol 779. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-57818-8_64
DOI: https://doi.org/10.1007/3-540-57818-8_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-57818-5
Online ISBN: 978-3-540-48342-7
eBook Packages: Springer Book Archive