DOI: 10.1145/191839.191869

The effectiveness of GlOSS for the text database discovery problem

Published: 24 May 1994

ABSTRACT

The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result size for a given query and database. The method is termed GlOSS (Glossary of Servers Server). The second part of this paper evaluates the effectiveness of GlOSS based on a trace of real user queries. In addition, we analyze the storage cost of our approach.
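
The abstract does not spell out how the result size is estimated. As a rough illustration only, the sketch below assumes each database exports its document count and per-term document frequencies, and that query terms occur independently of one another, so a conjunctive query's estimated result size is the document count multiplied by each term's fraction of matching documents. The names `estimate_result_size` and `rank_databases`, and the sample statistics, are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical summary kept per database: (number of documents, term -> document frequency).
Summary = Tuple[int, Dict[str, int]]


def estimate_result_size(query_terms: List[str], num_docs: int,
                         doc_freq: Dict[str, int]) -> float:
    """Estimate how many documents match ALL query terms.

    Assumption (not stated in the abstract): terms are independent, so the
    estimate is num_docs * product(doc_freq[t] / num_docs).
    """
    if num_docs == 0:
        return 0.0
    estimate = float(num_docs)
    for term in query_terms:
        estimate *= doc_freq.get(term, 0) / num_docs
    return estimate


def rank_databases(query_terms: List[str],
                   summaries: Dict[str, Summary]) -> List[Tuple[str, float]]:
    """Return databases with a non-zero estimate, largest estimate first."""
    scored = [
        (name, estimate_result_size(query_terms, num_docs, doc_freq))
        for name, (num_docs, doc_freq) in summaries.items()
    ]
    scored = [pair for pair in scored if pair[1] > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up statistics for three candidate databases.
    summaries = {
        "A": (100, {"text": 50, "database": 20}),
        "B": (1000, {"text": 10, "database": 10}),
        "C": (10, {"text": 5}),
    }
    for name, size in rank_databases(["text", "database"], summaries):
        print(name, size)
```

Under these assumptions only the small per-database summaries need to be stored centrally, which is the kind of storage cost the abstract says the paper analyzes.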

              • Published in

                SIGMOD '94: Proceedings of the 1994 ACM SIGMOD international conference on Management of data
                May 1994
                525 pages
                ISBN: 0897916395
                DOI: 10.1145/191839

                Copyright © 1994 ACM

                Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

                Publisher

                Association for Computing Machinery

                New York, NY, United States

                Qualifiers

                • Article

                Acceptance Rates

                Overall Acceptance Rate: 785 of 4,003 submissions, 20%
