DOI: 10.1145/996350.996377
Article

The effectiveness of automatically structured queries in digital libraries

Published: 07 June 2004

ABSTRACT

Structured or fielded metadata is the basis for many digital library services, including searching and browsing. Yet, little is known about the impact of using structure on the effectiveness of such services. In this paper, we investigate a key research question: do structured queries improve effectiveness in DL searching? To answer this question, we empirically compared the use of unstructured queries to the use of structured queries. We then tested the capability of a simple Bayesian network system, built on top of a DL retrieval engine, to infer the best structured queries from the keywords entered by the user. Experiments performed with 20 subjects working with a DL containing a large collection of computer science literature clearly indicate that structured queries, either manually constructed or automatically generated, perform better than their unstructured counterparts in the majority of cases. Also, automatic structuring of queries appears to be an effective and viable alternative to manual structuring that may significantly reduce the burden on users.
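
To make the idea of inferring a structured query from plain keywords concrete, here is a minimal sketch. It is not the Bayesian network system described in the abstract: it substitutes a naive, smoothed field-frequency score, and the records, field names, and function names (`RECORDS`, `FIELDS`, `field_score`, `structure_query`) are all hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical toy collection; records and field names are illustrative only.
RECORDS = [
    {"author": "salton mcgill", "title": "introduction to modern information retrieval"},
    {"author": "baeza-yates ribeiro-neto", "title": "modern information retrieval"},
    {"author": "pearl", "title": "probabilistic reasoning in intelligent systems"},
]
FIELDS = ("author", "title")


def field_score(term, field):
    """Add-one smoothed fraction of records whose `field` contains `term`."""
    hits = sum(term in rec[field].split() for rec in RECORDS)
    return (hits + 1) / (len(RECORDS) + 2)


def structure_query(keywords):
    """Try every keyword-to-field assignment and keep the highest-scoring one.

    The score of an assignment is the product of the per-keyword field scores
    (an assumed stand-in for the paper's network-based inference).
    """
    best, best_score = None, -1.0
    for assignment in product(FIELDS, repeat=len(keywords)):
        score = 1.0
        for term, field in zip(keywords, assignment):
            score *= field_score(term, field)
        if score > best_score:
            best, best_score = list(zip(assignment, keywords)), score
    return best


print(structure_query(["pearl", "reasoning"]))
```

Under these assumptions, the keywords `pearl reasoning` are mapped to an author term and a title term, which is the kind of fielded query a user would otherwise have to build by hand. The exhaustive enumeration grows exponentially with the number of keywords, so it only suits short queries.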

      • Published in

        JCDL '04: Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
        June 2004
        440 pages
        ISBN: 1581138326
        DOI: 10.1145/996350

        Copyright © 2004 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        JCDL '04 Paper Acceptance Rate: 61 of 249 submissions, 24%
        Overall Acceptance Rate: 415 of 1,482 submissions, 28%
