ABSTRACT
Structured or fielded metadata is the basis for many digital library (DL) services, including searching and browsing. Yet little is known about the impact of using structure on the effectiveness of such services. In this paper, we investigate a key research question: do structured queries improve effectiveness in DL searching? To answer this question, we empirically compared the use of unstructured queries with the use of structured queries. We then tested the capability of a simple Bayesian network system, built on top of a DL retrieval engine, to infer the best structured queries from the keywords entered by the user. Experiments performed with 20 subjects working with a DL containing a large collection of computer science literature clearly indicate that structured queries, whether manually constructed or automatically generated, perform better than their unstructured counterparts in the majority of cases. Also, automatic structuring of queries appears to be an effective and viable alternative to manual structuring that may significantly reduce the burden on users.
Index Terms
- The effectiveness of automatically structured queries in digital libraries