research-article

Authority-based keyword search in databases

Published: 21 March 2008

Abstract

Our system applies authority-based ranking to keyword search in databases modeled as labeled graphs. Three ranking factors are used: the relevance to the query, the specificity and the importance of the result. All factors are handled using authority-flow techniques that exploit the link-structure of the data graph, in contrast to traditional Information Retrieval. We address the performance challenges in computing the authority flows in databases by using precomputation and exploiting the database schema if present. We conducted user surveys and performance experiments on multiple real and synthetic datasets, to assess the semantic meaningfulness and performance of our system.
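The authority-flow idea in the abstract can be illustrated with a minimal sketch. It assumes a random-walk formulation in which authority starts at the objects that contain the query keyword (the base set) and flows along the edges of the data graph, with a damping factor `d`. The function name, the graph, and the per-edge transfer rates are hypothetical illustrations, not the authors' implementation or their actual rates.

```python
def keyword_authority(edges, base_set, d=0.85, tol=1e-12, max_iter=1000):
    """Keyword-specific authority scores by power iteration.

    edges:    dict mapping a node to a list of (target, transfer_rate) pairs;
              the rates are assumed values standing in for per-edge-type weights.
    base_set: nodes that contain the query keyword; random jumps land only here.
    d:        damping factor (probability of following an edge instead of jumping).
    """
    nodes = set(edges) | set(base_set)
    for outs in edges.values():
        nodes.update(target for target, _ in outs)

    # Random-jump distribution: uniform over the base set, zero elsewhere.
    jump = {v: (1.0 / len(base_set) if v in base_set else 0.0) for v in nodes}
    rank = dict(jump)

    for _ in range(max_iter):
        new = {v: (1.0 - d) * jump[v] for v in nodes}
        for source, outs in edges.items():
            for target, rate in outs:
                new[target] += d * rate * rank[source]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical bibliographic graph: p1 and p3 contain the keyword and both
    # cite p2, which does not contain it; p2 passes some authority to author a1.
    graph = {
        "p1": [("p2", 0.7)],
        "p3": [("p2", 0.7)],
        "p2": [("a1", 0.2)],
    }
    scores = keyword_authority(graph, base_set={"p1", "p3"})
    for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(node, round(score, 6))
```

In this toy graph, `p2` receives authority through citations even though it does not contain the keyword itself. That is the kind of link-structure effect that distinguishes authority-flow ranking from purely text-based relevance.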

      Reviews

      Donald Harris Kraft

      Hristidis et al. extend the notion of ranking retrieved textual items from a database via Google's PageRank by giving authority to the "citing" papers and to the "citing" authors. The basic data structure is a labeled directed graph. A demonstration Web site is available. The system allows a user to input a set of keywords as a query, with optional Boolean operators. The user can also input a global object-rank importance parameter, a damping factor, and a specificity metric (inverse object rank). The database is an available set of documents that can be accessed on the Web. Hristidis et al. note that a query keyword provides a set of documents with that keyword, and then the database graph can be traversed to provide a keyword-specific ranking. Moreover, one can calibrate the specificity metric (inverse object rank) and quality metric (global object rank). Finally, the paper offers an ontology graph, based on domain knowledge, to expand the search; only the inheritance of attributes, the "isa" relationship, is considered. The paper provides an interesting approach that merits further consideration and testing.

      Online Computing Reviews Service

      • Published in

        ACM Transactions on Database Systems  Volume 33, Issue 1
        March 2008
        211 pages
        ISSN:0362-5915
        EISSN:1557-4644
        DOI:10.1145/1331904

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 21 March 2008
        • Accepted: 1 June 2007
        • Received: 1 March 2007

        Qualifiers

        • research-article
        • Research
        • Refereed
