Authority-based keyword search in databases

Authors:
Vagelis Hristidis, Florida International University, Miami, FL
Heasoo Hwang, University of California, San Diego, La Jolla, CA
Yannis Papakonstantinou, University of California, San Diego, La Jolla, CA

ACM Transactions on Database Systems, Volume 33, Issue 1, Article No. 1, pp. 1–40. https://doi.org/10.1145/1331904.1331905

Published: 21 March 2008

Abstract

Our system applies authority-based ranking to keyword search in databases modeled as labeled graphs. Three ranking factors are used: the relevance to the query, the specificity and the importance of the result. All factors are handled using authority-flow techniques that exploit the link-structure of the data graph, in contrast to traditional Information Retrieval. We address the performance challenges in computing the authority flows in databases by using precomputation and exploiting the database schema if present. We conducted user surveys and performance experiments on multiple real and synthetic datasets, to assess the semantic meaningfulness and performance of our system.
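As a rough illustration of what authority flow over a data graph can look like, the sketch below runs a PageRank-style random walk that restarts only at the nodes containing the query keyword (the "base set"). The toy graph, the uniform split of authority across outgoing edges, the damping factor of 0.85, and the function name `authority_flow` are all assumptions made for this example; it is not the system's actual algorithm, edge weighting, or parameter choice.

```python
def authority_flow(edges, base_set, damping=0.85, iterations=100):
    """Keyword-specific random walk with restart on a directed graph.

    edges:    dict mapping each node to a list of its out-neighbours
    base_set: set of nodes that contain the query keyword
    """
    # Collect every node that appears as a source or a target.
    nodes = set(edges)
    for targets in edges.values():
        nodes.update(targets)
    nodes = sorted(nodes)

    # The walk restarts uniformly at base-set nodes only.
    restart = {n: (1.0 / len(base_set) if n in base_set else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for src in nodes:
            targets = edges.get(src, [])
            if not targets:
                continue  # dangling node: its authority is simply not forwarded
            share = damping * score[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        score = nxt
    return score


# Hypothetical citation graph: an edge "p1 -> p2" means p1 cites p2.
edges = {"p1": ["p2"], "p2": ["p4"], "p3": ["p2"], "p4": []}
scores = authority_flow(edges, base_set={"p1"})
```

In this toy graph only `p1` contains the keyword, yet `p2` and `p4` receive non-zero scores because authority flows to them along citation edges, while `p3`, which is not reachable from the base set, stays at zero.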

Index Terms

Authority-based keyword search in databases
1. Information systems
  1. Information retrieval
  2. Information systems applications

Reviews

Reviewer: Donald Harris Kraft

Hristidis et al. extend the notion of ranking retrieved textual items from a database via Google's PageRank by giving authority to the "citing" papers and to the "citing" authors. The basic data structure is a labeled directed graph. A demonstration Web site is available. The system allows a user to input a set of keywords as a query, with optional Boolean operators. The user can also input a global object-rank importance parameter, a damping factor, and a specificity metric (inverse object rank). The database is an available set of documents that can be accessed on the Web. Hristidis et al. note that a query keyword provides a set of documents with that keyword, and then the database graph can be traversed to provide a keyword-specific ranking. Moreover, one can calibrate the specificity metric (inverse object rank) and quality metric (global object rank). Finally, the paper offers an ontology graph, based on domain knowledge, to expand the search; only the inheritance of attributes, the "isa" relationship, is considered. The paper provides an interesting approach that merits further consideration and testing.

Online Computing Reviews Service
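To make the idea of calibrating a keyword-specific ranking against a global importance score more concrete, here is a minimal sketch. The review does not say how the system combines the two scores, so the multiplicative form, the exponent name `g`, the helper names, and the sample scores are all assumptions chosen for illustration.

```python
def combine_scores(keyword_scores, global_scores, g=1.0):
    """Weight each keyword-specific score by the global score raised to g.

    g = 0 ignores global importance; larger g lets it dominate.
    """
    return {
        node: keyword_scores[node] * global_scores.get(node, 0.0) ** g
        for node in keyword_scores
    }


def rank(scores, k=10):
    """Return the top-k node ids, highest score first (ties broken by id)."""
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Hypothetical scores for three objects.
keyword_scores = {"a": 0.4, "b": 0.3, "c": 0.0}
global_scores = {"a": 0.1, "b": 0.5, "c": 0.4}

by_keyword_only = rank(combine_scores(keyword_scores, global_scores, g=0.0))
with_global = rank(combine_scores(keyword_scores, global_scores, g=1.0))
```

With `g=0.0` the order follows the keyword-specific scores alone (`a` before `b`); with `g=1.0` the globally more important object `b` overtakes `a`. Object `c` stays last in both cases because its keyword-specific score is zero.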

Published in

ACM Transactions on Database Systems Volume 33, Issue 1
March 2008
211 pages
ISSN:0362-5915
EISSN:1557-4644
DOI:10.1145/1331904

Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 21 March 2008
- Accepted: 1 June 2007
- Received: 1 March 2007

Author Tags
Authority flow
PageRank
quality experiments
ranking
specificity
Qualifiers
- research-article
- Research
- Refereed

Article Metrics
- Total Citations: 56
- Total Downloads: 1,341
- Downloads (Last 12 months): 6
- Downloads (Last 6 weeks): 0