Scalable keyword search on large data streams

  • Regular Paper
The VLDB Journal

Abstract

It is widely recognized that integrating information retrieval (IR) and database (DB) techniques provides users with a broad range of high-quality services. Along this direction, IR-style m-keyword query processing over a relational database in an RDBMS framework has been well studied. Such processing finds all hidden interconnected tuple structures, for example connected trees that contain the keywords and are interconnected by sequences of primary/foreign key relationships among tuples. A new and challenging issue is how to monitor events that are implicitly interrelated over an open-ended relational data stream for a user-given m-keyword query. Such a relational data stream is a sequence of tuple insertion/deletion operations. The difficulty of the problem lies in the number of costly joins that must be processed over time as tuples are inserted and/or deleted. This cost is mainly affected by three parameters: the number of keywords, the maximum size of the interconnected tuple structures, and the complexity of the database schema when it is viewed as a schema graph. In this paper, we propose two new approaches. First, we propose a novel algorithm to efficiently determine all the joins that need to be processed to answer an m-keyword query. Second, we propose a new demand-driven approach to process such a query over a high-speed relational data stream. We show that high efficiency can be achieved by significantly reducing the number of intermediate results when processing joins over a relational data stream. The proposed techniques achieve high scalability in terms of both query plan generation and query plan execution. We conducted extensive experimental studies using synthetic and real data to simulate a relational data stream. Our approach significantly outperforms existing algorithms.
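
To make the problem setting concrete, the following is a minimal brute-force sketch of an m-keyword query evaluated over a stream of tuple insertions and deletions. It is not the algorithm proposed in the paper: it simply re-enumerates every group of at most `max_size` tuples after each operation, which is exactly the kind of repeated work the abstract describes as costly. The tuple identifiers, texts, the `stream` layout, and the function names are all hypothetical, and the sketch does not check that an answer is minimal.

```python
from itertools import combinations

def is_connected(nodes, edges):
    """Return True if `nodes` form one connected component using only `edges`."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for u, v in edges:
            if u == cur:
                nxt = v
            elif v == cur:
                nxt = u
            else:
                continue
            if nxt in nodes and nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes

def keyword_answers(tuples, edges, keywords, max_size):
    """Brute force: connected tuple groups (size <= max_size) covering all keywords."""
    answers = []
    for size in range(1, max_size + 1):
        for group in combinations(sorted(tuples), size):
            words = set(" ".join(tuples[t] for t in group).lower().split())
            if all(k in words for k in keywords) and is_connected(group, edges):
                answers.append(group)
    return answers

# Hypothetical stream of (operation, tuple id, text, referenced tuple ids).
# "w1" plays the role of a tuple whose foreign keys link author a1 to paper p1.
stream = [
    ("insert", "a1", "Jeffrey Yu", []),
    ("insert", "p1", "keyword search streams", []),
    ("insert", "w1", "", ["a1", "p1"]),
    ("delete", "p1", None, None),
]

tuples, edges, history = {}, set(), []
for op, tid, text, refs in stream:
    if op == "insert":
        tuples[tid] = text
        edges |= {(tid, r) for r in refs}          # foreign key links
    else:
        del tuples[tid]
        edges = {e for e in edges if tid not in e}  # drop links to the deleted tuple
    # Re-evaluate the 2-keyword query after every stream operation.
    history.append(keyword_answers(tuples, edges, ["yu", "streams"], max_size=3))
```

In this toy run an answer appears only once the linking tuple `w1` arrives and disappears again when `p1` is deleted. The enumeration cost grows with the number of keywords, with `max_size`, and with the number of ways tuples can be linked, which mirrors the three parameters named in the abstract.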



Corresponding author

Correspondence to Jeffrey Xu Yu.

About this article

Cite this article

Qin, L., Yu, J.X. & Chang, L. Scalable keyword search on large data streams. The VLDB Journal 20, 35–57 (2011). https://doi.org/10.1007/s00778-010-0190-x

  • DOI: https://doi.org/10.1007/s00778-010-0190-x
