
DB-IR integration using tight-coupling in the Odysseus DBMS

World Wide Web

Abstract

As many recent applications require the integration of structured data and text data, unifying database (DB) and information retrieval (IR) technologies has become one of the major challenges in our field. There have been active discussions on the system architecture for DB-IR integration, but no clear agreement has been reached yet. In this direction, we have advocated the tight-coupling architecture and developed a novel structure of the IR index as well as tightly-coupled query processing algorithms. In tight-coupling, the text data type is supported by the storage system just like a built-in data type, so that the query processor can efficiently handle queries involving both structured data and text data. In this paper, for archival purposes, we consolidate our achievements reported in non-regular publications over the last ten years or so, extending them with greater detail on the IR index and the query processing algorithms. All the features in this paper are fully implemented in the Odysseus DBMS, which has been under development at KAIST for over 23 years. We show that Odysseus significantly outperforms two open-source DBMSs and one open-source search engine (with some exceptions) in processing DB-IR integration queries. These results demonstrate the superiority of the tight-coupling architecture for DB-IR integration.
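The core idea of tight-coupling is that text is managed inside the engine like any built-in type, so one query processor can evaluate keyword predicates and structured predicates together. The sketch below is a toy, in-memory illustration under our own assumptions. It is not the Odysseus IR index or its query processing algorithms, whose structure the abstract does not describe. The `Row` layout, the sample attributes (`category`, `year`, `body`), and the functions `build_inverted_index`, `intersect`, and `db_ir_query` are all hypothetical, and the index is simply a map from each term to a sorted posting list of row ids.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical table layout: two structured attributes and one text attribute.
    row_id: int
    category: str
    year: int
    body: str  # text attribute, indexed inside the same engine as the table


def tokenize(text: str) -> list[str]:
    # Deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(rows: list[Row]) -> dict[str, list[int]]:
    """Map each term to a sorted posting list of row ids."""
    index = defaultdict(set)
    for row in rows:
        for term in tokenize(row.body):
            index[term].add(row.row_id)
    return {term: sorted(ids) for term, ids in index.items()}


def intersect(a: list[int], b: list[int]) -> list[int]:
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def db_ir_query(rows, index, keywords, predicate):
    """Return ids of rows whose text contains every keyword and that satisfy predicate."""
    by_id = {row.row_id: row for row in rows}
    postings = [index.get(k.lower(), []) for k in keywords]
    if not postings:
        candidates = sorted(by_id)
    else:
        # Intersect starting from the shortest posting list to keep intermediates small.
        postings.sort(key=len)
        candidates = postings[0]
        for plist in postings[1:]:
            candidates = intersect(candidates, plist)
    # The structured predicate is checked on the same rows, in the same engine.
    return [rid for rid in candidates if predicate(by_id[rid])]
```

For example, with three made-up rows, `db_ir_query(rows, build_inverted_index(rows), ["text", "search"], lambda r: r.year >= 2013)` keeps only rows whose body contains both terms and whose `year` is at least 2013. In this toy setting the keyword intersection and the structured filter run over the same candidate row ids without leaving the engine; in a loosely-coupled setup, ids returned by an external search engine would instead have to be shipped back and joined with the table.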



Author information

Correspondence to Kyu-Young Whang.


About this article

Cite this article

Whang, KY., Lee, JG., Lee, MJ. et al. DB-IR integration using tight-coupling in the Odysseus DBMS. World Wide Web 18, 491–520 (2015). https://doi.org/10.1007/s11280-013-0264-y


  • DOI: https://doi.org/10.1007/s11280-013-0264-y
