Querying Semistructured Heterogeneous Information

Journal of Systems Integration

Abstract

Semistructured data has no absolute schema fixed in advance, and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well-structured data are inappropriate in such environments. Starting with a “lightweight” object model adopted for the TSIMMIS project at Stanford, we describe in this paper a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series of examples (a complete semantics is available in an accompanying technical report [23]), and describes the basic architecture and query processing strategy of the “lightweight” object repository we have developed.


References

  1. S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener, “The Lorel query language for semistructured data,” Journal of Digital Libraries 1(1), to appear.

  2. F. Bancilhon, S. Cluet, and C. Delobel, “A query language for O2,” in F. Bancilhon, C. Delobel, and P. Kanellakis, (eds.), Building an Object-Oriented Database System: The Story of O2, Morgan Kaufmann, 1992, pp. 234–255.

  3. G. Blake, M. Consens, P. Kilpeläinen, P. Larson, T. Snider, and F. Tompa, “Text/relational database management systems: Harmonizing SQL and SGML,” in W. Litwin and T. Risch, (eds.), Applications of Databases: First International Conference, Vadstena, Sweden, 1994, pp. 267–280.

  4. M. Carey, D. DeWitt, and S. Vandenberg, “A data model and query language for Exodus,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, IL, June 1988, pp. 413–423.

  5. R. Cattell, (ed.), The Object Database Standard: ODMG-93. Morgan Kaufmann, 1994.

  6. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom, “The TSIMMIS project: Integration of heterogeneous information sources,” in Proceedings of the 100th IPSJ, Tokyo, Japan, October 1994.

  7. V. Christophides, S. Abiteboul, S. Cluet, and M. Scholl, “From structured documents to novel query facilities,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, Minneapolis, MN, May 1994, pp. 313–324.

  8. P. Dadam, K. Kuespert, F. Andersen, H. Blanken, R. Erbe, J. Guenauer, V. Lum, P. Pistor, and G. Walch, “A DBMS prototype to support extended NF² relations: An integrated view on flat tables and hierarchies,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1986, pp. 356–367.

  9. D. Fishman et al., “Overview of the Iris DBMS,” in W. Kim and F. H. Lochovsky, (eds.), Object-Oriented Concepts, Languages, and Applications, Addison-Wesley, 1989, pp. 219–250.

  10. M. Freedman, “WILLOW: Technical overview.” Available by anonymous ftp from ftp.cac.washington.edu as the file willow/Tech-Report.ps, September 1994.

  11. M. Genesereth and R. Fikes, “Knowledge interchange format reference manual (version 3.0).” Available at the URL http://logic.stanford.edu/sharing/papers/kif.ps, 1994.

  12. C. Harrison, “An adaptive query language for object-oriented databases: Automatic navigation through partially specified data structures.” Available by anonymous ftp from ftp.ccs.neu.edu as the file pub/people/lieber/adaptive-query-lang.ps, 1994.

  13. ISO 8879, Information processing—text and office systems—Standard Generalized Markup Language (SGML), 1986.

  14. M. Kifer, W. Kim, and Y. Sagiv, “Querying object-oriented databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1992, pp. 393–402.

  15. W. Kim, “On object oriented database technology,” UniSQL product literature, 1994.

  16. W. Litwin, L. Mark, and N. Roussopoulos, “Interoperability of multiple autonomous databases,” ACM Computing Surveys 22(3), pp. 267–293, 1990.

  17. J. Melton and A. R. Simon, Understanding the New SQL: A Complete Guide. Morgan Kaufmann: San Mateo, California, 1993.

  18. Microsoft Corporation, OLE2 Programmer's Reference. Microsoft Press: Redmond, WA, 1994.

  19. OMG ORBTF, Common Object Request Broker Architecture. Object Management Group: Framingham, MA, 1992.

  20. Y. Papakonstantinou, H. Garcia-Molina, and J. Ullman, “MedMaker: A mediation system based on declarative specifications.” Available by anonymous ftp from db.stanford.edu as the file pub/papakonstantiou/1995/medmaker.ps, 1995.

  21. Y. Papakonstantinou, H. Garcia-Molina, and J. Widom, “Object exchange across heterogeneous information sources,” in Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, March 1995, pp. 251–260.

  22. X. Qian, “Semantic interoperation via intelligent mediation,” in Proceedings of the Third International Workshop on Research Issues in Data Engineering: Interoperability in Multidatabase Systems, IEEE Computer Society Press, April 1993, pp. 228–231.

  23. D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, and J. Widom, “Querying semistructured heterogeneous information.” Available by anonymous ftp from db.stanford.edu as the file pub/quass/1994/querying-full.ps, 1994.

  24. A. Rafii, R. Ahmed, M. Ketabchi, P. DeSmedt, and W. Du, “Integration strategies in Pegasus object oriented multidatabase system,” in Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, Volume II, January 1992, pp. 323–334.

  25. R. Rao, B. Janssen, and A. Rajaraman, “GAIA technical overview,” Technical Report, Xerox Palo Alto Research Center, 1994.

  26. K. Shoens, A. Luniewski, P. Schwarz, J. Stamos, and J. Thomas, “The RUFUS system: Information organization for semi-structured data,” in Proceedings of the Nineteenth International Conference on Very Large Data Bases, Dublin, Ireland, August 1993, pp. 97–107.

  27. J. E. Stoy, Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. The MIT Press: Cambridge, MA, 1977.

  28. T. Yan and J. Annevelink, “Integrating a structured-text retrieval system with an object-oriented database system,” in Proceedings of the Twentieth International Conference on Very Large Data Bases, Santiago, Chile, September 1994.

Cite this article

Quass, D., Rajaraman, A., Ullman, J. et al. Querying Semistructured Heterogeneous Information. Journal of Systems Integration 7, 381–407 (1997). https://doi.org/10.1023/A:1008287522472

  • DOI: https://doi.org/10.1023/A:1008287522472