Abstract
Semistructured data has no absolute schema fixed in advance, and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well-structured data are inappropriate in such environments. Starting with a “lightweight” object model adopted for the TSIMMIS project at Stanford, we describe in this paper a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series of examples (a complete semantics is available in an accompanying technical report [23]), and describes the basic architecture and query processing strategy of the “lightweight” object repository we have developed.
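The cases listed in the abstract (absent data, irregular structure, differing types, heterogeneous sets) can be illustrated with a small sketch. This is not the paper's query language or repository: the sample records, the helper names `follow`, `as_number`, and `select_names_older_than`, and the choice to coerce numeric strings are all assumptions made for illustration only.

```python
from typing import Any, Iterator, List, Optional

# Hypothetical semistructured records: no fixed schema, fields may be
# missing, and the same concept may be stored with different types.
people = [
    {"name": "Ann", "age": 34, "address": {"city": "Palo Alto"}},
    {"name": "Bob", "age": "thirty-five"},   # age stored as free text
    {"name": "Cho", "address": "Tokyo"},     # no age; address is a plain string
    "Dee",                                   # heterogeneous set member (not a record)
    {"name": "Eve", "age": "41"},            # age stored as a numeric string
]


def follow(obj: Any, path: List[str]) -> Iterator[Any]:
    """Yield every value reachable from obj along path.

    A missing label, or a non-record object, yields nothing instead of
    raising an error.
    """
    if not path:
        yield obj
        return
    head, rest = path[0], path[1:]
    if isinstance(obj, dict) and head in obj:
        child = obj[head]
        children = child if isinstance(child, list) else [child]
        for c in children:
            yield from follow(c, rest)


def as_number(value: Any) -> Optional[float]:
    """Coerce a value to a number where possible (an assumed coercion rule)."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return None
    return None


def select_names_older_than(objects: List[Any], threshold: float) -> List[Any]:
    """Return names of objects with some age value above threshold."""
    result = []
    for obj in objects:
        ages = [as_number(v) for v in follow(obj, ["age"])]
        if any(a is not None and a > threshold for a in ages):
            result.extend(follow(obj, ["name"]))
    return result


print(select_names_older_than(people, 30))
```

In this sketch, objects that lack an `age`, store it in an incomparable form, or are not records at all simply fail the condition rather than causing an error, while the numeric string `"41"` is coerced and compared.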
References
S. Abiteboul, D. Quass, J. McHugh, J. Widom, and J. Wiener, “The Lorel query language for semistructured data,” Journal of Digital Libraries 1(1), to appear.
F. Bancilhon, S. Cluet, and C. Delobel, “A query language for O2,” in F. Bancilhon, C. Delobel, and P. Kanellakis, (eds.), Building an Object-Oriented Database System: The Story of O2, Morgan Kaufmann, 1992, pp. 234–255.
G. Blake, M. Consens, P. Kilpeläinen, P. Larson, T. Snider, and F. Tompa, “Text/relational database management systems: Harmonizing SQL and SGML,” in W. Litwin and T. Risch, (eds.), Applications of Databases: First International Conference, Vadstena, Sweden, 1994, pp. 267–280.
M. Carey, D. DeWitt, and S. Vandenberg, “A data model and query language for Exodus,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, Chicago, IL, June 1988, pp. 413–423.
R. Cattell, ed., The Object Database Standard: ODMG-93. Morgan Kaufmann, 1994.
S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom, “The TSIMMIS project: Integration of heterogeneous information sources,” in Proceedings of the 100th IPSJ, Tokyo, Japan, October 1994.
V. Christophides, S. Abiteboul, S. Cluet, and M. Scholl, “From structured documents to novel query facilities,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, Minneapolis, MN, May 1994, pp. 313–324.
P. Dadam, K. Kuespert, F. Andersen, H. Blanken, R. Erbe, J. Guenauer, V. Lum, P. Pistor, and G. Walch, “A DBMS prototype to support extended NF² relations: An integrated view on flat tables and hierarchies,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1986, pp. 356–367.
D. Fishman et al., “Overview of the Iris DBMS,” in W. Kim and F. H. Lochovsky, (eds.), Object-Oriented Concepts, Languages, and Applications, Addison-Wesley, 1989, pp. 219–250.
M. Freedman, “WILLOW: Technical overview.” Available by anonymous ftp from ftp.cac.washington.edu as the file willow/Tech-Report.ps, September 1994.
M. Genesereth and R. Fikes, “Knowledge interchange format reference manual (version 3.0).” Available at the URL http://logic.stanford.edu/sharing/papers/kif.ps, 1994.
C. Harrison, “An adaptive query language for object-oriented databases: Automatic navigation through partially specified data structures.” Available by anonymous ftp from ftp.ccs.neu.edu as the file pub/people/lieber/adaptive-query-lang.ps, 1994.
ISO 8879, Information processing—text and office systems—Standard Generalized Markup Language (SGML), 1986.
M. Kifer, W. Kim, and Y. Sagiv, “Querying object-oriented databases,” in Proceedings of the ACM SIGMOD International Conference on Management of Data, 1992, pp. 393–402.
W. Kim, “On object oriented database technology,” UniSQL product literature, 1994.
W. Litwin, L. Mark, and N. Roussopoulos, “Interoperability of multiple autonomous databases,” ACM Computing Surveys 22(3), pp. 267–293, 1990.
J. Melton and A. R. Simon, Understanding the New SQL: A Complete Guide. Morgan Kaufmann: San Mateo, California, 1993.
Microsoft Corporation, OLE2 Programmer's Reference. Microsoft Press: Redmond, WA, 1994.
OMG ORBTF, Common Object Request Broker Architecture. Object Management Group: Framingham, MA, 1992.
Y. Papakonstantinou, H. Garcia-Molina, and J. Ullman, “MedMaker: A mediation system based on declarative specifications.” Available by anonymous ftp from db.stanford.edu as the file pub/papakonstantiou/1995/medmaker.ps, 1995.
Y. Papakonstantinou, H. Garcia-Molina, and J. Widom, “Object exchange across heterogeneous information sources,” in Proceedings of the Eleventh International Conference on Data Engineering, Taipei, Taiwan, March 1995, pp. 251–260.
X. Qian, “Semantic interoperation via intelligent mediation,” in Proceedings of the Third International Workshop on Research Issues in Data Engineering: Interoperability in Multidatabase Systems, IEEE Computer Society Press, April 1993, pp. 228–231.
D. Quass, A. Rajaraman, Y. Sagiv, J. Ullman, and J. Widom, “Querying semistructured heterogeneous information.” Available by anonymous ftp from db.stanford.edu as the file pub/quass/1994/querying-full.ps, 1994.
A. Rafii, R. Ahmed, M. Ketabchi, P. DeSmedt, and W. Du, “Integration strategies in Pegasus object oriented multidatabase system,” in Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, Volume II, January 1992, pp. 323–334.
R. Rao, B. Janssen, and A. Rajaraman, “GAIA technical overview,” Technical Report, Xerox Palo Alto Research Center, 1994.
K. Shoens, A. Luniewski, P. Schwarz, J. Stamos, and J. Thomas, “The RUFUS system: Information organization for semi-structured data,” in Proceedings of the Nineteenth International Conference on Very Large Data Bases, Dublin, Ireland, August 1993, pp. 97–107.
J. E. Stoy, Denotational Semantics: The Scott-Strachey Approach to Programming Language Theory. The MIT Press: Cambridge, MA, 1977.
T. Yan and J. Annevelink, “Integrating a structured-text retrieval system with an object-oriented database system,” in Proceedings of the Twentieth International Conference on Very Large Data Bases, Santiago, Chile, September 1994.
Cite this article
Quass, D., Rajaraman, A., Ullman, J. et al. Querying Semistructured Heterogeneous Information. Journal of Systems Integration 7, 381–407 (1997). https://doi.org/10.1023/A:1008287522472
DOI: https://doi.org/10.1023/A:1008287522472