Abstract
Several advanced applications, such as those dealing with the Web, need to handle data whose structure is not known a priori. This requirement severely limits the applicability of traditional database techniques, which assume that the structure of data (e.g. the database schema) is known before data are entered into the database. Moreover, in traditional database systems, whenever a data item (e.g. a tuple or an object) is entered, the application specifies the collection (e.g. a relation or a class) the data item belongs to. Collections are the basis for handling queries and indexing, and therefore a proper classification of data items into collections is crucial. In this paper, we address this issue in the context of an extended object-oriented data model. We propose an approach to classify objects that are created without specifying the class they belong to into the most appropriate class of the schema, that is, the class closest to the object state. In particular, we introduce the notion of weak membership of an object in a class, and define two measures, the conformity degree and the heterogeneity degree, which our classification algorithm exploits to identify the most appropriate class for an object among the classes of which it is a weak member.
★ Work partially supported by the Italian MURST under the Interdata Project.
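The classification idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the formal definitions, so everything below is an assumption made for illustration only: objects are modeled as dictionaries, classes as mappings from attribute names to types, and the functions `is_weak_member`, `conformity`, `heterogeneity`, and `classify` are hypothetical stand-ins, not the definitions or the algorithm of the paper.

```python
from typing import Any, Dict, Optional

# Hypothetical representation: a class maps attribute names to expected types,
# and a schema maps class names to such attribute descriptions.
ClassDef = Dict[str, type]
Schema = Dict[str, ClassDef]


def is_weak_member(obj: Dict[str, Any], cls: ClassDef) -> bool:
    """Assumed rule: the object shares at least one attribute with the class,
    and every shared attribute holds a value of the declared type."""
    shared = obj.keys() & cls.keys()
    return bool(shared) and all(isinstance(obj[a], cls[a]) for a in shared)


def conformity(obj: Dict[str, Any], cls: ClassDef) -> float:
    """Assumed measure: fraction of the class's attributes present in the object."""
    return len(obj.keys() & cls.keys()) / len(cls)


def heterogeneity(obj: Dict[str, Any], cls: ClassDef) -> float:
    """Assumed measure: fraction of the object's attributes unknown to the class."""
    return len(obj.keys() - cls.keys()) / len(obj)


def classify(obj: Dict[str, Any], schema: Schema) -> Optional[str]:
    """Pick, among the classes the object weakly belongs to, the one with the
    highest conformity; ties are broken by the lowest heterogeneity."""
    candidates = [name for name, cls in schema.items() if is_weak_member(obj, cls)]
    if not candidates:
        return None  # the object is not a weak member of any class
    return max(
        candidates,
        key=lambda n: (conformity(obj, schema[n]), -heterogeneity(obj, schema[n])),
    )


# Illustrative schema and object (invented for this example).
schema: Schema = {
    "Person": {"name": str, "age": int},
    "Employee": {"name": str, "age": int, "salary": float},
    "Book": {"title": str, "isbn": str},
}
obj = {"name": "Ada", "age": 36, "homepage": "http://example.org"}

print(classify(obj, schema))
```

In this toy setting the object is a weak member of both `Person` and `Employee` but not of `Book`, and the conformity measure prefers the class whose attributes the object covers most fully. The actual measures in the paper may weigh attributes, types, and the class hierarchy differently.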
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bertino, E., Guerrini, G., Merlo, I., Mesiti, M. (1999). An Approach to Classify Semi-Structured Objects. In: Guerraoui, R. (eds) ECOOP’ 99 — Object-Oriented Programming. ECOOP 1999. Lecture Notes in Computer Science, vol 1628. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48743-3_19
DOI: https://doi.org/10.1007/3-540-48743-3_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66156-6
Online ISBN: 978-3-540-48743-2
eBook Packages: Springer Book Archive