ABSTRACT
Keys are important in many aspects of data management, such as guiding query formulation, query optimization, and indexing. We consider the situation where an XML document does not come with key definitions, and we use data mining techniques to obtain a representation of the keys that hold in the document. To keep this representation compact, we define a partial order on the set of all key expressions, based on an analysis of the properties of absolute and relative keys for XML. Given this partial order, only a reduced set of key expressions needs to be discovered. Because XML documents are semistructured, it is also useful to consider keys that hold in "almost" the whole document, that is, keys that are violated only in a small part of it. To this end, we define the support and confidence of a key expression and introduce the concept of an approximate key expression. We give an efficient algorithm to mine a reduced set of approximate keys from an XML document.
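As a rough illustration of what an approximate key could look like in practice, the following Python sketch scores a candidate key on a small document. The abstract does not give the formal definitions of support and confidence, so the ones used here are assumptions chosen only for illustration: support is the number of target nodes that carry a key value, and confidence is the fraction of those nodes that could be kept without any duplicate key value. The function names, the sample document, and the thresholds are likewise hypothetical, and the sketch does not implement the paper's mining algorithm or its partial order.

```python
import xml.etree.ElementTree as ET


def key_stats(root, target_path, key_path):
    """Return (support, confidence) for a candidate key.

    Assumed definitions (not taken from the paper):
      support    = number of target nodes that have a value at key_path
      confidence = distinct key values / support, i.e. the fraction of
                   target nodes that can stay without a duplicate value
    """
    targets = root.findall(target_path)
    values = [node.findtext(key_path) for node in targets]
    present = [v for v in values if v is not None]
    support = len(present)
    confidence = len(set(present)) / support if support else 0.0
    return support, confidence


def is_approximate_key(root, target_path, key_path,
                       min_support=1, min_confidence=0.9):
    """Accept the candidate if it meets both (hypothetical) thresholds."""
    support, confidence = key_stats(root, target_path, key_path)
    return support >= min_support and confidence >= min_confidence


# Hypothetical sample document: isbn is violated once, title twice.
DOC = """<library>
  <book><isbn>1</isbn><title>A</title></book>
  <book><isbn>2</isbn><title>A</title></book>
  <book><isbn>3</isbn><title>B</title></book>
  <book><isbn>3</isbn><title>B</title></book>
</library>"""

root = ET.fromstring(DOC)
isbn_stats = key_stats(root, "./book", "isbn")
title_stats = key_stats(root, "./book", "title")
```

With a confidence threshold of 0.7, `isbn` would be accepted as an approximate key for `book` in this sample while `title` would be rejected. An exact key corresponds to a confidence of 1.0 under these assumed definitions.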
Index Terms
- Discovering approximate keys in XML data