DOI: 10.1145/584792.584867

Discovering approximate keys in XML data

Published: 04 November 2002

ABSTRACT

Keys are very important in many aspects of data management, such as guiding query formulation, query optimization, indexing, etc. We consider the situation where an XML document does not come with key definitions, and we are interested in using data mining techniques to obtain a representation of the keys holding in a document. In order to have a compact representation of the set of keys holding in a document, we define a partial order on the set of all key expressions. This order is based on an analysis of the properties of absolute and relative keys for XML. Given the existence of the partial order, only a reduced set of key expressions need to be discovered.

Due to the semistructured nature of XML documents, it turns out to be useful to consider keys that hold in "almost" the whole document, that is, they are violated only in a small part of the document. To this end, the support and confidence of a key expression are also defined, and the concept of approximate key expression is introduced. We give an efficient algorithm to mine a reduced set of approximate keys from an XML document.
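
The abstract does not give the formal definitions of support and confidence, so the following Python sketch only illustrates the general idea of a key that holds in "almost" the whole document. Everything in it is an assumption made for illustration: the function names `key_stats` and `is_approximate_key`, the sample document, and the two measures themselves. Here support is taken to be the fraction of target nodes on which every key path is defined, and confidence the fraction of those nodes that could be kept while leaving all key values distinct. The paper's own definitions and mining algorithm may differ.

```python
import xml.etree.ElementTree as ET


def key_stats(root, target_path, key_paths):
    """Return (support, confidence) of a candidate key.

    Illustrative definitions only (not taken from the paper):
      support    = share of target nodes where every key path has a value
      confidence = distinct key tuples / target nodes with a full key tuple
    A key path is a child path, or "@name" for an attribute of the target.
    """
    targets = root.findall(target_path)
    if not targets:
        return 0.0, 0.0

    tuples = []
    for node in targets:
        values = []
        for path in key_paths:
            if path.startswith("@"):
                values.append(node.get(path[1:]))
            else:
                values.append(node.findtext(path))
        # Keep only nodes on which the whole key is defined.
        if all(v is not None for v in values):
            tuples.append(tuple(values))

    support = len(tuples) / len(targets)
    if not tuples:
        return support, 0.0
    confidence = len(set(tuples)) / len(tuples)
    return support, confidence


def is_approximate_key(root, target_path, key_paths, min_support, min_confidence):
    """True if the candidate key meets both (hypothetical) thresholds."""
    support, confidence = key_stats(root, target_path, key_paths)
    return support >= min_support and confidence >= min_confidence


# Hypothetical sample document: one book lacks an isbn, two share one.
DOC = """<library>
  <book><isbn>1</isbn><title>A</title></book>
  <book><isbn>2</isbn><title>B</title></book>
  <book><isbn>3</isbn><title>B</title></book>
  <book><isbn>3</isbn><title>C</title></book>
  <book><title>D</title></book>
</library>"""

root = ET.fromstring(DOC)
print(key_stats(root, "book", ["isbn"]))
print(key_stats(root, "book", ["isbn", "title"]))
print(is_approximate_key(root, "book", ["title"], 0.9, 0.8))
```

In this toy document neither `isbn` nor `title` is an exact key for `book`, but each identifies most books, which is the kind of situation the notion of an approximate key is meant to capture. Adding a path to the key can only raise this confidence measure on the nodes where the key is defined, which hints at why an ordering on key expressions lets a miner report a reduced set rather than every expression.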

    • Published in

      CIKM '02: Proceedings of the eleventh international conference on Information and knowledge management
      November 2002
      704 pages
      ISBN: 1581134924
      DOI: 10.1145/584792

      Copyright © 2002 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
