research-article

Mining flexible association rules from XML

Authors:

Elisabetta Caneva,

Barbara Oliboni,

Elisa Quintarelli

EDBT/ICDT '09: Proceedings of the 2009 EDBT/ICDT Workshops

Pages 85 - 92

https://doi.org/10.1145/1698790.1698805

Published: 22 March 2009

Abstract

The eXtensible Markup Language (XML) plays an increasingly important role in research on the representation, exchange, and integration of information that comes from different data sources and relates to various contexts, such as medical and biological data. Extracting knowledge from XML datasets is an important problem, but it can be difficult because XML is intrinsically semistructured: documents can have an implicit and irregular structure that is not defined in advance.

In this paper, we propose a novel approach for discovering frequent, but approximate, information in XML documents. The approach is based on Flexible Tree Rules, which take into account both the structure and the content of the analyzed data. Our proposal is flexible enough to be applied both to documents with a regular structure and to documents with a highly heterogeneous structure, and it can be used to evaluate the similarity of XML documents. Moreover, we describe an algorithm that evaluates the similarity degree of a Flexible Tree Rule with respect to an XML document.
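The abstract does not define how the similarity degree is computed, so the following is only an illustrative sketch of what scoring a tree-shaped rule against an XML document could look like. It assumes, purely for illustration, that the score is the fraction of the rule's parent-child tag pairs that also occur in the document. The function names `edges` and `similarity_degree`, the sample tags, and the scoring formula are all hypothetical and are not taken from the paper; element content is ignored here, whereas the paper's rules consider both structure and content.

```python
import xml.etree.ElementTree as ET


def edges(root):
    """Return the set of (parent_tag, child_tag) pairs in the tree under root."""
    return {(parent.tag, child.tag) for parent in root.iter() for child in parent}


def similarity_degree(rule_xml, doc_xml):
    """Hypothetical score in [0, 1]: share of the rule's edges found in the document.

    This is an assumed, simplified measure, not the algorithm from the paper.
    """
    rule_edges = edges(ET.fromstring(rule_xml))
    doc_edges = edges(ET.fromstring(doc_xml))
    if not rule_edges:
        # A rule with no edges imposes no structural constraint.
        return 1.0
    return len(rule_edges & doc_edges) / len(rule_edges)


# Hypothetical rule and document, both written as small XML fragments.
rule = "<patient><diagnosis/><therapy><drug/></therapy></patient>"
doc = "<patient><name/><diagnosis/><therapy><dose/></therapy></patient>"

print(similarity_degree(rule, doc))
```

In this sketch the document matches two of the rule's three edges (patient/diagnosis and patient/therapy) and misses therapy/drug, so a partially matching document still receives a nonzero score instead of being rejected outright. That graded behaviour is the general idea behind approximate matching, under the assumptions stated above.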



Index Terms

Mining flexible association rules from XML
1. Applied computing
  1. Document management and text processing
    1. Document preparation
2. Information systems
  1. Information systems applications
    1. Data mining



Published In


EDBT/ICDT '09: Proceedings of the 2009 EDBT/ICDT Workshops

March 2009

218 pages

ISBN: 9781605586502

DOI: 10.1145/1698790

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

EDBT/ICDT '09

EDBT/ICDT '09: EDBT/ICDT '09 joint conference

March 22, 2009

Saint-Petersburg, Russia

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%
