DOI: 10.1145/1698790.1698805
Research article

Mining flexible association rules from XML

Published: 22 March 2009

Abstract

The eXtensible Markup Language (XML) is becoming increasingly important in research on the representation, exchange, and integration of information that comes from different data sources and relates to various contexts, such as medical and biological data. Extracting knowledge from XML datasets is an important but potentially difficult task because of the intrinsically semistructured nature of XML: documents can have an implicit and irregular structure that is not defined in advance.
In this paper, we propose a novel approach, based on Flexible Tree Rules, for discovering frequent but approximate information in XML documents; the rules take into account both the structure and the content of the analyzed data. Our proposal is flexible enough to be applied both to documents with a regular structure and to documents with a highly heterogeneous structure, and it can also be used to evaluate the similarity of XML documents. Moreover, we describe an algorithm that evaluates the similarity degree of a Flexible Tree Rule with respect to an XML document.


Cited By

  • (2014) Extracting Knowledge from XML Document Using Tree-Based Association Rules. Proceedings of the 2014 International Conference on Intelligent Computing Applications, pp. 134-137. DOI: 10.1109/ICICA.2014.37. Online publication date: 6 March 2014.
  • (2013) Automated Ranking of Relaxing Query Results Based on XML Structure and Content Preferences. Mobile and Web Innovations in Systems and Service-Oriented Engineering, pp. 44-62. DOI: 10.4018/978-1-4666-2470-2.ch003. Online publication date: 2013.
  • (2011) Automated Ranking of Relaxing Query Results Based on XML Structure and Content Preferences. International Journal of Systems and Service-Oriented Engineering, 2(1), pp. 21-39. DOI: 10.4018/jssoe.2011010102. Online publication date: 1 January 2011.
  • (2011) On mining association rules with semantic constraints in XML. 2011 Sixth International Conference on Digital Information Management, pp. 1-5. DOI: 10.1109/ICDIM.2011.6093337. Online publication date: September 2011.

Published In

ACM Other conferences
EDBT/ICDT '09: Proceedings of the 2009 EDBT/ICDT Workshops
March 2009
218 pages
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

EDBT/ICDT '09: Joint EDBT/ICDT Conference
March 22, 2009
Saint Petersburg, Russia

Acceptance Rates

Overall Acceptance Rate 7 of 10 submissions, 70%
