Application-independent feature construction based on almost-closedness properties

Gay, Dominique; Selmaoui-Folcher, Nazha; Boulicaut, Jean-François

doi:10.1007/s10115-010-0369-x

Regular Paper
Published: 25 December 2010

Knowledge and Information Systems, Volume 30, pages 87–111 (2012)

Dominique Gay¹,², Nazha Selmaoui-Folcher¹ & Jean-François Boulicaut³

Abstract

Feature construction has been studied extensively, including for 0/1 data samples. Given the recent breakthroughs in closedness-related constraint-based mining, we consider their impact on feature construction for classification tasks. We investigate the use of condensed representations of frequent itemsets based on closedness properties as new features. These itemset types were proposed to avoid set counting in difficult association rule mining tasks, i.e., when data are noisy and/or highly correlated. However, our conjecture is that their intrinsic properties (namely, maximality for the closed itemsets and minimality for the δ-free itemsets) should have an impact on feature quality. Understanding this impact remains a fairly open question, and we discuss these issues by means of itemset properties on the one hand and an experimental validation on various data sets (possibly noisy) on the other hand.
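To make the two properties named in the abstract concrete, the following is a minimal sketch on a toy 0/1 sample. It is not the authors' implementation: the data, the function names, and the binary "row contains the itemset" encoding of a new feature are all assumptions made for illustration. It uses the standard definitions: an itemset is closed when no proper superset has the same support, and it is δ-free when no rule between its items holds with at most δ exceptions.

```python
from itertools import combinations

# Hypothetical toy 0/1 sample: each row is the set of items equal to 1.
ROWS = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("bd"),
]


def support(itemset, rows):
    """Number of rows that contain every item of `itemset`."""
    return sum(1 for row in rows if itemset <= row)


def is_closed(itemset, rows):
    """Maximality: adding any further item strictly lowers the support."""
    items = set().union(*rows)
    base = support(itemset, rows)
    return all(support(itemset | {item}, rows) < base
               for item in items - itemset)


def is_delta_free(itemset, delta, rows):
    """Minimality: no rule Y => a (Y a proper subset, a in itemset but not
    in Y) has at most `delta` exceptions. Enlarging Y can only reduce the
    exceptions, so testing Y = itemset - {a} for each a is sufficient."""
    base = support(itemset, rows)
    return all(support(itemset - {a}, rows) - base > delta for a in itemset)


def frequent_itemsets(rows, min_support):
    """Naive enumeration of non-empty frequent itemsets (toy sizes only)."""
    items = sorted(set().union(*rows))
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, rows) >= min_support:
                yield candidate


def as_feature(itemset, rows):
    """Assumed encoding: the new feature is 1 when the row contains the itemset."""
    return [int(itemset <= row) for row in rows]


def label(itemset):
    """Readable name for an itemset, e.g. frozenset({'a', 'b'}) -> 'ab'."""
    return "".join(sorted(itemset))


frequent = list(frequent_itemsets(ROWS, min_support=2))
closed = [label(s) for s in frequent if is_closed(s, ROWS)]
free_0 = [label(s) for s in frequent if is_delta_free(s, 0, ROWS)]
free_1 = [label(s) for s in frequent if is_delta_free(s, 1, ROWS)]
```

On this toy sample with a minimum support of 2, the closed itemsets are a, b, ab, ac and abc, the 0-free itemsets are a, b, c, ab and bc, and only c remains 1-free, which illustrates how raising δ tolerates more exceptions and yields fewer, more general candidate features.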

Author information

Authors and Affiliations

University of New-Caledonia, PPME EA3325, ERIM EA3791, BP R4, 98851, NOUMEA Cédex, New-Caledonia, France
Dominique Gay & Nazha Selmaoui-Folcher
Orange Labs, TECH/ASAP/PROF, 2, avenue Pierre Marzin, 22307, LANNION Cédex, France
Dominique Gay
INSA-Lyon, LIRIS CNRS UMR5205, INRIA Combining, 69621, Villeurbanne, France
Jean-François Boulicaut

Corresponding author

Correspondence to Dominique Gay.

Additional information

Dominique Gay was with PPME EA3325, ERIM EA3791, University of New-Caledonia when this work began.

About this article

Cite this article

Gay, D., Selmaoui-Folcher, N. & Boulicaut, JF. Application-independent feature construction based on almost-closedness properties. Knowl Inf Syst 30, 87–111 (2012). https://doi.org/10.1007/s10115-010-0369-x

Received: 08 July 2009
Revised: 20 October 2010
Accepted: 26 November 2010
Published: 25 December 2010
Issue Date: January 2012
DOI: https://doi.org/10.1007/s10115-010-0369-x
