Mining skypatterns in fuzzy tensors

Nadisic, Nicolas; Coussat, Aurélien; Cerf, Loïc

doi:10.1007/s10618-019-00640-4

Published: 04 July 2019

Volume 33, pages 1298–1322, (2019)

Data Mining and Knowledge Discovery

Abstract

Many data mining tasks rely on pattern mining. To identify the patterns of interest in a dataset, an analyst may define several measures that score, in different ways, the relevance of a pattern. Until recently, most algorithms have only handled constraints in an efficient way, i.e., every measure had to be associated with a user-defined threshold, which can be tricky to determine. Skypatterns were introduced to allow analysts to simply define the measures of interest, and to get as a result a set of globally optimal and semantically relevant patterns. Skypatterns are Pareto-optimal patterns: no other pattern scores better on one of the chosen measures and scores at least as well on every remaining measure. This article tackles the search of the skypatterns in a more general context than the 0/1 (aka Boolean) matrix: the fuzzy tensor. The proposed solution supports a large class of measures. After explaining why and how their common mathematical property enables a safe pruning of the search space, an algorithm is presented. It builds upon multidupehack, a generalist pattern mining framework, which is now able to efficiently list skypatterns in addition to enforcing constraints on them. Experiments on two real-world fuzzy tensors illustrate the versatility of the proposal. Other experiments show it is typically more than one order of magnitude faster than the state-of-the-art algorithms, which can only mine 0/1 matrices.
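
To make the Pareto-optimality criterion concrete, here is a minimal Python sketch. It assumes that every measure is to be maximised and that the candidate patterns and their scores have already been enumerated. The names `dominates` and `skyline` and the toy scores are illustrative assumptions; this naive quadratic filter is not the article's algorithm, which prunes the search space inside multidupehack instead of filtering an exhaustive list.

```python
from typing import Dict, Set, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every measure and strictly better on one."""
    return all(u >= v for u, v in zip(a, b)) and any(u > v for u, v in zip(a, b))


def skyline(scores: Dict[str, Scores]) -> Set[str]:
    """Return the patterns that no other pattern dominates."""
    return {
        p
        for p, s in scores.items()
        if not any(dominates(t, s) for q, t in scores.items() if q != p)
    }


# Hypothetical patterns scored on two measures (higher is better).
toy_scores = {"A": (3, 1), "B": (2, 2), "C": (1, 1), "D": (3, 1)}
result = skyline(toy_scores)
```

In this toy input, pattern C is dominated by B, whereas A and D tie and therefore do not dominate each other.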

Notes

ET-n-set stands for Error-Tolerant n-set.
https://gitlab.com/nnadisic/skypatterns-uncertain-tensors.
http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php.

Acknowledgements

We would like to thank Willy Ugarte, Bruno Crémilleux, Chedy Raïssi and Benjamin Négrevergne for providing the source codes of their algorithms and for their valuable comments.

Author information

Authors and Affiliations

Department of Mathematics and Operational Research, University of Mons, Mons, Belgium
Nicolas Nadisic
Université de Lyon, INSA-Lyon, Université Claude Bernard Lyon 1, UJM-Saint Etienne, CNRS, Inserm, CREATIS UMR, 5220, U1206, 69373, Lyon, France
Aurélien Coussat
Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
Loïc Cerf

Corresponding author

Correspondence to Loïc Cerf.

Additional information

Responsible editor: Po-ling Loh, Evimaria Terzi, Antti Ukkonen, Karsten Borgwardt and Katharina Heinrich.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work has been partially funded by the FAPEMIG under Grant No. APQ-04224-16 (Multilateral Cooperation FAPEMIG-CNRS) and by the ERC Starting Grant No. 679515.

A Piecewise (Anti-)Monotonicity of the Slope Measure

To simplify the proof that the slope is piecewise (anti-)monotone, all the outputs of the x and y data-access functions, i.e., the abscissas and the ordinates of the points, are assumed non-negative. If that is not the case, $\min _{t \in \prod _{i \in I} X_i} x(t)$ is subtracted from every abscissa and $\min _{t \in \prod _{i \in I} X_i} y(t)$ is subtracted from every ordinate, which moves all the points to the non-negative quadrant of the Cartesian coordinate system. The slope of the fitting line being invariant under translation, $x \ge 0$ and $y \ge 0$ are assumed without loss of generality.
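
The translation invariance invoked above can be checked numerically. The following Python sketch assumes the slope is the ordinary least-squares slope of the fitting line; the function name `slope` and the sample points are illustrative assumptions, not taken from the article.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def slope(points: List[Point]) -> float:
    """Least-squares slope of the line fitting the given (x, y) points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)


# Hypothetical points, some with negative coordinates.
points = [(-2.0, 1.0), (0.0, -3.0), (1.0, 0.5), (3.0, 2.0)]

# Subtract the minima so that every coordinate becomes non-negative.
min_x = min(x for x, _ in points)
min_y = min(y for _, y in points)
shifted = [(x - min_x, y - min_y) for x, y in points]

same_slope = math.isclose(slope(points), slope(shifted))
```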

A rewriting $m'_{\text {slope}}$ of the slope $m_{\text {slope}}$ maps $({L}, {U}) \in \left( \prod _{i = 1}^n 2^{D_i}\right) ^2$ to:

case 1. if $\text {denom}(U, L) > 0$ then
  (a) $\displaystyle \frac{\text {num}(L, U)}{\text {denom}(U, L)}$ if $\text {num}(L, U) > 0$
  (b) $\displaystyle \frac{\text {num}(L, U)}{\text {denom}(L, U)}$ otherwise
case 2. if $\text {denom}(L, U) < 0$ then
  (a) $\displaystyle \frac{\text {num}(U, L)}{\text {denom}(L, U)}$ if $\text {num}(U, L) < 0$
  (b) $\displaystyle \frac{\text {num}(U, L)}{\text {denom}(U, L)}$ otherwise
case 3. otherwise $+\infty $

where $\forall (X^1, X^2) = (X_1^1, \dots , X_n^1, X_1^2, \dots , X_n^2) \in \left( \prod _{i = 1}^n 2^{D_i}\right) ^2$:

num$(X^1, X^2) = \displaystyle \sum _{t \in \prod _{i \in I} X_i^2} x(t) \sum _{t \in \prod _{i \in I} X_i^2} y(t) - \left| \prod _{i \in I} X_i^1\right| \sum _{t \in \prod _{i \in I} X_i^1} x(t)y(t)$;
denom$(X^1, X^2) = \displaystyle \left( \sum _{t \in \prod _{i \in I} X_i^2} x(t)\right) ^2 - \left| \prod _{i \in I} X_i^1\right| \sum _{t \in \prod _{i \in I} X_i^1} x(t)^2$.
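
The definitions of num, denom and $m'_{\text {slope}}$ can be transcribed as a short Python sketch. As a simplifying assumption, each pattern is represented directly by the list of the $(x(t), y(t))$ pairs of its tuples $t \in \prod _{i \in I} X_i$, so that a sub-pattern is a sub-list; the function names and the sample points are hypothetical. The sketch also assumes $L \subseteq U$ and non-negative coordinates, which is what keeps the divisions safe.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def num(first: List[Point], second: List[Point]) -> float:
    """Sums over `second` minus |first| times the sum of x*y over `first`."""
    sx = sum(x for x, _ in second)
    sy = sum(y for _, y in second)
    return sx * sy - len(first) * sum(x * y for x, y in first)


def denom(first: List[Point], second: List[Point]) -> float:
    """Squared sum of x over `second` minus |first| times the sum of x^2 over `first`."""
    sx = sum(x for x, _ in second)
    return sx ** 2 - len(first) * sum(x * x for x, _ in first)


def m_prime_slope(L: List[Point], U: List[Point]) -> float:
    """Rewriting of the slope, following cases 1, 2 and 3 (assumes L is a sub-list of U)."""
    if denom(U, L) > 0:  # case 1
        n = num(L, U)
        return n / denom(U, L) if n > 0 else n / denom(L, U)
    if denom(L, U) < 0:  # case 2
        n = num(U, L)
        return n / denom(L, U) if n < 0 else n / denom(U, L)
    return math.inf  # case 3


# Hypothetical pattern with four tuples; its least-squares slope is 17/52.
X = [(0.0, 4.0), (2.0, 0.0), (3.0, 3.5), (5.0, 5.0)]
value = m_prime_slope(X, X)
```

With both arguments equal to the same pattern, the sketch returns the least-squares slope, and a single-point pattern, whose denominator is null, falls into case 3.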

The equality $m'_{\text {slope}}(X, X) = m_{\text {slope}}(X)$, for any pattern $X \in \prod _{i = 1}^n 2^{D_i}$, derives from the equality $\frac{\text {num}(X, X)}{\text {denom}(X, X)} = m_{\text {slope}}(X)$, for cases 1 and 2 in the definition of $m'_{\text {slope}}$, and from the nullity of denom(X, X) in case 3.

The rewriting $m'_{\text {slope}}$ actually proves that $m_{\text {slope}}$ is piecewise (anti-)monotone. To show it, following Definition 8, let us take $U \in \prod _{i = 1}^n 2^{D_i}$, $X \in \prod _{i = 1}^n 2^{U_i}$ and $L \in \prod _{i = 1}^n 2^{X_i}$. L being a sub-pattern of X, its subsets of the dimensions with indexes in I are subsets of those of X, i.e., $\forall i \in I$, $L_i \subseteq X_i$. That implies $\prod _{i \in I} L_i \subseteq \prod _{i \in I} X_i$, which in turn implies both $\left| \prod _{i \in I} L_i\right| \le \left| \prod _{i \in I} X_i\right| $ and $\sum _{t \in \prod _{i \in I} L_i} x(t)^2 \le \sum _{t \in \prod _{i \in I} X_i} x(t)^2$. As a consequence, the (positive) quantity subtracted in the expression of denom is smaller if L, rather than X, is input as the first argument. U being a super-pattern of X, the first sum, in the expression of denom, involves more terms when U, rather than X, is input as the second argument. Because $x \ge 0$, that sum is greater and so is its square. Combining the results on both parts in the expression of denom, $\text {denom}(X, X) \le \text {denom}(L, U)$ holds. It entails $\text {denom}(X, X) > 0 \Rightarrow \text {denom}(L, U) > 0$, i.e., if (X, X) triggers case 1 of $m'_{\text {slope}}$ then (L, U) cannot trigger case 2.

The same steps as in the previous paragraph, but considering X or its super-pattern U as the first input of denom, X or its sub-pattern L as the second input of denom, prove $\text {denom}(U, L) \le \text {denom}(X, X)$. That inequality entails $\text {denom}(X, X) < 0 \Rightarrow \text {denom}(U, L) < 0$, i.e., if (X, X) triggers case 2 of $m'_{\text {slope}}$ then (L, U) cannot trigger case 1. Also, $\text {denom}(X, X) = 0$ implies both $\text {denom}(U, L) \le 0$ and $\text {denom}(L, U) \ge 0$, i.e., if (X, X) triggers case 3 then (L, U) triggers neither case 1 nor case 2. Given all the impossibilities proven so far, if (X, X) triggers case $k \in \{1, 2, 3\}$ in the definition of $m'_{\text {slope}}$ then (L, U) triggers either case k or case 3.
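
A small numerical instance may help to follow the ordering of the three denominators. In this Python sketch, patterns are again flattened, as an assumption, into lists of $(x(t), y(t))$ pairs, and the three nested point sets are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def denom(first: List[Point], second: List[Point]) -> int:
    """Squared sum of x over `second` minus |first| times the sum of x^2 over `first`."""
    sx = sum(x for x, _ in second)
    return sx ** 2 - len(first) * sum(x * x for x, _ in first)


# Hypothetical nested patterns: L is a sub-pattern of X, itself a sub-pattern of U.
U = [(0, 0), (1, 1), (2, 3)]
X = [(1, 1), (2, 3)]
L = [(1, 1)]

low = denom(U, L)    # 1^2 - 3 * (0 + 1 + 4) = -14
mid = denom(X, X)    # 3^2 - 2 * (1 + 4)     = -1
high = denom(L, U)   # 3^2 - 1 * 1           = 8
```

Here (X, X) triggers case 2 because its denominator is negative, whereas (L, U) triggers case 3: denom(U, L) is not positive and denom(L, U) is not negative.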

If (L, U) triggers case 3, $m_{\text {slope}}(X) = m'_{\text {slope}}(X, X) \le m'_{\text {slope}}(L, U) = +\infty $. It remains to prove $m_{\text {slope}}(X) \le m'_{\text {slope}}(L, U)$ when (X, X) and (L, U) both trigger case 1 or when they both trigger case 2. An analysis of the expression of num, which is analogous to the earlier analysis of denom and uses both $x \ge 0$ and $y \ge 0$, proves $\text {num}(U, L) \le \text {num}(X, X) \le \text {num}(L, U)$ and, in turn, the impossibility for (L, U) to trigger a sub-case (b) if (X, X) triggers the related sub-case (a). If, on the contrary, (X, X) triggers a sub-case (b) and (L, U) triggers the related sub-case (a) then $m_{\text {slope}}(X) = m'_{\text {slope}}(X, X) \le m'_{\text {slope}}(L, U)$. Indeed, given the tests in $m'_{\text {slope}}$ and the inequalities $\text {denom}(U, L) \le \text {denom}(X, X) \le \text {denom}(L, U)$ that were proven above, the sub-cases (a) always provide positive outputs, whereas the sub-cases (b) always provide negative (hence smaller) outputs.

Finally, when (X, X) and (L, U) trigger, in the definition of $m'_{\text {slope}}$, not only the same case but also the same sub-case, $m_{\text {slope}}(X) \le m'_{\text {slope}}(L, U)$ still holds. Indeed, the inequality $\text {num}(U, L) \le \text {num}(X, X) \le \text {num}(L, U)$ and the inequality $\text {denom}(U, L) \le \text {denom}(X, X) \le \text {denom}(L, U)$ together entail:

$m_{\text {slope}}(X) = \frac{\text {num}(X, X)}{\text {denom}(X, X)} \le \frac{\text {num}(L, U)}{\text {denom}(U, L)}$ if the two numerators and the two denominators are positive, i.e., in case 1a;
$m_{\text {slope}}(X) = \frac{\text {num}(X, X)}{\text {denom}(X, X)} \le \frac{\text {num}(L, U)}{\text {denom}(L, U)}$ if the two numerators are negative and the two denominators are positive, i.e., in case 1b;
$m_{\text {slope}}(X) = \frac{\text {num}(X, X)}{\text {denom}(X, X)} \le \frac{\text {num}(U, L)}{\text {denom}(L, U)}$ if the two numerators and the two denominators are negative, i.e., in case 2a;
$m_{\text {slope}}(X) = \frac{\text {num}(X, X)}{\text {denom}(X, X)} \le \frac{\text {num}(U, L)}{\text {denom}(U, L)}$ if the two numerators are positive and the two denominators are negative, i.e., in case 2b.
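
The bound $m_{\text {slope}}(X) \le m'_{\text {slope}}(L, U)$ can also be exercised empirically. The Python sketch below is a randomized sanity check, not a proof: it assumes patterns flattened into lists of non-negative integer $(x(t), y(t))$ pairs, uses exact fractions to avoid rounding issues, and evaluates $m_{\text {slope}}(X)$ as $m'_{\text {slope}}(X, X)$. All names, ranges and the seed are illustrative assumptions.

```python
import math
import random
from fractions import Fraction
from typing import List, Tuple, Union

Point = Tuple[int, int]


def num(first: List[Point], second: List[Point]) -> int:
    sx = sum(x for x, _ in second)
    sy = sum(y for _, y in second)
    return sx * sy - len(first) * sum(x * y for x, y in first)


def denom(first: List[Point], second: List[Point]) -> int:
    sx = sum(x for x, _ in second)
    return sx ** 2 - len(first) * sum(x * x for x, _ in first)


def m_prime_slope(L: List[Point], U: List[Point]) -> Union[Fraction, float]:
    """Cases 1, 2 and 3 of the rewriting (assumes L is a sub-list of U)."""
    if denom(U, L) > 0:  # case 1
        n = num(L, U)
        return Fraction(n, denom(U, L)) if n > 0 else Fraction(n, denom(L, U))
    if denom(L, U) < 0:  # case 2
        n = num(U, L)
        return Fraction(n, denom(L, U)) if n < 0 else Fraction(n, denom(U, L))
    return math.inf  # case 3


def check_bound(trials: int = 1000, seed: int = 0) -> bool:
    """Draw random nested patterns L within X within U and assert the inequalities."""
    rng = random.Random(seed)
    for _ in range(trials):
        U = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(rng.randint(1, 6))]
        X = [p for p in U if rng.random() < 0.7]  # random sub-pattern of U
        L = [p for p in X if rng.random() < 0.7]  # random sub-pattern of X
        assert num(U, L) <= num(X, X) <= num(L, U)
        assert denom(U, L) <= denom(X, X) <= denom(L, U)
        assert m_prime_slope(X, X) <= m_prime_slope(L, U)
    return True


ok = check_bound()
```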

About this article

Cite this article

Nadisic, N., Coussat, A. & Cerf, L. Mining skypatterns in fuzzy tensors. Data Min Knowl Disc 33, 1298–1322 (2019). https://doi.org/10.1007/s10618-019-00640-4

Received: 22 October 2018
Accepted: 18 June 2019
Published: 04 July 2019
Issue Date: 01 September 2019
DOI: https://doi.org/10.1007/s10618-019-00640-4
