1 Introduction and motivation

Itemset mining deals with searching for a compact set of itemsets that summarizes a given transaction dataset in the most effective and efficient way. For example, in market basket analysis, a dataset contains a number of items and transactions, where each transaction is the list of items a customer has purchased. We can examine which items are sold together to analyze user behavior, increase sales, and make predictions. Early works in this domain focus on finding frequent itemsets that satisfy a minimum support threshold. Apriori is the first algorithm introduced for finding frequent itemsets whose support exceeds a user-specified minimum threshold [2], and it has been applied extensively in numerous applications since then. Apriori and many similar algorithms, e.g., Eclat [40] and FPGrowth [11], suffer from pattern explosion: high minimum support thresholds return only a small number of well-known patterns, while small thresholds return an overwhelmingly large number of patterns, many of which are only variations of the same theme. For example, if we learn from the transactions that bread and butter are often purchased together and that many people buy milk, then reporting that these three items are purchased together is largely redundant [30]. A few other works tried to address this issue [4, 9, 22], but they do not fully resolve the problem of pattern explosion [1].

The advanced itemset mining community has introduced this field as ‘Interesting Itemset Mining’, which focuses on finding a non-redundant and self-sufficient summary of the data [7, 8, 13, 17, 21, 26, 28, 30]. These works produce considerably more interesting and less redundant patterns than frequent itemset mining, which helps to intelligently analyze data-driven problems in finance, graph search [20, 24, 44], recommendation systems [23, 36, 38, 46], data engineering [34, 39, 45, 47], etc. However, existing works do not consider sparsity constraints on the encoding. In some applications, e.g., compression of transaction databases, a sparsity constraint may be preferred to limit the size of the encoding of each transaction. Additionally, these works do not learn a convolutional hierarchical representation of the data.

In this paper, we propose a convolutional sparse coding-based approach to interesting itemset mining, a task that is essentially different from sparse coding of real-valued data in the image processing domain. We propose a matching pursuit style greedy approach that learns a dictionary from transaction data to reduce the compression loss under a sparsity constraint. To further enhance its performance, we embed our sparse coding algorithm into a convolutional, neural-network-like architecture in which each layer learns a more complex discrete representation from the transformed database. This resembles state-of-the-art convolutional sparse coding in the image processing domain [14, 42]. Adding sparse representations of images and signals to training instances is known to improve classification accuracy [3]. Nevertheless, leveraging the sparse representation of itemsets to enhance the performance of classifiers (e.g., Naive Bayes, Decision Trees, Random Forest, etc.) is still an open question. To summarize, we make the following contributions:

  • Sparse coding of itemsets is addressed for the first time and formulated as an optimization problem. We prove it NP-hard by a reduction from the set cover problem. We propose an approximation-based sparse coding algorithm, Dictionary Learning for Sparse Coding of Itemsets (DSI), that efficiently learns non-redundant dictionary elements for compression with minimal loss. It provides a bottom-up mapping from transactions to dictionary itemsets, efficiently yielding a reconstruction close to the original transactions.

  • We propose a new approach, Layered Convolutional Dictionary Learning for Sparse Coding of Itemsets (CDSI), which deploys sparse coding within a convolution-like layered model to learn a grouped representation at each level. The dictionary itemsets are convolved into the database so that each layer learns an increasingly meaningful representation.

  • An extensive empirical validation on thirteen datasets shows the superiority of our proposed methods compared to recent works. A text dataset (JMLR) is used to evaluate pattern meaningfulness by visual inspection. Transactions of nine UCI [6] and three SIPO [21] datasets (Section 6.3) are sparse coded to determine the impact on the prediction accuracy of different classifiers.

Section 2 discusses other important related works. Our targeted problem is formally defined and proved NP-hard in Section 3. A greedy approach for sparsely representing itemsets is presented in Section 4. In Section 5, we explain a layered convolutional process for transforming the database and learning dictionaries. Section 6 describes our extensive empirical validation in detail. We conclude our work with future directions in Section 7.

2 Related work

Our model draws inspiration from the fields of sparse dictionary learning, convolutional sparse coding, and itemset mining on transaction data. In what follows, we provide a brief overview of related work in these fields.

2.1 Sparse dictionary learning

Sparse coding is an unsupervised technique widely used in signal and image processing to compress images or signals using a compact set of basis functions learned from data. The learned basis, called a dictionary, is adapted to the specific data, an approach that has recently proven very effective for signal reconstruction and classification in the audio and image processing domains [15, 16]. A dictionary consisting of image edges, for instance, can give a better representation of images than raw pixel intensity values. The dictionary is typically overcomplete, and a sparsity constraint restricts the number of basis elements used to encode each image. Sparse dictionary learning mainly deals with continuous data, while in practice many datasets are discrete. Continuing this highly promising line of work, we explore how to learn a dictionary and represent itemsets under a sparsity constraint. Although the idea of coding binary data is not new, handling discrete data in the sparse coding problem remains challenging, and sparse coding techniques for discrete data are in high demand.

2.2 Layered convolutional sparse dictionary learning

A sparse feature vector is computed to reconstruct the original input vector by minimizing an energy function. If image patches are processed independently, the resulting representation is highly redundant, as features of neighboring patches are correlated; the sparse coding algorithm alone cannot capture these dependencies. To address this problem, a variety of convolutional sparse coding methods have been introduced in the image processing domain [12, 42]. These techniques are based on a convolutional decomposition of the input data to learn a dictionary under a sparsity constraint. It is a top-down approach that seeks to generate the input signal by summing the convolutions of feature maps with learned filters. Sparsity limits the representation by imposing a size restriction at each layer, which facilitates assembling parsimonious features into more complex structures. A convolutional sparse coded dictionary contains rich information that many existing feature detectors cannot capture.

2.3 Interesting itemset mining

In this section, we summarize a few recent works that mine small, high-quality, and non-redundant sets of patterns yielding the best lossless compression of the database; this problem has already been proved NP-hard [27]. A few clustering-based approaches create frequent feature-value pairs belonging to a specific cluster; their compression ratios depend on the number of clusters, and outlier detection and compaction gain are bottlenecks of this work [5]. MTV [18] uses the Minimum Description Length (MDL) principle together with the maximum entropy distribution to directly calculate the expected frequencies of itemsets and identify interesting contents. KRIMP [28] applies the MDL principle to create a simple two-column translation code table that optimally describes the data. Candidate itemsets are selected w.r.t. the standard candidate order, and a cover algorithm selects the encoding with the smaller compressed size. KRIMP's candidate generation requires high running time, and selecting the right threshold values for larger databases or candidate collections is challenging. SLIM [25] addresses this issue by directly mining descriptive patterns from the data; it uses MDL along with an accurate heuristic to greedily construct patterns in a bottom-up fashion. OPUS Miner [31] is a branch-and-bound approach that deploys two pruning mechanisms based on itemset values and statistical significance levels. It finds the top-k productive and non-redundant itemsets to identify small sets of key associations, ultimately leading to self-sufficient itemsets. Interesting Itemset Miner (IIM) [7] uses a generative model over itemsets in the form of Bayesian networks, and a greedy approximation-based weighted set cover approach infers the interesting itemsets.

3 Problem definition and proof of NP-hardness

For ease of presentation, we first introduce some preliminary concepts and notations. Let D = {T1,T2,⋯ ,Tn} be a database of n transactions, where each transaction is a subset of a set of items I = {i1,i2,⋯ ,ip}. The cardinality of a transaction is the number of items in it. A set of items is called an itemset; an itemset containing p items is referred to as a p-itemset. We aim to learn a dictionary B = {I1,I2,⋯ ,Im} of m basis itemsets, from which a discrete sparse code of the database can be inferred. A sparse code of a transaction T is a set b of at most k itemsets from B, with union \(U(b) = \cup_{I_{i} \in b} I_{i}\), such that U(b) ⊆ T and k is less than the cardinality of T. With these notations, we formulate the following research problems:

Problem 1

[Finding the sparse representation of a transaction T] Given a dictionary B and a sparsity constraint k (the maximum number of basis itemsets to choose from B), a sparse code of T, denoted B(T), is defined as:

$$ B(T) = \arg\min_{b \subseteq B,\, |b| \leq k} |T - U(b)| $$
(1)

where U(b) denotes the set of items of T covered by the basis itemsets in b.

Example 1

Given T = qrvwx and B = {qr, vw, vy, yz}: when k is set to 1, the sparse code is B(T) = {qr}; when k = 2, B(T) = {qr, vw} and U(B(T)) = qrvw.

Sparse coding the whole database D with the basis B incurs a loss defined as \(L_{B}(D) = {\Sigma }^{n}_{j = 1} |T_{j} - U(B(T_{j}))|\). In Example 1, the loss of encoding T = qrvwx with B(T) = {qr} is 3, while the loss with B(T) = {qr, vw} is 1. Since vy is not a subset of qrvwx, it cannot be included in B(T). To better preserve the original information contained in a transaction database, we aim to learn a beneficial dictionary with a small encoding loss.
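For concreteness, the following minimal Python sketch (our own illustration, not part of the proposed algorithms) computes sparse codes and encoding losses for the toy data of Example 1 by brute force over subsets of B; this is feasible only because the dictionary is tiny, and the helper names are ours.

```python
from itertools import combinations

def cover(basis):
    """U(b): the union of the chosen basis itemsets."""
    return set().union(*basis) if basis else set()

def sparse_code(T, B, k):
    """Brute-force solution of (1): choose at most k basis itemsets
    that are subsets of T and minimize the loss |T - U(b)|."""
    usable = [b for b in B if b <= T]            # only subsets of T can be used
    best, best_loss = [], len(T)
    for r in range(1, k + 1):
        for combo in combinations(usable, r):
            loss = len(T - cover(combo))
            if loss < best_loss:
                best, best_loss = list(combo), loss
    return best, best_loss

T = set("qrvwx")
B = [frozenset("qr"), frozenset("vw"), frozenset("vy"), frozenset("yz")]
print(sparse_code(T, B, k=1))   # B(T) = [{q, r}], loss 3
print(sparse_code(T, B, k=2))   # B(T) = [{q, r}, {v, w}], loss 1
```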

Problem 2

[Dictionary learning from candidates] Given a database of transactions D = {T1,T2,⋯ ,Tn}, the sparsity constraint k (the maximum number of basis itemsets allowed in a sparse code), and a set of candidate itemsets C, find a dictionary B ⊆ C with at most m basis itemsets such that B = arg minB⊆C LB(D).

To solve Problem 2, one may first need to solve the following problem:

Problem 3

[Candidate set construction from the database] The encoding loss function in Problem 2 requires a candidate set C from which the dictionary is selected. How to construct a high-quality candidate set C from the database D is another challenging and important problem, as the contents of C largely determine the quality of the learned dictionary and the encoding loss of the database.

Theorem 1

Problem 1 is NP-hard.

Proof

We prove NP-hardness by a reduction from the set cover problem. Let S = {1,2,⋯ ,n} and H = {s1,s2,⋯ ,sm}, where si ⊆ S. The decision version of the set cover problem asks whether we can construct a collection x ⊆ H such that |x| ≤ k and \(\cup_{s_{i} \in x} s_{i} = S\). Let T = S and B = H; then solving (1) yields a sparse representation of T, that is, b ⊆ B with |b| ≤ k such that |T − U(b)| is minimized. Let b be the solution to Problem 1. If |T − U(b)| = 0, then b is a set cover of S of size at most k; otherwise, if the loss is greater than zero, no set cover of size at most k exists. Hence solving (1) solves the set cover problem. The set cover problem has thus been reduced to Problem 1, and this reduction is polynomial in the input size. Hence, the theorem is proved. □

[Algorithm 1 (figure a): Dictionary Learning for Sparse Coding of Itemsets (DSI) — pseudocode]

4 Dictionary Learning for Sparse Coding Itemsets (DSI)

In this section, we present our proposed algorithmic framework (DSI) for learning a sparse coding dictionary in detail; the pseudocode is given in Algorithm 1. The algorithm iteratively selects m basis itemsets from a set of candidate itemsets C. In each iteration, a single itemset I from C is chosen to form a transitory dictionary B+ together with the already selected itemsets, and the encoding loss of the database under B+ is computed (lines 7–9). In addition, it calculates the number of overlapping items between the newly considered itemset I and the learned dictionary B. The itemset I is added to the dictionary if its loss and its overlap with the selected basis are smaller than those of the other candidates examined so far (lines 10–14). We present Example 2 for a better understanding of DSI:
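The sketch below reflects our reading of the DSI loop described above; the pseudocode in Algorithm 1 remains the authoritative description. The max_set_cover routine it calls is the greedy encoder sketched alongside Algorithm 2 below, and the variable names and the lexicographic (loss, overlap) tie-breaking are our own assumptions.

```python
def dsi(D, C, m, k):
    """Sketch of Algorithm 1 (DSI).
    D: list of transactions (sets of items), C: candidate itemsets (frozensets),
    m: dictionary size, k: sparsity constraint per transaction."""
    B = []
    C = list(C)
    while len(B) < m and C:
        best_I, best_loss, best_overlap = None, None, None
        for I in C:                                              # try every remaining candidate
            B_plus = B + [I]
            loss = sum(max_set_cover(T, B_plus, k) for T in D)   # lines 7-9
            overlap = sum(len(I & J) for J in B)                 # overlap with selected basis
            if best_I is None or (loss, overlap) < (best_loss, best_overlap):
                best_I, best_loss, best_overlap = I, loss, overlap   # lines 10-14
        B.append(best_I)                                         # commit the best candidate
        C.remove(best_I)
    return B
```

Running this sketch on the toy database of Example 2 with m = 3 and k = 2 reproduces the dictionary {qr, vw, yz} derived in Table 1.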

Example 2

Assume that we have a database D = {T1 = qrvwx, T2 = qrvyz, T3 = qrvwyz}, C = {qr, vw, vy, yz}, m = 3 and k = 2. We explain Table 1 to show how Algorithm 1 works:

  • Step 1: Initially B is empty. In each iteration (lines 6–14), we look for an itemset I such that, when added to B, the database D can be encoded with minimum loss and overlap. Step 1 of Table 1 shows the loss of encoding each transaction in D using each candidate I from C. As can be observed, the overall loss is smallest when I = qr, with a loss equal to 10. Therefore qr is added to B, giving B = {qr}, and deleted from C.

  • Step 2: The next itemset that works together with B to minimize the overall loss is I = vw with the overall loss equal to 6. We update B to {qr, vw} and remove vw from C accordingly.

  • Step 3: We calculate the encoding loss of each remaining candidate in C together with the learned dictionary B. We can see that {vy} and {yz} lead to the same loss value of 4. Nonetheless, the item v in vy intersects with the dictionary element vw, making the overlap Ovy equal to 1, whereas yz has no overlap with the itemsets in dictionary B, i.e., Oyz = 0. Ultimately, we update B to {qr, vw, yz} and stop the algorithm after selecting m = 3 basis itemsets.

Table 1 The illustration of running Algorithm 1 in Example 2. Selected items are emphasized in bold

DSI uses a greedy method (MaxSetCover) to calculate the encoding loss; its pseudocode is given in Algorithm 2. The loss calculation greedily encodes every transaction Ti ∈ D with the basis itemsets of B+. Algorithm 2 follows the standard greedy procedure for the max set cover problem, which guarantees an approximation factor of \(1-\frac {1}{e}\) with respect to the optimal solution [27]. The algorithm takes as input a transaction T, a set of potential basis itemsets, and the sparsity parameter k. It greedily selects up to k basis itemsets that reduce the encoding loss, and returns the encoding loss, i.e., the number of items in T not covered by the selected basis itemsets from B+. Example 3 illustrates the matching pursuit greedy approach given in Algorithm 2.
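A minimal sketch of the greedy MaxSetCover encoder follows, assuming ties are broken by candidate order; it returns the encoding loss and is the routine assumed by the DSI sketch above. Algorithm 2 remains the authoritative description.

```python
def max_set_cover(T, B_plus, k):
    """Sketch of Algorithm 2 (MaxSetCover): greedily pick up to k basis itemsets
    from B_plus that are subsets of T and maximize coverage of T;
    return the encoding loss |T - U(G)|."""
    uncovered = set(T)
    candidates = [b for b in B_plus if b <= T]    # only subsets of T may be used
    G = []
    while len(G) < k and candidates:
        best = max(candidates, key=lambda b: len(b & uncovered))
        if not best & uncovered:                  # no candidate adds coverage
            break
        G.append(best)
        uncovered -= best
        candidates.remove(best)
    return len(uncovered)

# Example 3: T = qrvwyz, C = {qr, vw, vy, yz}, k = 2  ->  loss 2
C = [frozenset(s) for s in ("qr", "vw", "vy", "yz")]
print(max_set_cover(set("qrvwyz"), C, k=2))       # 2
```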

Example 3

Assume that T = qrvwyz, C = {qr, vw, vy, yz}, and k = 2. Algorithm 2 performs the following steps to calculate the encoding loss used in Table 1:

  • Step 1: Initially G is empty.

  • Step 2: The itemset I ∈ C that maximizes the coverage of T is qr, so qr is added to G and deleted from C (G = {qr}, C = {vw, vy, yz}).

  • Step 3: The next itemset I ∈ C that, together with the selected itemsets in G, maximizes the overall coverage of T is vw, so G = {qr, vw}. The algorithm stops when the sparsity limit k = 2 is reached. The encoding loss is two (|T − U(G)| = 6 − 4 = 2), as two items of T are not covered by G.

[Algorithm 2 (figure b): MaxSetCover — greedy encoding loss computation, pseudocode]

5 Layered Convolutional Dictionary Learning for Sparse Coding of Itemsets (CDSI)

In this section, we introduce a novel convolutional sparse coding mechanism (CDSI) that learns statistically dependent sparse dictionaries in a hierarchical fashion. At each layer, the learned dictionary itemsets are convolved into the database to transform it, allowing the next layer to learn more complicated patterns. This is similar in spirit to the deep learning technique of Convolutional Neural Networks (CNNs) [14], where learned filters are convolved with the input image and the next layer of convolutional filters works on the output of the previous layer, allowing the CNN to capture features at different levels of abstraction [41]. The convolution process has the advantage that itemsets are learned hierarchically, so that dictionaries of different granularity can be obtained for different applications. We provide an overview of our layered convolutional dictionary learning algorithm below and outline how it works (a code sketch follows the list):

  1. Construct a candidate set C using the chi-square test (see Section 5.1 for a discussion of how to construct a meaningful candidate set).

  2. Run Algorithm 1 to learn a dictionary from C that sparse-codes the database D well.

  3. Run Algorithm 3 to transform the database D using the dictionary learned in the second step (see Section 5.2).

  4. To learn patterns in the next layer, return to step 1.
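The layered loop can be summarized by the following sketch, which assumes the chi_square_candidates, dsi, and transform routines sketched elsewhere in this section; the number of layers is a parameter, and stopping criteria other than an empty candidate set are not shown.

```python
def cdsi(D, m, k, layers):
    """Sketch of the layered CDSI procedure; returns one dictionary per layer."""
    dictionaries = []
    for _ in range(layers):
        C = chi_square_candidates(D)     # step 1: statistically dependent item pairs
        if not C:                        # nothing dependent left to learn
            break
        B = dsi(D, C, m, k)              # step 2: Algorithm 1 on the current database
        D = transform(D, B)              # step 3: Algorithm 3, convolve B into the database
        dictionaries.append(B)           # step 4: the next layer works on the transformed D
    return dictionaries
```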

[Algorithm 3 (figure c): Database transformation using the learned dictionary — pseudocode]

5.1 Candidate set construction

The quality of sparse dictionary learning (Algorithm 1) depends heavily on the contents of the candidate set C. A possible way to build C is to use a frequent pattern mining algorithm such as Apriori [2], which however is subject to pattern explosion (see Chapter 2 of [1]). In this section, we propose a refined approach to find statistically dependent itemsets. Intuitively, a pattern is only admissible if its items exhibit a strong dependency and correlation. Therefore, in order to compose the candidate set C, we use the chi-square test [32]. Let q and r be two items, and let N denote the total number of transactions in D. We define:

  • Fqr = |{Ti ∈ D | qr ⊆ Ti}|, i.e., the frequency of the itemset qr.

  • \(F_{q\bar {r}} \,=\, |\{T_{i} \!\in \! D| q \!\in \! T_{i}, r \!\notin \! T_{i}\}|\), i.e., the number of transactions that contain q but not r.

  • \(F_{\bar {q}r} \,=\, |\{T_{i} \in D| q \notin T_{i}, r \in T_{i}\}|\), i.e., the number of transactions that contain r but not q.

  • \(F_{\bar {q}\bar {r}} \,=\, |\{T_{i} \in D| q \notin T_{i}, r \notin T_{i}\}|\), i.e., the number of transactions that neither contain q nor r.

  • \(E_{qr} = \frac {(F_{qr} + F_{q\bar {r}})(F_{qr} + F_{\bar {q}r})}{N}\), i.e., the expected frequency of qr under the assumption that q is independent of r.

  • \(E_{q\bar {r}} = \frac {(F_{qr} + F_{q\bar {r}})(F_{q\bar {r}} + F_{\bar {q}\bar {r}})}{N}\), i.e., the expected number of transactions that contain q but not r.

  • \(E_{\bar {q}r} = \frac {(F_{\bar {q}r} + F_{\bar {q}\bar {r}})(F_{qr} + F_{\bar {q}r})}{N}\), i.e., the expected number of transactions that contain r but not q.

  • \(E_{\bar {q}\bar {r}} = \frac {(F_{\bar {q}r} + F_{\bar {q}\bar {r}})(F_{q\bar {r}} + F_{\bar {q}\bar {r}})}{N}\), i.e., the expected number of transactions that contain neither q nor r.

The chi-square statistics is defined as follows:

$$\begin{array}{@{}rcl@{}} \chi^{2} = \frac{(F_{qr} - E_{qr})^{2}}{E_{qr}} + \frac{(F_{\bar{q}r} - E_{\bar{q}r})^{2}}{E_{\bar{q}r}} + \frac{(F_{q\bar{r}} - E_{q\bar{r}})^{2}}{E_{q\bar{r}}} + \frac{(F_{\bar{q}\bar{r}} - E_{\bar{q}\bar{r}})^{2}}{E_{\bar{q}\bar{r}}} \end{array} $$
(2)

If q and r are statistically independent, this statistic follows a chi-square distribution with one degree of freedom. Based on this observation, we test the null hypothesis that q and r are statistically independent. The test can be performed for any pair of items in the database, and only pairs that pass the test (i.e., for which the null hypothesis is rejected at a significance level of 0.05) are retained as potential itemsets in the candidate set C. The dictionary is then learned from these statistically dependent item pairs by running Algorithm 1.
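A possible implementation of this candidate construction is sketched below; the expected counts are the standard marginal products of the chi-square test of independence, 3.841 is the critical value for one degree of freedom at the 0.05 level, and the function and variable names are illustrative.

```python
from itertools import combinations

def chi_square_candidates(D, critical_value=3.841):
    """Sketch of the candidate construction of Section 5.1: keep item pairs whose
    chi-square statistic rejects independence at the 0.05 level (df = 1)."""
    N = len(D)
    items = sorted(set().union(*D)) if D else []
    C = []
    for q, r in combinations(items, 2):
        f_qr = sum(1 for T in D if q in T and r in T)
        f_qn = sum(1 for T in D if q in T and r not in T)
        f_nr = sum(1 for T in D if q not in T and r in T)
        f_nn = N - f_qr - f_qn - f_nr
        chi2 = 0.0
        # observed count, row marginal, column marginal for each cell of the 2x2 table
        for observed, row, col in [(f_qr, f_qr + f_qn, f_qr + f_nr),
                                   (f_qn, f_qr + f_qn, f_qn + f_nn),
                                   (f_nr, f_nr + f_nn, f_qr + f_nr),
                                   (f_nn, f_nr + f_nn, f_qn + f_nn)]:
            expected = row * col / N
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
        if chi2 > critical_value:
            C.append(frozenset((q, r)))
    return C
```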

5.2 Database transformation and convolution

We elucidate the database transformation process with the toy database of Example 2. Given a dictionary B = {qr, vw, yz}, Algorithm 3 transforms the database D into a new database over refined items, where each new item corresponds to an itemset in the dictionary B. Let us rewrite the basis itemsets in B as B = {α = qr, β = vw, γ = yz}, where each basis itemset is now represented by a new symbol that is not present in the current alphabet. Algorithm 3 transforms the database D = {T1 = qrvwx, T2 = qrvyz, T3 = qrvwyz} into D′ = {T1 = αβx, T2 = αvγ, T3 = αβγ}. The new database D′ contains transactions over the items {α, β, γ, v, x}, while the original item set was {q, r, v, w, x, y, z}. Table 2 shows the process of dictionary learning on the transformed database D′ at the second layer. Note that the candidate set C in this example is constructed by randomly selecting item pairs from the transaction database, as there are only three transactions, making it impossible for the chi-square test to find any dependent pairs.
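The transformation can be sketched as follows, under the assumption that each dictionary itemset occurring in a transaction is greedily replaced by a fresh symbol; the symbol-naming scheme is our own and Algorithm 3 remains the authoritative description.

```python
def transform(D, B):
    """Sketch of Algorithm 3: replace every dictionary itemset occurring
    in a transaction by a single new symbol; uncovered items are kept."""
    symbols = {I: "<" + "".join(sorted(I)) + ">" for I in B}   # e.g. qr -> "<qr>"
    new_D = []
    for T in D:
        new_T, remaining = set(), set(T)
        for I in B:
            if I <= remaining:            # itemset fully present and not yet consumed
                new_T.add(symbols[I])
                remaining -= I
        new_D.append(new_T | remaining)   # keep the items not covered by any basis
    return new_D

# Toy database of Example 2 with B = {qr, vw, yz}:
D = [set("qrvwx"), set("qrvyz"), set("qrvwyz")]
B = [frozenset("qr"), frozenset("vw"), frozenset("yz")]
print(transform(D, B))   # [{'<qr>', '<vw>', 'x'}, {'<qr>', 'v', '<yz>'}, {'<qr>', '<vw>', '<yz>'}]
```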

Table 2 CDSI: Dictionary learning from convolved and transformed database at second layer. Items placed in B = {αβ, γx, αv} are highlighted in bold

6 Experiments

In interesting itemset mining, a powerful representation of the data yields higher (i) pattern interpretability and (ii) classification accuracy. Our extensive empirical validation considers both criteria to evaluate the effectiveness of the proposed algorithmic framework. We compare our proposed sparse coding techniques with IIM [7] and MTV [17], because they represent the state-of-the-art techniques for itemset mining and significantly outperform the existing approaches developed in [21, 26, 28] on standard datasets similar to those adopted in our experiments.

6.1 Dataset description

We use the discretized version of the Semi Interval Partial Order (SIPO) datasets (introduced in [21]) and UCI datasets [6] for classification. Table 3 summarizes the characteristics of the datasets used. Since it is always challenging to measure the meaningfulness of discovered patterns, text datasets are used to informally evaluate quality by comparing pattern interpretability and relevance. We use the JMLR abstract text dataset from the Journal of Machine Learning Research website (Footnote 1), which is easy to interpret.

Table 3 Summary of datasets

6.2 Interpretability of Sparse Representation

Table 4 shows that MTV returns interrelated and less diverse frequent patterns, e.g., “synthetic real”, “real datasets”, “train classifi”, “classifi class”, etc. IIM derives relevant patterns (e.g., “anomali detect” and “semi supervised”), but a few patterns (e.g., “parameter”/“parameters” and “sequenc”/“sequential”) would require stemming to remove redundancy. Patterns extracted by CDSI at the 4th layer of the convolutional dictionary with parameters (m = 10, k = 5) are also given. We observe that CDSI generates more revealing, diverse, and comprehensive patterns, e.g., “machine learning”, “graphic variable”, “probabl distribut”, etc., and they do not require stemming. In summary, CDSI generates comparatively more interpretable, heterogeneous, and less redundant patterns.

Table 4 Top 10 non-singleton patterns selected from the JMLR abstracts dataset to compare pattern interpretability for CDSI (Section 5), IIM [7] and MTV [17]

6.3 Classification accuracy

Classification accuracy improves when sparse representation techniques or interesting itemset mining algorithms are applied to the data [13, 33]. Table 5 presents a fictitious scenario to explain our experimental setup with a database D containing 5 transactions, D = {T1, T2, ⋯ , T5}, and two class labels {A, L}. These transactions describe the purchase of items {q, r, v, w, x} with the corresponding binary input vectors, e.g., (T1) = {1,1,1,0,0}. Since 0 and 1 indicate whether a specific item has been purchased, the third element of T1 is set to 1 to denote the purchase of v. These labeled transactions are fed to various classifiers. To evaluate whether the mined patterns boost classification accuracy, we append them to the transactions as additional binary features, i.e., we extend each transaction vector by one element per discovered pattern. For example, if CDSI discovers two patterns (r, v) and (r, x), then a 6th and a 7th element are added to each transaction to indicate the presence of each distinct pattern (shown in the extended transaction column of Table 5). The vector representation of T1 then becomes {1,1,1,0,0,1,0}, while preserving the record of the purchased items q, r, v.
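The data preparation of Table 5 can be sketched as follows; the two patterns used here are the hypothetical (r, v) and (r, x) from the example above, and the function name is ours.

```python
def to_extended_vectors(D, items, patterns):
    """Build the extended binary input vectors of Table 5: one feature per
    original item plus one feature per mined pattern (1 iff all of the
    pattern's items occur in the transaction)."""
    vectors = []
    for T in D:
        item_part = [1 if i in T else 0 for i in items]
        pattern_part = [1 if p <= T else 0 for p in patterns]
        vectors.append(item_part + pattern_part)
    return vectors

items = ["q", "r", "v", "w", "x"]
patterns = [frozenset("rv"), frozenset("rx")]               # the two hypothetical mined patterns
print(to_extended_vectors([set("qrv")], items, patterns))   # [[1, 1, 1, 0, 0, 1, 0]]
```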

Table 5 Data preparation for classification using CDSI (Section 5), IIM [7] and MTV [17] mined patterns as binary features

Table 6 presents the accuracy of different classifiers (Naive Bayes, J48, Random Forest, and IBk) on the SIPO and UCI datasets described in Table 3. To be unbiased, the number of mined patterns is set to the minimum number of patterns returned by any of the interesting itemset mining algorithms. These patterns are incorporated into the transactions (singletons) in the same way the extended input vectors are created in Table 5. We run our experiments in WEKA [10] with 5-fold cross-validation and parameters set to their default values. Patterns are extracted using CDSI (with parameters layers = 10, k = 10), IIM [7], and MTV [17] (for the existing approaches, we use the default parameter values of the publicly available code). Each cell of Table 6 shows the accuracy of each method for the respective classifier. The highest prediction accuracy for each input vector type is emphasized in bold. The last column (Best) shows the highest accuracy over all types of input data and highlights the topmost value in bold. For all datasets, the prediction accuracy increases when the extended transactions are used compared to training the classifier on the transactions alone (singletons). In general, CDSI significantly improves the prediction accuracy, supporting our assumption that the convolutional sparse coded dictionary carries influential information for prediction.

Table 6 CDSI (Section 5) improves the prediction accuracy over IIM [7] and MTV [17]. For a fair comparison, an identical number of patterns returned by each method is used

7 Conclusions and future work

Although convolutional sparse dictionary learning has been used before in the image processing domain [12, 42], it has not been studied for itemset mining so far. In this paper, we present approximation-based algorithms to find sparse representations of itemsets, which are discrete in nature. We propose an optimization technique to learn a dictionary from a transaction dataset under a sparsity constraint. Based on this mechanism, a convolutional dictionary learning method is presented that extracts dictionaries at different levels of abstraction. A chi-square test is performed to extract statistically dependent patterns from the transaction data and feed them to the layered dictionary learning algorithm, generating increasingly complex and statistically dependent patterns at each layer. We conduct extensive experiments on various datasets showing that the sparse representation forms a succinct input representation and, when combined with different classifiers, increases their efficacy. In the future, we plan to extend layered convolutional sparse dictionary learning techniques to sequential, streaming, and uncertain data mining problems [13, 19, 29, 35, 37, 43].