Neurocomputing

Volume 336, 7 April 2019, Pages 27-35

Mining concise patterns on graph-connected itemsets

https://doi.org/10.1016/j.neucom.2018.03.084

Abstract

The itemset is a basic and common form of data. People can gain new insights into their business by discovering its implicit regularities through pattern mining. In some real applications, e.g., network alarm association, the itemsets usually have two characteristics: (1) the observed samples come from different entities, with inherent structural relationships implied in their static properties; (2) the samples are scarce, which may lead to incomplete pattern extraction. This paper considers how to efficiently find a concise set of patterns on such data. First, we use a graph to express the entities and their interconnections, and propagate every sample to every node with a weight determined by a pre-defined combination of kernel functions based on the similarities of the nodes and patterns. Next, the weight values are imported naturally into the MDL-based filtering process, yielding a differentiated pattern set for each node. Experiments show that the solution outperforms both the global solution (treating all nodes as one) and the isolated solution (removing all edges) on simulated and real data, and its effectiveness and scalability are further verified in the application of large-scale network operation and maintenance.

Introduction

Pattern mining aims to discover potential co-occurrence relationships among the items in a database. Classical frequent pattern methods, such as Apriori [1] and FP-Growth [2], extract qualified patterns by generating, sorting, and filtering candidates from the data, merely counting their occurrences. These patterns can be used either as a final result for human analysts or as intermediate features for subsequent data mining tasks such as classification and clustering. These methods are employed in a wide variety of domains for their intuitive design and speed.
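To make the counting step concrete, here is a minimal sketch of support counting and threshold filtering; the toy database and all names are illustrative, not from the paper, and a real Apriori implementation would prune candidates level-wise rather than take them as given:

```python
from itertools import combinations

def support(D, X):
    """Support of pattern X: the number of transactions containing it."""
    return sum(1 for t in D if X <= t)

def filter_frequent(D, candidates, minsup):
    """The basic threshold filter of Apriori-style mining [1]: keep
    candidates whose occurrence count reaches the minimum support."""
    return [X for X in candidates if support(D, X) >= minsup]

# toy database of transactions (sets of items)
D = [{"a", "b", "c"}, {"a", "b"}, {"b", "c"}]
cands = {frozenset(c) for t in D for c in combinations(sorted(t), 2)}
print(filter_frequent(D, cands, minsup=2))   # e.g. {'a','b'} and {'b','c'}
```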

However, some common defects exposed in practice cannot be overlooked. The main problem is pattern explosion [3]. If a highly frequent sub-itemset passes the threshold examination, it will probably have many similar companions that also satisfy the test, so a large number of redundant results is likely. For human analysts, it is tedious to check and comprehend them one by one; for data mining tasks, the redundancy may make subsequent models overfit latent noise and deteriorate their predictive accuracy. A straightforward remedy is to set the filtering threshold (i.e., support) high enough to keep the total number of patterns within a reasonable range. Nevertheless, this may render the final results less informative, for they become so obvious that they can sometimes be found by the naked eye.

To solve this problem, people have switched their attention from frequent patterns to interesting or useful patterns. The critical issue is to design a more meaningful, and also computable, optimization target. One popular approach is the Krimp algorithm [3], which filters out redundant patterns based on the MDL (minimum description length) criterion. It treats the set of patterns being sought as a dictionary for encoding the data, comprising the code table itself and the encoded data body, and calculates the corresponding compressed size from the total empirical entropy. Since searching for the best combination of patterns is an NP-hard problem, Krimp employs a heuristic approach to find a sub-optimal solution in polynomial time. Experiments in [3] show that Krimp can generally reduce the number of outputs by at least 2–3 orders of magnitude, and thus uncover those rare but helpful patterns.
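As a rough illustration of the score Krimp optimizes, the sketch below assigns each code-table pattern a Shannon-optimal code length (minus log2 of its relative usage) and sums the two-part size. This is a simplification: the real algorithm's cover order, standard code table bookkeeping, and pruning are omitted, and the data structures are our own assumption:

```python
import math

def code_lengths(usage):
    """Shannon-optimal code lengths: L(X) = -log2(usage(X) / total usage)."""
    total = sum(usage.values())
    return {X: -math.log2(u / total) for X, u in usage.items() if u > 0}

def total_encoded_size(usage, singleton_usage):
    """Two-part MDL score L(CT, D) = L(CT) + L(D | CT), simplified.
    The data part re-encodes every pattern occurrence with its code;
    the code-table part spells out each pattern via singleton codes."""
    L = code_lengths(usage)
    L_st = code_lengths(singleton_usage)
    L_data = sum(usage[X] * L[X] for X in L)
    L_ct = sum(L[X] + sum(L_st[frozenset([i])] for i in X) for X in L)
    return L_ct + L_data

usage = {frozenset("ab"): 5, frozenset("c"): 3}            # pattern usages
singles = {frozenset("a"): 5, frozenset("b"): 5, frozenset("c"): 3}
print(total_encoded_size(usage, singles))
```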

Here we mainly consider how to apply Krimp to itemsets with structural relationships. In relational databases, historical records are usually not produced anonymously, but carry an identity field marking who generated them. Besides, there is typically a property table, where the identities act as a primary key (rather than the foreign key in the records), describing the static attributes of each entity, such as a user’s basic information or a networking device’s uplink and downlink. These properties matter in two respects. First, when the sample size is insufficient, it is possible to “reuse” data between similar objects, identified by analyzing the similarity between entities, to improve the completeness and robustness of the results. Second, in contrast to the traditional global approach, users are sometimes more concerned with each entity’s specific patterns, to see whether it has a unique personality.

Similar to the scenario of multi-task learning, the key question here is how to acquire and exploit the relatedness of multiple tasks [4]. The cross-task structure can be either learned from the data or defined from prior knowledge. Once we have it, the relationship can be used to direct data sharing between tasks, to provide a regularization for co-training multiple models, or to control a multi-output model’s complexity; in fact, these three statements are equivalent. For the problem discussed here, unfortunately, only a few samples can be collected relative to the number of entities, and there are no labels in this unsupervised setting, so we have to rely on domain knowledge to define the relevance and reflect it in the model through sensible regularization conditions.

Krimp applies the MDL principle directly to the raw data through a self-defined heuristic search, which bypasses the usual numerical optimization methods. A more practical multi-task solution is not to construct multiple code tables simultaneously through some sophisticated tactics, but to make the records under every node fully visible to each other, while differentiating them with distinct weight matrices generated from their relative positions and content. The similarity of nodes on a graph can be derived either from a classic random walk or from a maximal-entropy random walk. Next, the weights can be introduced directly into Krimp’s MDL-based criterion for evaluating pattern sets. Moreover, the whole computation process is easy to parallelize, and its performance scales almost linearly on a multi-core machine without much effort.
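A minimal sketch of this weighting idea, assuming a classic k-step random walk (the maximal-entropy variant would replace the degree normalization with the dominant eigenvector of the adjacency matrix); the transaction list and owner map are illustrative structures of our own, not the paper’s:

```python
import numpy as np

def walk_weights(A, steps=3):
    """Node-to-node weights from a k-step classic random walk:
    row-normalize the adjacency matrix, then take its k-th power."""
    P = A / A.sum(axis=1, keepdims=True)
    return np.linalg.matrix_power(P, steps)

def weighted_support(X, transactions, owner, W, u):
    """Support of pattern X as seen from node u: every record is
    visible, but damped by the walk weight from u to its owner node."""
    return sum(W[u, owner[j]]
               for j, t in enumerate(transactions)
               if X <= t)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], float)
W = walk_weights(A, steps=2)
transactions = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
owner = [0, 1, 2]                        # source node of each transaction
print(weighted_support({"a", "b"}, transactions, owner, W, u=0))
```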

In the sections below, we first introduce the scenario of network alarm analysis and point out why multi-task pattern mining is important; we then summarize the related work in Section 3. Next, the necessary background on compressing patterns, distribution embedding, and graph kernels is briefly given. In Sections 5 and 6, the theoretical model and algorithm design are presented in detail. A set of experiments is designed and conducted in Section 7. Finally, we summarize the whole article.

Section snippets

Motivated application: network alarm correlation

Nowadays, mobile communication has become a pivotal ingredient of everyone’s life. To provide such a service, wireless carriers deploy a grid of base stations at an appropriate density to achieve ubiquitous coverage of a geographic area and provide adequate bandwidth with reliability in a load-balanced manner. This type of network is often referred to as a cellular network because its stations are laid out in a hive-like pattern (Fig. 1(b)), though the coverage area of one tower is of course not

Structural pattern mining

A recent survey of alarm correlation methods can be found in [5]. There are two approaches to this issue: rule-based and data-based. The former mainly relies on experts to define hard-coded rules, while a large part of the latter prefers pattern mining so that the machine generates rules itself. The initial work on mining association rules on alarms can be found in [6], which built a semi-automatic system in service mainly based on rules from humans, and let the machine propose

MDL-based pattern mining

Let A be a character set, and let an itemset I be a non-empty subset of A. A database D = {t1, …, tn} is a collection of transactions, and each transaction t is an itemset. A pattern X is also an itemset, which may appear in multiple transactions. Usually, we say that a transaction t supports X if and only if X ⊆ t. Obviously, all the subsets of a transaction support it. The set of patterns contained by a database D is ⋃_{t ∈ D} {X | X ⊆ t}.
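These definitions translate directly into code; a small sketch (the subset enumeration is exponential in the transaction length, so it is only meant for tiny transactions):

```python
from itertools import combinations

def supports(t, X):
    """Transaction t supports pattern X iff X is a subset of t."""
    return X <= t

def patterns_of(D):
    """Set of patterns contained by D: the union over t in D of the
    non-empty subsets of t."""
    return {frozenset(c)
            for t in D
            for r in range(1, len(t) + 1)
            for c in combinations(sorted(t), r)}

D = [{"a", "b"}, {"b", "c"}]
assert supports({"a", "b", "c"}, {"a", "c"})
print(sorted(map(sorted, patterns_of(D))))   # [['a'], ['a','b'], ['b'], ...]
```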

For frequent pattern mining, the problem is to find out all the

Model

First of all, a global model is required to estimate the distribution density for all nodes with a limited amount of samples. The distribution density, in the form of a weighted average of the mapped existing samples, is estimated with two regularizations: one to prevent overfitting on noise, and the other to damp the influence of every observed sample. Second, the available kernels, covering both graph structure and content, are combined in a separable form to enable the optimization process. A particular
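Since this section is only excerpted here, the following sketch reflects our reading of the separable combination: the graph kernel and the content kernel factorize as a product, so each can be precomputed independently, and the density at a node is a weighted average over all samples. The weight vector alpha (e.g., from the SMO step mentioned later) and the owner map are assumptions of ours:

```python
def density_at(u, i, K_graph, K_content, owner, alpha):
    """Density estimate at node u for content point i, as a weighted
    average over all samples j, under the separable (product) kernel
        k((u, x_i), (v_j, x_j)) = K_graph[u, v_j] * K_content[i, j]."""
    return sum(a * K_graph[u, owner[j]] * K_content[i, j]
               for j, a in enumerate(alpha))
```

The product form is what makes the optimization tractable: the graph factor is computed once per node pair and reused across every sample owned by that node.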

Procedure

Let us analyze the complexity of Algorithm 1 step by step. Step 4 needs O(|V|^3) time to complete, and step 5 needs only O(|E|) ⊆ O(|V|^2) and can be omitted later. Line 7 calls a matrix exponential function, the scipy.linalg.expm routine [42], which uses the Padé approximation [43] and whose complexity can be roughly estimated as O(|V|^3) [44]. The SMO method in step 10 empirically requires O(|D|^2.2) to converge [40]. We can summarize the running time of all the above steps as O(|V|^3 +
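For the matrix exponential step, the text names the scipy.linalg.expm routine; a minimal sketch of that O(|V|^3) call follows. Whether Algorithm 1 feeds it the adjacency matrix, a Laplacian, or a scaled variant is not shown in this excerpt, so the choice below is illustrative:

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(A, beta=0.5):
    """Exponential diffusion kernel K = expm(beta * A); expm uses a
    Pade approximation internally, roughly O(|V|^3) time."""
    return expm(beta * np.asarray(A, dtype=float))

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], float)
K = diffusion_kernel(A)    # K[u, v] grows with the connectivity of u and v
```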

Data

We use two types of data to verify the effectiveness of the solution: a synthetic dataset and a real dataset based on the scenario in Section 2. There are two main reasons for generating a simulated dataset: (1) we can check whether the behavior of the algorithm is consistent with what we expect, and the key properties of the dataset, such as the region in which items occur and the specific characters it contains, can be freely controlled as needed; (2) because the real data, or even possible clues
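A hypothetical generator in the spirit described; the planted patterns, noise level, and all parameters below are our own illustration, not the paper’s actual simulator:

```python
import random

def gen_transactions(n, items, planted, p_plant=0.3, n_noise=2):
    """Each transaction either embeds one planted pattern or starts
    empty, then receives a few random noise items; the planted patterns
    and the noise level are the freely controllable properties."""
    out = []
    for _ in range(n):
        t = set(random.choice(planted)) if random.random() < p_plant else set()
        t |= set(random.sample(items, n_noise))
        out.append(t)
    return out

items = list("abcdefgh")
planted = [{"a", "b", "c"}, {"d", "e"}]
D = gen_transactions(100, items, planted)
```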

Conclusion

In this paper, we have implemented and tested a solution built upon a two-phase framework: (1) kernel-based multi-task density estimation, representing each target probability as a combination of all existing samples, blended by differentiated weights derived from kernels based on business understanding; (2) these mutually compensated samples can then be easily imported into the entropy calculation of the Krimp algorithm. This solution builds a bridge between structural collaboration in multi-tasking

Acknowledgments

The work is partially supported by the NSF of China under no. 11301420; the NSF of Jiangsu Province under nos. BK20150373 and BK20171237; the Suzhou Science and Technology Program under no. SZS201613; and the XJTLU Key Programme Special Fund (KSF) under no. KSF-A-01.

Di Zhang is currently a Ph.D. student at the School of Computer Science, Communication University of China, Beijing, and also a researcher in Noah’s Ark Lab, Huawei Corporation since 2011. He received the M.Sc. degree in Computer Science from the Beijing University of Aeronautics and Astronautics, China in 2006, and worked as a research engineer in the Institute of Software, Chinese Academy of Sciences from 2006 to 2010. His research interests include data mining, machine learning and distributed computing.

References (45)

  • R. Agrawal et al., Fast algorithms for mining association rules, Proceedings of the Twentieth International Conference on Very Large Data Bases (VLDB), 1994.
  • J. Han et al., Mining frequent patterns without candidate generation, Proceedings of the ACM SIGMOD International Conference on Management of Data (ACM SIGMOD Record), 2000.
  • J. Vreeken et al., Krimp: mining itemsets that compress, Data Min. Knowl. Discov., 2011.
  • S.A. Mirheidari et al., Alert correlation algorithms: a survey and taxonomy, Cyberspace Safety and Security, 2013.
  • M. Klemettinen et al., Rule discovery in telecommunication alarm data, J. Netw. Syst. Manag., 1999.
  • Z.-d. Zhao et al., Alarm correlation analysis in SDH network failure, Proceedings of the National Conference on Information Technology and Computer Science, 2012.
  • H.-J. Hung et al., When social influence meets item inference, Proceedings of the Twenty-Second ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
  • J. Zhang et al., StructInf: mining structural influence from social streams, Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017.
  • A. Anagnostopoulos et al., Influence and correlation in social networks, Proceedings of the Fourteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
  • A. Silva et al., Structural correlation pattern mining for large graphs, Proceedings of the Eighth Workshop on Mining and Learning with Graphs, 2010.
  • J.-F. Boulicaut et al., Local pattern detection in attributed graphs, Solving Large Scale Learning Tasks: Challenges and Algorithms, 2016.
  • L. Song et al., Kernel embeddings of conditional distributions: a unified kernel framework for nonparametric inference in graphical models, IEEE Signal Process. Mag., 2013.