Analyzing Similarities of Datasets Using a Pattern Set Kernel

Ibrahim, A.; Sastry, P. S.; Sastry, Shivakumar

doi:10.1007/978-3-319-31753-3_22


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9651)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining


Abstract

In the area of pattern discovery, there is much interest in discovering small sets of patterns that characterize the data well. In such scenarios, when data is represented by a small set of characterizing patterns, an interesting problem is the comparison of datasets, by comparing the respective representative sets of patterns. In this paper, we propose a novel kernel function for measuring similarities between two sets of patterns, which is based on evaluating the structural similarities between the patterns in the two sets, weighted using their relative frequencies in the data. We define the kernel for injective serial episodes and itemsets. We also present an efficient algorithm for computing this kernel. We demonstrate the effectiveness of our kernel on classification scenarios and for change detection using sequential datasets and transaction databases.
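
The idea of comparing two pattern sets through pairwise structural similarity, weighted by relative frequency, can be illustrated with a minimal sketch for itemsets. The abstract does not give the kernel's exact form, so everything here is an assumption made for illustration and is not the kernel defined in the paper: the Jaccard overlap used as the structural similarity, the product of frequencies used as the weight, the cosine-style normalization, and the function names are all hypothetical.

```python
from itertools import product
from math import sqrt


def itemset_similarity(a, b):
    """Jaccard overlap of two itemsets (assumed stand-in for structural similarity)."""
    a, b = frozenset(a), frozenset(b)
    union = a | b
    if not union:
        return 1.0  # two empty itemsets are treated as identical
    return len(a & b) / len(union)


def pattern_set_similarity(patterns_a, patterns_b):
    """Sum pairwise similarities, each weighted by the two patterns' frequencies.

    Each argument maps an itemset (a frozenset) to its relative frequency.
    """
    return sum(
        freq_a * freq_b * itemset_similarity(pat_a, pat_b)
        for (pat_a, freq_a), (pat_b, freq_b)
        in product(patterns_a.items(), patterns_b.items())
    )


def normalized_similarity(patterns_a, patterns_b):
    """Scale the score so that a non-empty pattern set compared with itself gives 1."""
    k_ab = pattern_set_similarity(patterns_a, patterns_b)
    k_aa = pattern_set_similarity(patterns_a, patterns_a)
    k_bb = pattern_set_similarity(patterns_b, patterns_b)
    if k_aa == 0 or k_bb == 0:
        return 0.0  # an empty or zero-weight pattern set matches nothing
    return k_ab / sqrt(k_aa * k_bb)


if __name__ == "__main__":
    # Hypothetical pattern sets mined from two transaction databases.
    dataset_1 = {frozenset({"bread", "milk"}): 0.6, frozenset({"beer"}): 0.4}
    dataset_2 = {frozenset({"bread", "milk", "eggs"}): 0.7, frozenset({"wine"}): 0.3}
    print(normalized_similarity(dataset_1, dataset_2))
```

A normalized score of this kind could be passed to a kernel-based classifier or tracked over successive windows of data for change detection, which are the two uses the abstract mentions.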


Notes

1. A preliminary version of this paper was presented as a poster at the 2nd IKDD Conference on Data Sciences, CoDS 2015 [6].
2. Various definitions of frequency have been proposed for episodes [1]. We impose no condition on which frequency is used, and hence $fr(\alpha)$ could be any measure of the relative significance of episode $\alpha$ in the data.
3. Injective because itemsets, by definition, do not have repeated items.

References

1. Achar, A., Laxman, S., Sastry, P.S.: A unified view of the apriori-based algorithms for frequent episode discovery. Knowl. Inf. Syst. 31(2), 223–250 (2012)
2. Archer, B., Shivakumar, S., Rowe, A., Rajkumar, R.: Profiling primitives of networked embedded automation. In: IEEE International Conference on Automation Science and Engineering, CASE 2009, pp. 531–536. IEEE (2009)
3. Fernando, B., Fromont, E., Tuytelaars, T.: Effective use of frequent itemset mining for image classification. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 214–227. Springer, Heidelberg (2012)
4. Gärtner, T., Flach, P.A., Wrobel, S.: On graph kernels: hardness results and efficient alternatives. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS (LNAI), vol. 2777, pp. 129–143. Springer, Heidelberg (2003)
5. Ibrahim, A.: Effective characterization of sequence data through frequent episodes. Ph.D. thesis (under review), Indian Institute of Science, Bangalore (2015, submitted)
6. Ibrahim, A., Sastry, P.S., Sastry, S.: Pattern set kernel. In: Proceedings of the Second ACM IKDD Conference on Data Sciences, pp. 122–123. ACM (2015)
7. Ibrahim, A., Sastry, S., Sastry, P.S.: Discovering compressing serial episodes from event sequences. Knowl. Inf. Syst. 1–28 (2015). http://link.springer.com/article/10.1007/s10115-015-0854-3
8. Kondor, R., Jebara, T.: A kernel between sets of vectors. In: ICML, vol. 20, p. 361 (2003)
9. Lam, H.T., Mörchen, F., Fradkin, D., Calders, T.: Mining compressing sequential patterns. Stat. Anal. Data Min. 7(1), 34–52 (2014)
10. Lichman, M.: UCI machine learning repository (2013)
11. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. J. Mach. Learn. Res. 2, 419–444 (2002)
12. Lyu, S.: A kernel between unordered sets of data: the gaussian mixture approach. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 255–267. Springer, Heidelberg (2005)
13. Tatti, N., Vreeken, J.: The long, the short of it: summarising event sequences with serial episodes. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 462–470. ACM (2012)
14. van Leeuwen, M., Vreeken, J., Siebes, A.: Compression picks item sets that matter. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 585–592. Springer, Heidelberg (2006)
15. Vreeken, J., Van Leeuwen, M., Siebes, A.: Characterising the difference. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 765–774. ACM (2007)
16. Vreeken, J., Van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Disc. 23(1), 169–214 (2011)
17. Xin, D., Han, J., Yan, X., Cheng, H.: Mining compressed frequent-pattern sets. In: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 709–720. VLDB Endowment (2005)
18. Yan, X., Han, J., Afshar, R.: Clospan: mining closed sequential patterns in large datasets. In: Proceedings of SIAM International Conference on Data Mining, pp. 166–177. SIAM (2003)

Author information

Authors and Affiliations

Indian Institute of Science, Bangalore, India
A. Ibrahim & P. S. Sastry
University of Akron, Akron, USA
Shivakumar Sastry

Corresponding author

Correspondence to A. Ibrahim.

Editor information

Editors and Affiliations

The University of Melbourne, Melbourne, Victoria, Australia
James Bailey
The University of Texas at Dallas, Richardson, Texas, USA
Latifur Khan
Osaka University, Osaka, Japan
Takashi Washio
University of Auckland, Auckland, New Zealand
Gill Dobbie
Shenzhen University, Shenzhen, China
Joshua Zhexue Huang
Massey University, Auckland, New Zealand
Ruili Wang

About this paper

Cite this paper

Ibrahim, A., Sastry, P.S., Sastry, S. (2016). Analyzing Similarities of Datasets Using a Pattern Set Kernel. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham. https://doi.org/10.1007/978-3-319-31753-3_22

DOI: https://doi.org/10.1007/978-3-319-31753-3_22
Published: 12 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-31752-6
Online ISBN: 978-3-319-31753-3
eBook Packages: Computer Science, Computer Science (R0)
