Analyzing Similarities of Datasets Using a Pattern Set Kernel

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9651)

Abstract

In the area of pattern discovery, there is much interest in discovering small sets of patterns that characterize the data well. In such scenarios, when data is represented by a small set of characterizing patterns, an interesting problem is the comparison of datasets, by comparing the respective representative sets of patterns. In this paper, we propose a novel kernel function for measuring similarities between two sets of patterns, which is based on evaluating the structural similarities between the patterns in the two sets, weighted using their relative frequencies in the data. We define the kernel for injective serial episodes and itemsets. We also present an efficient algorithm for computing this kernel. We demonstrate the effectiveness of our kernel on classification scenarios and for change detection using sequential datasets and transaction databases.
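
The idea in the abstract, structural similarity between patterns weighted by relative frequency, can be illustrated with a minimal sketch for itemsets. This is not the kernel defined in the paper, whose exact form is not given here. It assumes a base kernel that counts the non-empty sub-itemsets two itemsets share, and the names `itemset_base_kernel`, `pattern_set_kernel`, and `normalized_kernel` are hypothetical.

```python
from itertools import product
from math import sqrt


def itemset_base_kernel(a: frozenset, b: frozenset) -> int:
    """Assumed base kernel: number of non-empty sub-itemsets common to a and b."""
    return 2 ** len(a & b) - 1


def pattern_set_kernel(set_a: dict, set_b: dict) -> float:
    """Frequency-weighted sum of base-kernel values over all cross pairs.

    Each argument maps an itemset (frozenset) to its relative frequency.
    This weighting scheme is an assumption made for illustration.
    """
    return sum(
        fa * fb * itemset_base_kernel(a, b)
        for (a, fa), (b, fb) in product(set_a.items(), set_b.items())
    )


def normalized_kernel(set_a: dict, set_b: dict) -> float:
    """Cosine-style normalization so that a set compared with itself scores 1."""
    denom = sqrt(pattern_set_kernel(set_a, set_a) * pattern_set_kernel(set_b, set_b))
    return pattern_set_kernel(set_a, set_b) / denom if denom else 0.0


# Two toy pattern sets with made-up relative frequencies.
patterns_a = {frozenset({"a", "b"}): 0.5, frozenset({"c"}): 0.5}
patterns_b = {frozenset({"a", "b", "c"}): 1.0}

similarity = normalized_kernel(patterns_a, patterns_b)
```

Because the assumed base kernel is an inner product of sub-itemset indicator vectors, the weighted sum is also an inner product, so the sketch yields a valid (positive semidefinite) kernel that could be passed to a kernel classifier as a precomputed Gram matrix.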

Notes

  1. A preliminary version of this paper was presented as a poster at the 2nd IKDD Conference on Data Sciences, CoDS 2015 [6].

  2. There are various definitions of frequency proposed for episodes [1]. We are not imposing any condition on what frequency we are considering, and hence \(fr(\alpha)\) could be any measure of relative significance of episode \(\alpha\) in the data.

  3. Injective because itemsets, by definition, do not have repetitive items.

References

  1. Achar, A., Laxman, S., Sastry, P.S.: A unified view of the apriori-based algorithms for frequent episode discovery. Knowl. Inf. Syst. 31(2), 223–250 (2012)

  2. Archer, B., Shivakumar, S., Rowe, A., Rajkumar, R.: Profiling primitives of networked embedded automation. In: IEEE International Conference on Automation Science and Engineering, CASE 2009, pp. 531–536. IEEE (2009)

  3. Fernando, B., Fromont, E., Tuytelaars, T.: Effective use of frequent itemset mining for image classification. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012, Part I. LNCS, vol. 7572, pp. 214–227. Springer, Heidelberg (2012)

  4. Gärtner, T., Flach, P.A., Wrobel, S.: On graph kernels: hardness results and efficient alternatives. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS (LNAI), vol. 2777, pp. 129–143. Springer, Heidelberg (2003)

  5. Ibrahim, A.: Effective characterization of sequence data through frequent episodes. Ph.D. thesis (under review), Indian Institute of Science, Bangalore (2015, submitted)

  6. Ibrahim, A., Sastry, P.S., Sastry, S.: Pattern set kernel. In: Proceedings of the Second ACM IKDD Conference on Data Sciences, pp. 122–123. ACM (2015)

  7. Ibrahim, A., Sastry, S., Sastry, P.S.: Discovering compressing serial episodes from event sequences. Knowl. Inf. Syst. 1–28 (2015). http://link.springer.com/article/10.1007/s10115-015-0854-3

  8. Kondor, R., Jebara, T.: A kernel between sets of vectors. In: ICML, vol. 20, p. 361 (2003)

  9. Lam, H.T., Mörchen, F., Fradkin, D., Calders, T.: Mining compressing sequential patterns. Stat. Anal. Data Min. 7(1), 34–52 (2014)

  10. Lichman, M.: UCI machine learning repository (2013)

  11. Lodhi, H., Saunders, C., Shawe-Taylor, J., Cristianini, N., Watkins, C.: Text classification using string kernels. J. Mach. Learn. Res. 2, 419–444 (2002)

  12. Lyu, S.: A kernel between unordered sets of data: the gaussian mixture approach. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 255–267. Springer, Heidelberg (2005)

  13. Tatti, N., Vreeken, J.: The long, the short of it: summarising event sequences with serial episodes. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 462–470. ACM (2012)

  14. van Leeuwen, M., Vreeken, J., Siebes, A.: Compression picks item sets that matter. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) PKDD 2006. LNCS (LNAI), vol. 4213, pp. 585–592. Springer, Heidelberg (2006)

  15. Vreeken, J., Van Leeuwen, M., Siebes, A.: Characterising the difference. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 765–774. ACM (2007)

  16. Vreeken, J., Van Leeuwen, M., Siebes, A.: Krimp: mining itemsets that compress. Data Min. Knowl. Disc. 23(1), 169–214 (2011)

  17. Xin, D., Han, J., Yan, X., Cheng, H.: Mining compressed frequent-pattern sets. In: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 709–720. VLDB Endowment (2005)

  18. Yan, X., Han, J., Afshar, R.: Clospan: mining closed sequential patterns in large datasets. In: Proceedings of SIAM International Conference on Data Mining, pp. 166–177. SIAM (2003)


Author information

Corresponding author

Correspondence to A. Ibrahim.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Ibrahim, A., Sastry, P.S., Sastry, S. (2016). Analyzing Similarities of Datasets Using a Pattern Set Kernel. In: Bailey, J., Khan, L., Washio, T., Dobbie, G., Huang, J., Wang, R. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science, vol 9651. Springer, Cham. https://doi.org/10.1007/978-3-319-31753-3_22

  • DOI: https://doi.org/10.1007/978-3-319-31753-3_22

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-31752-6

  • Online ISBN: 978-3-319-31753-3

  • eBook Packages: Computer Science, Computer Science (R0)
