Mining colossal patterns with length constraints

Applied Intelligence

Abstract

Colossal pattern mining targets databases that have many attributes and values but only a small number of instances. Although many efficient approaches for extracting colossal patterns have been proposed, they cannot be applied to colossal pattern mining with constraints. In this paper, we address the problem of extracting colossal patterns under length constraints. First, we define the min-length constraint and max-length constraint problems and combine them into a general length constraint. We then develop one property for efficiently pruning candidates during the mining process and another for quickly checking candidates. Based on these properties, we propose the Length Constraints for Colossal Pattern (LCCP) algorithm to extract colossal patterns with length constraints. Experiments comparing LCCP with other algorithms are conducted to show its effectiveness.
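To make the notion of a length constraint concrete, the following is a minimal brute-force sketch in Python. It is not the LCCP algorithm and uses none of its pruning or checking properties, which this preview does not describe. It assumes each database row is a set of items and that a pattern is the intersection of a group of rows. The function name `mine_with_length_constraints`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def mine_with_length_constraints(rows, min_support, min_len, max_len):
    """Return {pattern: support} for patterns with min_len <= |pattern| <= max_len.

    Illustrative brute force (exponential in the number of rows), NOT LCCP:
    every group of at least `min_support` rows is intersected, and the
    resulting itemset is kept only if its length satisfies both bounds.
    """
    row_sets = [frozenset(r) for r in rows]
    results = {}
    for k in range(min_support, len(row_sets) + 1):
        for group in combinations(row_sets, k):
            pattern = frozenset.intersection(*group)
            # Min-length and max-length constraints on the pattern.
            if not (min_len <= len(pattern) <= max_len):
                continue
            # Support = number of rows that contain the whole pattern.
            support = sum(1 for r in row_sets if pattern <= r)
            results[pattern] = support
    return results


# Toy database: few rows, each with several items (hypothetical data).
rows = [
    {"a", "b", "c", "d", "e"},
    {"a", "b", "c", "d"},
    {"a", "b", "c", "e"},
    {"a", "b", "f"},
]

found = mine_with_length_constraints(rows, min_support=2, min_len=3, max_len=4)
for pattern, support in sorted(found.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(pattern), support)
```

Enumerating row groups rather than item combinations is one common choice when rows are few and items are many, which matches the setting described in the abstract. A practical miner would avoid the exhaustive enumeration used here.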

Author information

Corresponding author

Correspondence to Bao Huynh.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Le, T., Nguyen, TL., Huynh, B. et al. Mining colossal patterns with length constraints. Appl Intell 51, 8629–8640 (2021). https://doi.org/10.1007/s10489-021-02357-8

  • DOI: https://doi.org/10.1007/s10489-021-02357-8
