Abstract
Colossal pattern mining extracts patterns from databases that have many attributes and values but only a small number of instances. Although many efficient approaches for mining colossal patterns have been proposed, they cannot be applied to colossal pattern mining with constraints. In this paper, we address the problem of mining colossal patterns with length constraints. First, we describe the min-length and max-length constraint problems and then combine them into a single length-constraint problem. Next, we propose one property for efficiently pruning candidates during the mining process and another for quickly checking candidates. Based on these properties, we present the Length Constraints for Colossal Pattern (LCCP) algorithm for mining colossal patterns with length constraints. Experiments comparing LCCP with other algorithms are conducted to show its effectiveness.
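To make the interaction between length constraints and candidate pruning concrete, here is a minimal sketch. It is not LCCP. It assumes a depth-first row-enumeration strategy, which is a common choice for data with few rows and many columns, and it reports closed itemsets (intersections of rows) as the patterns. The function `mine_length_constrained` and its parameters are hypothetical. The key observation is that adding rows can only shrink the shared itemset. A min-length violation therefore prunes a whole branch, while the max-length constraint can only be checked when a candidate is reported.

```python
from typing import Dict, FrozenSet, Iterable, List, Set


def mine_length_constrained(
    rows: Iterable[Set[str]], min_sup: int, min_len: int, max_len: int
) -> Dict[FrozenSet[str], int]:
    """Hypothetical helper (not LCCP): return closed itemsets X with
    support(X) >= min_sup and min_len <= len(X) <= max_len, mapped to
    their support. Assumes min_sup >= 1."""
    data: List[FrozenSet[str]] = [frozenset(r) for r in rows]
    n = len(data)
    found: Dict[FrozenSet[str], int] = {}

    def support(itemset: FrozenSet[str]) -> int:
        # Exact support: number of rows that contain every item of itemset.
        return sum(1 for row in data if itemset <= row)

    def dfs(itemset: FrozenSet[str], count: int, start: int) -> None:
        # `itemset` is the intersection of the `count` rows chosen so far,
        # so it is closed and its support is at least `count`.
        # The max-length test is only applied here: adding rows shrinks the
        # itemset, so an over-long node may still have valid descendants.
        if count >= min_sup and min_len <= len(itemset) <= max_len:
            found.setdefault(itemset, support(itemset))
        for r in range(start, n):
            # Support pruning: even taking every remaining row from r onward
            # cannot reach min_sup rows, and later r values leave fewer rows.
            if count + (n - r) < min_sup:
                break
            child = itemset & data[r]
            # Min-length pruning: intersections only shrink as rows are
            # added, so a child shorter than min_len has no valid descendant.
            if len(child) < min_len:
                continue
            dfs(child, count + 1, r + 1)

    # Start from the set of all items, with no rows chosen yet.
    dfs(frozenset().union(*data), 0, 0)
    return found


if __name__ == "__main__":
    rows = [{"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"}]
    result = mine_length_constrained(rows, min_sup=2, min_len=2, max_len=3)
    for items, sup in sorted((sorted(k), v) for k, v in result.items()):
        print(items, sup)
    # ['a', 'b'] 3
    # ['a', 'b', 'c'] 2
    # ['a', 'b', 'd'] 2
```

In this toy run, `{a, b, c, d}` is excluded because it occurs in only one row, even though it is the longest pattern. Lowering `max_len` to 2 would leave only `{a, b}`. The branch is not cut at that point, because shorter itemsets can still appear further down.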
About this article
Cite this article
Le, T., Nguyen, TL., Huynh, B. et al. Mining colossal patterns with length constraints. Appl Intell 51, 8629–8640 (2021). https://doi.org/10.1007/s10489-021-02357-8