Abstract
Over the past two decades, frequent pattern mining (FPM) has attracted the interest of many researchers. FPM involves extracting frequently occurring itemsets from transactions, frequent sequences from large datasets, and common subgraphs from molecular structures. In the big data era, the unpredictable flow and sheer volume of data raise new challenges for FPM, such as space and time complexity. Most existing work focuses on recognizing frequently occurring patterns in a fixed dataset, where the patterns within every transaction are known a priori; in practice, users are interested in only a small part of these frequent patterns. To reduce complexity in such scenarios, it is sometimes necessary to select only the important features, using appropriate FPM algorithms. The main objectives of this work are to improve FPM results and to improve the classification accuracy of big data samples. To address the first challenge, a Lévy flight bat algorithm (LFBA) combined with an online feature selection (OFS) approach is proposed, which filters low-quality features from big data in an online manner. To address the second challenge, weighted entropy frequent pattern mining (WEFPM) is employed, achieving better computation time than methods such as direct discriminative pattern mining (DDPMine) and iterative sampling based frequent itemset mining (ISbFIM), which enumerate all feature combinations. The WEFPM algorithm employed in this paper therefore aims to identify only the specific frequent patterns required by the user. By iterating this procedure, it is assured, through both theoretical and empirical analysis, that the acquired frequent patterns can be enumerated without the enumeration degenerating into a combinatorial explosion.
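The Lévy flight step used by the LFBA can be illustrated with Mantegna's algorithm for generating Lévy-stable random steps (cited in the references below). The sketch here is a minimal, generic illustration: the function names and the simple position-update rule are our assumptions for exposition, not the paper's actual implementation.

```python
import math
import random

def levy_step(beta=1.5):
    """Draw one Levy-distributed step via Mantegna's algorithm."""
    # Standard Mantegna scale factor for the numerator Gaussian
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)
    u = random.gauss(0, sigma_u)
    v = random.gauss(0, 1)
    return u / abs(v) ** (1 / beta)

def levy_update(position, best, alpha=0.01, beta=1.5):
    """Move a bat's position with a Levy-distributed jump scaled by its
    distance from the current best solution (illustrative update rule)."""
    return [x + alpha * levy_step(beta) * (x - b)
            for x, b in zip(position, best)]
```

Heavy-tailed Lévy steps occasionally produce long jumps, which helps the bat population escape local optima during feature-subset search.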
Using the LFBA–OFS approach and the WEFPM algorithm, frequent patterns that are diverse in nature are generated to build a high-quality learning model. To find the frequent patterns, the minimum support threshold is matched with entropy. Finally, a multiple kernel learning support vector machine is employed as the classifier to evaluate the efficiency and accuracy of the proposed approach on big data samples. Empirical study shows that considerable improvement in accuracy and computation time is obtained when the proposed approach is applied to UCI benchmark big datasets for efficient and effective FPM of online features. WEFPM proves to be the most efficient method, producing higher average accuracies of 92.34, 93.218, 91.374 and 87.87% on the adult, chess, hybo and sick datasets respectively, outperforming DDPMine and ISbFIM under an LIBSVM classifier.
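The idea of matching the minimum support threshold with entropy can be sketched as follows: weight each candidate itemset's support by how class-pure its supporting transactions are, and keep only patterns whose entropy-weighted support clears the threshold. This is a minimal illustrative sketch; the function names and the exact weighting scheme are our assumptions, not the paper's WEFPM implementation.

```python
import math
from itertools import combinations

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    probs = [labels.count(c) / n for c in set(labels)]
    return -sum(p * math.log2(p) for p in probs)

def weighted_entropy_patterns(transactions, labels, min_wsupport, max_len=2):
    """Return itemsets whose entropy-weighted support meets min_wsupport.

    Patterns concentrated in a single class have low entropy among their
    supporting transactions, hence a high weight; patterns spread evenly
    across classes are penalized.
    """
    items = sorted({i for t in transactions for i in t})
    n = len(transactions)
    base = entropy(labels) or 1.0  # dataset-level entropy as reference
    results = {}
    for k in range(1, max_len + 1):
        for cand in combinations(items, k):
            idx = [j for j, t in enumerate(transactions)
                   if set(cand) <= set(t)]
            if not idx:
                continue
            support = len(idx) / n
            weight = 1 - entropy([labels[j] for j in idx]) / base
            if support * weight >= min_wsupport:
                results[cand] = support * weight
    return results
```

On a toy dataset, an itemset occurring only in one class passes the threshold while an equally frequent but class-mixed itemset is filtered out, which mirrors the discriminative-pattern goal described above.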
References
Cai, C.H., Fu, A.W.C., Cheng, C.H., Kwong, W.W.: Mining association rules with weighted items. In: International Database Engineering and Applications Symposium, 1998 (IDEAS’98), pp. 68–77 (1998)
Zaki, M.J., Hsiao, C.: CHARM: an efficient algorithm for closed itemset mining. In: Proc. of SDM, pp. 457–473 (2002)
Pei, J., Han, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., Hsu, M.-C.: PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. In: Proc. of ICDE, pp. 215–226 (2001)
Washio, T., Motoda, H.: State of the art of graph-based data mining. ACM SIGKDD Explor. Newsl. 5(1), 59–68 (2003)
Yan, X., Yu, P.S., Han, J.: Graph indexing: a frequent structure-based approach. In: Proc. of SIGMOD, pp. 335–346 (2004)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2012)
Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters. Commun. ACM 51(1), 107–113 (2008)
Fan, W., Zhang, K., Cheng, H., Gao, J., Yan, X., Han, J., Yu, P., Verscheure, O.: Direct mining of discriminative and essential frequent patterns via model-based search tree. In: Proceeding of KDD ’08, pp. 230–238. ACM, New York (2008)
Shintani, T., Kitsuregawa, M.: Parallel mining algorithms for generalized association rules with classification hierarchy. ACM SIGMOD Record 27(2), 25–36 (1998)
Borgelt, C., Kruse, R.: Induction of association rules: apriori implementation. In: Compstat, pp. 395–400 (2002)
Pan F., Cong, G., Tung, A.K.H., Yang, J., Zaki, M.J.: CARPENTER: finding closed patterns in long biological datasets. In: Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (2003)
Pan, F., Tung, A.K.H., Cong, G., Xu, X.: COBBLER: combining column and row enumeration for closed pattern discovery. In: Proc. 2004 Int. Conf. on Scientific and Statistical Database Management (SSDBM’04), Santorini Island, Greece, pp. 21–30 (2004)
Cong, G., Tan, K.-L., Tung, A.K.H., Xu, X.: Mining top-k covering rule groups for gene expression data. In: 24th ACM International Conference on Management of Data (2005)
Lin, M.Y., Lee, P.Y., Hsueh, S.C.: Apriori-based frequent item set mining algorithms on mapreduce. In: Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication, ICUIMC’12, pp 76:1–76:8. ACM, New York (2012)
Zaki, M., Parthasarathy, S., Ogihara, M., Li, W.: Parallel algorithms for discovery of association rules. Data Min. Knowl. Discov. 1, 343–373 (1997)
Li, H., Wang, Y., Zhang, D., Zhang, M., Chang, E.Y.: PFP: parallel FP-growth for query recommendation. In: Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys’08, pp. 107–114. ACM, New York (2008)
Yang, G.: Computational aspects of mining maximal frequent patterns. Theor. Comput. Sci. 362(1–3), 63–85 (2006)
Wang, J., Zhao, P., Hoi, S.C., Jin, R.: Online feature selection and its applications. IEEE Trans. Knowl. Data Eng. 26(3), 698–710 (2014)
Hoi, S.C., Wang, J., Zhao, P., Jin, R.: Online feature selection for mining big data. In: Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, pp. 93–100 (2012)
Aridhi, S., d’Orazio, L., Maddouri, M., Nguifo, E.M.: Density based data partitioning strategy to approximate large-scale subgraph mining. Inf. Syst. 48, 213–223 (2015)
Qiu, H., Gu, R., Yuan, C., Huang, Y.: YAFIM: a parallel frequent itemset mining algorithm with Spark. In: IEEE International Parallel & Distributed Processing Symposium Workshops (IPDPSW), pp. 1664–1671 (2014)
Cheng, H., Yan, X., Han, J., Hsu, C.W.: Discriminative frequent pattern analysis for effective classification. In: International Conference on Data Engineering, pp. 716–725 (2007)
Cheng, H., Yan, X., Han, J., Yu, P.S.: Direct discriminative pattern mining for effective classification. In: Proceedings of ICDM ’08. IEEE Computer Society, Washington, DC, pp. 169–178 (2008)
Wu, X., Fan, W., Peng, J., Zhang, K., Yu, Y.: Iterative sampling based frequent itemset mining for big data. Int. J. Mach. Learn. Cybern. 6(6), 875–882 (2015)
Gole, S., Tidke, B.: ClustBIGFIM: frequent itemset mining of big data using pre-processing based on MapReduce framework. Int. J. Found. Comput. Sci. Technol. 5(3), 79–89 (2015)
Gawwad, M.A., Ahmed, M.F., Fayek, M.B.: Frequent itemset mining for big data using greatest common divisor technique. Data Sci. J. 16(25), 1–10 (2017)
Hasançebi, O., Teke, T., Pekcan, O.: A bat-inspired algorithm for structural optimization. Comput. Struct. 128, 77–90 (2013)
Xie, J., Zhou, Y., Chen, H.: A novel bat algorithm based on differential operator and Lévy flights trajectory. Comput. Intell. Neurosci. 2013, 1–13 (2013)
Yilmaz, S., Küçüksille, E.U.: A new modification approach on bat algorithm for solving optimization problems. Appl. Soft Comput. 28, 259–275 (2015)
Yang, X.-S., Deb, S.: Eagle strategy using Lévy walk and firefly algorithms for stochastic optimization. Stud. Comput. Intell. 284, 101–111 (2010)
Mantegna, R.N.: Fast, accurate algorithm for numerical simulation of Lévy stable stochastic processes. Phys. Rev. E 49(5), 4677–4683 (1994)
Schölkopf, B., Smola, A.J.: Learning with Kernels. MIT Press, Cambridge, MA (2002)
Cao, H., Naito, T., Ninomiya, Y.: Approximate RBF kernel SVM and its applications in pedestrian classification. In: The 1st International Workshop on Machine Learning for Vision-Based Motion Analysis-MLVMA’08 (2008)
Yekkehkhany, B., Safari, A., Homayouni, S., Hasanlou, M.: A comparison study of different Kernel functions for SVM-based classification of multi-temporal polarimetry SAR data. Int. Arch. Photogramm. Remote Sens. Spat. Inform. Sci. 40(2), 281–285 (2014)
Lanckriet, G., De Bie, T., Cristianini, N., Jordan, M.I., Noble, W.S.: A statistical framework for genomic data fusion. Bioinformatics 20(16), 2626–2635 (2004)
Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y.: Support vector machine learning for interdependent and structured output spaces. In: Proceedings of the 21st International Conference on Machine Learning (2004)
Chang, C.C., Lin, C.J.: LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2(3), 27:1–27:27 (2011)
Cite this article
Devi, S.G., Sabrigiriraj, M. Swarm intelligent based online feature selection (OFS) and weighted entropy frequent pattern mining (WEFPM) algorithm for big data analysis. Cluster Comput 22 (Suppl 5), 11791–11803 (2019). https://doi.org/10.1007/s10586-017-1489-9