
Distributed Mining of Significant Frequent Colossal Closed Itemsets from Long Biological Dataset

  • Conference paper
Intelligent Systems Design and Applications (ISDA 2018)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 940)

Abstract

Mining colossal itemsets has gained more attention in recent times. A large set of short and average-sized itemsets does not contain complete and valuable information for decision making, yet traditional itemset mining algorithms spend an enormous amount of time mining these small and average-sized itemsets. Colossal itemsets are significant for numerous applications, including bioinformatics, and are influential in decision making. Bioinformatics has contributed a new kind of dataset, known as the long biological dataset. These are high-dimensional datasets, characterized by a large number of features (attributes) and a small number of rows (samples). Extracting information and knowledge from such high-dimensional long biological datasets is a nontrivial task. The existing algorithms for mining significant Frequent Colossal Closed Itemsets (FCCI) from long biological datasets are sequential and computationally expensive. Distributed computing is a good strategy to overcome the inefficiency of these sequential algorithms. This paper proposes a distributed computing approach for mining FCCI. The Distributed Row Enumerated Frequent Colossal Closed Itemset Mining (DREFCCIM) algorithm uses a pruning strategy that efficiently cuts down the row-enumerated search space. The proposed DREFCCIM algorithm is the first distributed algorithm to mine FCCI from long biological datasets. The experimental results show that DREFCCIM performs efficiently in comparison to the current algorithms.
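The abstract does not spell out DREFCCIM's pruning rules or its distribution framework, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes each row (sample) is a `frozenset` of items, and it uses two hypothetical thresholds: `min_sup` (minimum number of supporting rows) and `min_len` (minimum itemset size, the "colossal" cut-off). In row enumeration, the itemset of a row set is the intersection of its rows, and every non-empty intersection is a closed itemset. The only pruning shown is the length bound: adding rows can only shrink the intersection, so a branch whose itemset is already shorter than `min_len` can be cut. The search tree is split by the smallest row index of each row set, which gives disjoint subtrees. Threads stand in for cluster nodes purely to illustrate that decomposition; they give no real speedup in CPython.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, FrozenSet, Sequence

Itemset = FrozenSet[str]


def support(itemset: Itemset, rows: Sequence[Itemset]) -> int:
    """Number of rows (samples) that contain every item of `itemset`."""
    return sum(1 for row in rows if itemset <= row)


def mine_subtree(first: int, rows: Sequence[Itemset],
                 min_sup: int, min_len: int) -> Dict[Itemset, int]:
    """Enumerate every row set whose smallest row index is `first`.

    The itemset of a row set is the intersection of its rows, which is
    always a closed itemset. Results map itemset -> support.
    """
    found: Dict[Itemset, int] = {}

    def dfs(last: int, items: Itemset) -> None:
        # Length (colossal) pruning: extending the row set can only shrink
        # the intersection, so no descendant can reach min_len again.
        if len(items) < min_len:
            return
        sup = support(items, rows)
        if sup >= min_sup:
            found[items] = sup
        # Extend the row set only with rows after `last` (no repeats).
        for nxt in range(last + 1, len(rows)):
            dfs(nxt, items & rows[nxt])

    dfs(first, rows[first])
    return found


def mine_fcci(rows: Sequence[Itemset], min_sup: int, min_len: int,
              workers: int = 4) -> Dict[Itemset, int]:
    """Hypothetical driver: one task per first row, then merge the results."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    merged: Dict[Itemset, int] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda r: mine_subtree(r, rows, min_sup, min_len),
                         range(len(rows)))
        for part in parts:
            # Support is computed over all rows, so duplicates across
            # workers always carry the same value.
            merged.update(part)
    return merged


if __name__ == "__main__":
    toy_rows = [frozenset("abcde"), frozenset("abcdf"),
                frozenset("abcef"), frozenset("bcdef")]
    result = mine_fcci(toy_rows, min_sup=2, min_len=3)
    for itemset, sup in sorted(result.items(),
                               key=lambda kv: (-kv[1], sorted(kv[0]))):
        print("".join(sorted(itemset)), sup)
```

On these four toy rows with `min_sup=2` and `min_len=3`, the sketch reports 10 closed itemsets, for example `abc` with support 3. The itemset `bc` is closed and appears in all four rows, but it is dropped because it is shorter than `min_len`. Partitioning by first row needs no communication between workers until the final merge, but the subtrees are unevenly sized (the one rooted at row 0 is the largest), so a practical system would likely need some form of load balancing.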

Author information

Corresponding author

Correspondence to Manjunath K. Vanahalli.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Vanahalli, M.K., Patil, N. (2020). Distributed Mining of Significant Frequent Colossal Closed Itemsets from Long Biological Dataset. In: Abraham, A., Cherukuri, A.K., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 940. Springer, Cham. https://doi.org/10.1007/978-3-030-16657-1_83