Distributed Scalable Association Rule Mining over Covid-19 Data

  • Conference paper
Future Data and Security Engineering (FDSE 2021)

Abstract

The worldwide spread of Covid-19 in 2020 has significantly disrupted human life. It is widely recognized that acting quickly is crucial for monitoring and preventing the further spread of Covid-19. One approach relies on non-clinical techniques, such as data mining tools and other artificial intelligence technologies, and the advent of distributed computing frameworks offers an efficient way to apply them at scale. Spark is a widely used framework that is well accepted in the big data community. This research uses a cross-country Covid-19 dataset to assess the performance of the Apriori and FP-growth algorithms under different Spark configurations (varying numbers of cores and transactions). The approach supports classification and prediction by identifying association rules related to the coronavirus. The research aims to understand the differences between FP-growth and Apriori and to find the Spark parameters that improve performance as nodes are added.
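Both algorithms compared in the paper produce the same output: frequent itemsets (sets of items whose support, the fraction of transactions containing them, reaches a threshold) and association rules X → Y whose confidence, count(X ∪ Y) / count(X), reaches a second threshold. The sketch below is a minimal single-machine Apriori, not the authors' Spark implementation. It shows the level-wise join and prune steps that FP-growth avoids by mining a compressed prefix tree instead of generating candidates. The transactions, item names, and thresholds are hypothetical and are not taken from the paper's dataset. For a distributed run, Spark MLlib provides an FP-growth implementation (`pyspark.ml.fpm.FPGrowth`).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori search for frequent itemsets.

    Returns a dict mapping each frequent itemset (frozenset) to its absolute
    support count. min_support is a fraction of all transactions.
    """
    sets = [frozenset(t) for t in transactions]
    min_count = min_support * len(sets)

    def count(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in sets if itemset <= t)

    # Level 1: every distinct item is a candidate 1-itemset.
    candidates = {frozenset([item]) for t in sets for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Keep only candidates that meet the support threshold.
        level = {}
        for c in candidates:
            n = count(c)
            if n >= min_count:
                level[c] = n
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, n_transactions, min_confidence):
    """Rules lhs -> rhs with confidence count(lhs | rhs) / count(lhs)."""
    rules = []
    for itemset, joint in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # lhs is frequent too, because subsets of frequent sets are frequent.
                confidence = joint / frequent[lhs]
                if confidence >= min_confidence:
                    support = joint / n_transactions
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical toy records; NOT drawn from the paper's Covid-19 dataset.
    transactions = [
        {"fever", "cough", "hospitalized"},
        {"fever", "cough"},
        {"fever", "hospitalized"},
        {"cough"},
    ]
    frequent = apriori(transactions, min_support=0.5)
    rules = association_rules(frequent, len(transactions), min_confidence=0.7)
    for lhs, rhs, sup, conf in sorted(rules, key=lambda r: -r[3]):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
    # prints: ['hospitalized'] -> ['fever'] support=0.50 confidence=1.00
```

Lowering `min_confidence` to 0.6 also admits the three rules with confidence 2/3. Each pass over the data in `count` is the step that becomes expensive on large transaction sets, which is why distributed frameworks such as Spark matter here.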

Notes

  1. https://github.com/beoutbreakprepared/nCoV2019.

Acknowledgements

This work has been conducted in the project “ICT programme”, which was supported by the European Union through the European Social Fund.

Author information

Correspondence to Mahtab Shahin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Shahin, M., Inoubli, W., Shah, S.A., Yahia, S.B., Draheim, D. (2021). Distributed Scalable Association Rule Mining over Covid-19 Data. In: Dang, T.K., Küng, J., Chung, T.M., Takizawa, M. (eds) Future Data and Security Engineering. FDSE 2021. Lecture Notes in Computer Science, vol 13076. Springer, Cham. https://doi.org/10.1007/978-3-030-91387-8_3

  • DOI: https://doi.org/10.1007/978-3-030-91387-8_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91386-1

  • Online ISBN: 978-3-030-91387-8

  • eBook Packages: Computer Science, Computer Science (R0)