Scalable malware detection system using big data and distributed machine learning approach

  • Application of soft computing
Soft Computing

Abstract

Computers, the Internet, and smartphones have changed our lives as never before, and it is now hard to imagine life without them. Almost everything around us is connected to and controlled by systems and software, and software and mobile applications have become the nerve center of daily life. Our dependence on these systems is so great that it is frightening to imagine what would happen if they failed. They face a constant threat from many types of cyber-attack, and cybercriminals evolve their attack strategies every day. Attacks using ever more sophisticated malware are a major concern for all types of users, and malware attack strategies have changed rapidly in the recent past. The volume, velocity, and complexity of malware pose new challenges for malware detection systems. A scalable malware detection system capable of detecting complex attacks is the need of the hour. In this paper, we propose a scalable malware detection system using big data and a machine learning approach. The machine learning model in the system is implemented using Apache Spark, which supports distributed learning. Locality-sensitive hashing is used for malware detection, which significantly reduces detection time. A five-stage iterative process was used to carry out the implementation and experimental analysis. The proposed model achieved 99.8% accuracy and significantly reduced learning and malware detection time compared with models proposed by other researchers.
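To illustrate why locality-sensitive hashing can shorten detection time, the following is a minimal single-machine sketch of MinHash-based LSH in plain Python. It is not the author's Spark implementation: the feature sets (API call names), the signature length, the number of bands, and all function and class names are assumptions chosen for illustration only. The idea shown is that a new sample is compared only against known samples that share at least one hash bucket, rather than against the whole corpus.

```python
import hashlib
from collections import defaultdict

NUM_HASHES = 32  # signature length (assumed value)
BANDS = 8        # number of LSH bands (assumed value)
ROWS = NUM_HASHES // BANDS  # signature rows per band


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(features):
    """For each seed, keep the minimum hash value over the feature set."""
    return tuple(min(_hash(seed, f) for f in features) for seed in range(NUM_HASHES))


def band_keys(signature):
    """Split a signature into bands; each band acts as one bucket key."""
    return [(b, signature[b * ROWS:(b + 1) * ROWS]) for b in range(BANDS)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_HASHES


class LshIndex:
    """Hypothetical in-memory index of known samples."""

    def __init__(self):
        self.buckets = defaultdict(set)
        self.signatures = {}

    def add(self, name, features):
        sig = minhash_signature(features)
        self.signatures[name] = sig
        for key in band_keys(sig):
            self.buckets[key].add(name)

    def candidates(self, features):
        """Return known samples sharing at least one band with the query."""
        found = set()
        for key in band_keys(minhash_signature(features)):
            found |= self.buckets.get(key, set())
        return found


# Illustrative feature sets (hypothetical, not taken from the paper).
index = LshIndex()
index.add("known_malware_1", {"CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx"})
index.add("benign_tool_1", {"GetSystemTime", "ReadFile", "CloseHandle"})

query = {"CreateRemoteThread", "WriteProcessMemory", "VirtualAllocEx"}
matches = index.candidates(query)
```

Only the samples returned by `candidates` need a full comparison or classification step, which is where the time saving comes from. More bands with fewer rows each make the index more permissive, while fewer bands with more rows make it stricter.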

Availability of data and material

Data sharing is not applicable to this article as no new data were created or analyzed in this study.

Code availability

Not applicable.

Funding

Not applicable.

Author information

Contributions

The paper has a single author, who carried out all the work reported in it.

Corresponding author

Correspondence to Manish Kumar.

Ethics declarations

Conflict of interest

The author hereby declares that they have no conflict of interest. No research grant or fund has been received from any agency to carry out the research work discussed in the manuscript.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Human and animal rights

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Kumar, M. Scalable malware detection system using big data and distributed machine learning approach. Soft Comput 26, 3987–4003 (2022). https://doi.org/10.1007/s00500-021-06492-9

  • DOI: https://doi.org/10.1007/s00500-021-06492-9
