A Deep Dive into the VirusTotal File Feed

van Liebergen, Kevin; Caballero, Juan; Kotzias, Platon; Gates, Chris

doi:10.1007/978-3-031-35504-2_8

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13959)

Included in the following conference series:

International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment

Abstract

Online scanners analyze user-submitted files with a large number of security tools and provide access to the analysis results. As the most popular online scanner, VirusTotal (VT) is often used for determining if samples are malicious, labeling samples with their family, hunting for new threats, and collecting malware samples. We analyze 328M VT reports for 235M samples collected for one year through the VT file feed. We use the reports to characterize the VT file feed in depth and compare it with the telemetry of an AV vendor. We answer questions such as: How diverse is the feed? How fresh are the samples it provides? What fraction of samples can be labeled on first sight? How different are the malware families in the feed and the AV telemetry?

Acknowledgment

This work has been partially supported by the PRODIGY Project (TED2021-132464B-I00) funded by MCIN/AEI/10.13039/501100011033/ and EU NextGeneration funds.

Author information

Authors and Affiliations

IMDEA Software Institute, Madrid, Spain
Kevin van Liebergen & Juan Caballero
Norton Research Group, Tempe, USA
Platon Kotzias & Chris Gates

Corresponding author

Correspondence to Kevin van Liebergen.

Editor information

Editors and Affiliations

Graz University of Technology, Graz, Austria
Daniel Gruss
AWS Italy, Milan, Italy
Federico Maggi
Universität Hamburg, Hamburg, Germany
Mathias Fischer
Politecnico di Milano, Milan, Italy
Michele Carminati

About this paper

Cite this paper

van Liebergen, K., Caballero, J., Kotzias, P., Gates, C. (2023). A Deep Dive into the VirusTotal File Feed. In: Gruss, D., Maggi, F., Fischer, M., Carminati, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2023. Lecture Notes in Computer Science, vol 13959. Springer, Cham. https://doi.org/10.1007/978-3-031-35504-2_8

DOI: https://doi.org/10.1007/978-3-031-35504-2_8
Published: 10 June 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35503-5
Online ISBN: 978-3-031-35504-2
eBook Packages: Computer Science, Computer Science (R0)
