Abstract
Online scanners analyze user-submitted files with a large number of security tools and provide access to the analysis results. As the most popular online scanner, VirusTotal (VT) is often used to determine whether samples are malicious, label samples with their family, hunt for new threats, and collect malware samples. We analyze 328M VT reports for 235M samples collected over one year through the VT file feed. We use the reports to characterize the VT file feed in depth and compare it with the telemetry of an AV vendor. We answer questions such as: How diverse is the feed? How fresh are the samples it provides? What fraction of samples can be labeled on first sight? How different are the malware families in the feed and the AV telemetry?
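To make one of these measurements more concrete, here is a minimal Python sketch (not the authors' pipeline) of how per-sample statistics could be derived from a stream of feed reports. The `Report` fields (`sha256`, `scan_time`, `positives`), the `first_sight_stats` helper, and the reading of "labeled on first sight" as "the earliest report already has at least `threshold` detections" are all hypothetical simplifications. Real VT reports carry far more fields under different names, and assigning a family label would additionally require parsing the AV labels themselves.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Report:
    # Hypothetical, simplified view of one feed report; real VT reports
    # contain many more fields with different names.
    sha256: str      # hash identifying the sample
    scan_time: int   # Unix timestamp when the report was produced
    positives: int   # number of engines that flagged the file


def first_sight_stats(reports: Iterable[Report], threshold: int = 1) -> dict:
    """Count unique samples and the fraction flagged in their earliest report.

    Assumption: a sample counts as "labeled on first sight" if its earliest
    report has at least `threshold` detections.
    """
    earliest: dict[str, Report] = {}
    total = 0
    for r in reports:
        total += 1
        prev = earliest.get(r.sha256)
        # Keep only the earliest report seen for each sample hash.
        if prev is None or r.scan_time < prev.scan_time:
            earliest[r.sha256] = r
    unique = len(earliest)
    flagged = sum(1 for r in earliest.values() if r.positives >= threshold)
    return {
        "reports": total,
        "samples": unique,
        "reports_per_sample": total / unique if unique else 0.0,
        "flagged_on_first_sight": flagged / unique if unique else 0.0,
    }


if __name__ == "__main__":
    demo = [
        Report("aa", 100, 0),  # first report of "aa": no detections yet
        Report("aa", 200, 5),  # later rescan of "aa": now detected
        Report("bb", 150, 3),  # "bb" is detected on its only report
    ]
    print(first_sight_stats(demo))
    # {'reports': 3, 'samples': 2, 'reports_per_sample': 1.5, 'flagged_on_first_sight': 0.5}
```

Because a sample can appear in several reports, the sketch deduplicates by hash before computing ratios. With the figures given above, 328M reports over 235M samples corresponds to roughly 1.4 reports per sample.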
Acknowledgment
This work has been partially supported by the PRODIGY Project (TED2021-132464B-I00) funded by MCIN/AEI/10.13039/501100011033/ and EU NextGeneration funds.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
van Liebergen, K., Caballero, J., Kotzias, P., Gates, C. (2023). A Deep Dive into the VirusTotal File Feed. In: Gruss, D., Maggi, F., Fischer, M., Carminati, M. (eds) Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2023. Lecture Notes in Computer Science, vol 13959. Springer, Cham. https://doi.org/10.1007/978-3-031-35504-2_8
DOI: https://doi.org/10.1007/978-3-031-35504-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-35503-5
Online ISBN: 978-3-031-35504-2
eBook Packages: Computer Science, Computer Science (R0)