DOI: 10.1145/2661829.2662086
research-article

Rebuilding the Tower of Babel: Towards Cross-System Malware Information Sharing

Published: 3 November 2014

ABSTRACT

Anti-virus systems developed by different vendors often show strong discrepancies in how they name malware, which significantly hinders malware information sharing. Existing work has proposed a plethora of malware naming standards, but most anti-virus vendors have been reluctant to change their own naming conventions. In this paper we explore a new, more pragmatic alternative: we exploit the correlation between the malware names assigned by different anti-virus systems to create a consensus classification, through which these systems can share malware information without modifying their naming conventions. Specifically, we present Latin, a novel classification integration framework that leverages the correspondence between participating anti-virus systems as reflected in heterogeneous information sources at the instance-instance, instance-name, and name-name levels. We provide results from extensive experimental studies using real malware datasets and concrete use cases to verify the efficacy of Latin in supporting cross-system malware information sharing.
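
To make the idea of name-level correspondence concrete, the following is a minimal sketch and not the Latin framework itself. It assumes that two anti-virus systems have each labeled an overlapping set of samples, and it scores a pair of names by the Jaccard overlap of the samples carrying them. The function name `name_correspondence`, the sample IDs, and the malware names are all hypothetical.

```python
from collections import defaultdict


def name_correspondence(labels_a, labels_b):
    """Score name pairs across two systems by the overlap of their samples.

    labels_a, labels_b: dicts mapping a sample ID to the name that one
    anti-virus system assigned to it. Returns {(name_a, name_b): score}
    for every pair of names that share at least one sample.
    """
    # Group sample IDs by the name each system assigned.
    samples_a = defaultdict(set)
    samples_b = defaultdict(set)
    for sample, name in labels_a.items():
        samples_a[name].add(sample)
    for sample, name in labels_b.items():
        samples_b[name].add(sample)

    scores = {}
    for name_a, set_a in samples_a.items():
        for name_b, set_b in samples_b.items():
            shared = set_a & set_b
            if shared:
                # Jaccard similarity: shared samples over all samples
                # carrying either name.
                scores[(name_a, name_b)] = len(shared) / len(set_a | set_b)
    return scores


# Hypothetical labels from two systems (illustrative data only).
labels_a = {"s1": "Worm.Alpha", "s2": "Worm.Alpha", "s3": "Trojan.Beta"}
labels_b = {"s1": "W32/Alfa", "s2": "W32/Alfa", "s3": "Troj/Bee", "s4": "Troj/Bee"}
scores = name_correspondence(labels_a, labels_b)
```

A high score suggests that two differently spelled names refer to the same family. This covers only simple co-occurrence between names; the abstract indicates that Latin also draws on instance-instance and instance-name information.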


          Published in

          CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
          November 2014
          2152 pages
          ISBN: 9781450325981
          DOI: 10.1145/2661829

          Copyright © 2014 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          CIKM '14 paper acceptance rate: 175 of 838 submissions, 21%. Overall acceptance rate: 1,861 of 8,427 submissions, 22%.
