Rebuilding the Tower of Babel: Towards Cross-System Malware Information Sharing

Authors:
Ting Wang, IBM Research, Yorktown Heights, NY, USA
Shicong Meng, Facebook, Menlo Park, CA, USA
Wei Gao, University of Tennessee, Knoxville, TN, USA
Xin Hu, IBM Research, Yorktown Heights, NY, USA

CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, November 2014, Pages 1239–1248
https://doi.org/10.1145/2661829.2662086

Published: 03 November 2014

ABSTRACT

Anti-virus systems developed by different vendors often demonstrate strong discrepancies in how they name malware, which significantly hinders malware information sharing. While existing work has proposed a plethora of malware naming standards, most anti-virus vendors were reluctant to change their own naming conventions. In this paper, we explore a new, more pragmatic alternative. We propose to exploit the correlation between malware naming of different anti-virus systems to create their consensus classification, through which these systems can share malware information without modifying their naming conventions. Specifically, we present Latin, a novel classification integration framework leveraging the correspondence between participating anti-virus systems as reflected in heterogeneous information sources at instance-instance, instance-name, and name-name levels. We provide results from extensive experimental studies using real malware datasets and concrete use cases to verify the efficacy of Latin in supporting cross-system malware information sharing.

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining
2. Security and privacy
  1. Intrusion/anomaly detection and malware mitigation
  2. Systems security
    1. Operating systems security


Published in
CIKM '14: Proceedings of the 23rd ACM International Conference on Information and Knowledge Management
November 2014
2152 pages
ISBN: 9781450325981
DOI: 10.1145/2661829
General Chairs: Jianzhong Li (Harbin Inst. of Technology), X. Sean Wang (Fudan University)
Program Chairs: Minos Garofalakis (Technical University of Crete, Greece), Ian Soboroff (National Institute of Standards, USA), Torsten Suel (New York University, USA), Min Wang (Google Research, USA)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
classification integration
consensus learning
malware naming
Qualifiers
- research-article
Conference

Acceptance Rates
CIKM '14 Paper Acceptance Rate: 175 of 838 submissions, 21%
Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%