research-article

WOLF: automated machine learning workflow management framework for malware detection and other applications

Authors:

Bo Luo

HotSoS '20: Proceedings of the 7th Symposium on Hot Topics in the Science of Security

Article No.: 11, Pages 1 - 8

https://doi.org/10.1145/3384217.3385625

Published: 21 September 2020

Abstract

Applying machine learning techniques to solve real-world problems is a highly iterative process. The process from idea to code and then to experiment may require up to thousands of iterations to find the optimum set of hyper-parameters. It is also hard to find the best machine learning technique for a given dataset. The WOLF framework has been designed to simultaneously automate the process of selecting the best algorithm and searching for the optimum hyper-parameters. It can be useful both to novices in machine learning who just want to find the best algorithm for their dataset, and to experts in the field who want to compare their new features or algorithms with state-of-the-art techniques. By incorporating the WOLF framework in their designs, novices can more easily apply machine learning techniques to their datasets. With a wide range of evaluation metrics provided, WOLF also helps data scientists develop better intuition about machine learning techniques and speeds up the process of algorithm development. Another main feature of the WOLF framework is that users can easily integrate new algorithms at any stage of the machine learning pipeline. In this paper, we present the WOLF architecture and demonstrate how it can be used for standard machine learning datasets and for Android malware detection tasks. Experimental results show the flexibility and performance of WOLF.
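The idea of searching over algorithms and their hyper-parameters at the same time can be illustrated with a small, self-contained sketch. This is not WOLF's API or architecture, which the abstract does not describe. The two toy classifiers, the `SEARCH_SPACE` registry, the `select_best` helper, and the synthetic two-class dataset are all hypothetical names and assumptions introduced here for illustration, and exhaustive grid search stands in for whatever search strategy the framework actually uses.

```python
"""Sketch: joint algorithm selection and hyper-parameter search (not WOLF's API)."""
import itertools
import math
import random

# Distance functions usable as a hyper-parameter of the centroid classifier.
DISTANCES = {
    "euclidean": math.dist,
    "manhattan": lambda a, b: sum(abs(p - q) for p, q in zip(a, b)),
}


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = [label for _, label in nearest]
    return max(sorted(set(votes)), key=votes.count)


def centroid_predict(train, x, metric):
    """Assign x to the class whose mean point is closest under the metric."""
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(col) for col in zip(*rows)]
        for label, rows in by_label.items()
    }
    dist = DISTANCES[metric]
    return min(sorted(centroids), key=lambda label: dist(centroids[label], x))


# Hypothetical registry: algorithm name -> (predict function, parameter grid).
# A new algorithm is integrated by adding one entry here.
SEARCH_SPACE = {
    "knn": (knn_predict, {"k": [1, 3, 5]}),
    "nearest_centroid": (centroid_predict, {"metric": ["euclidean", "manhattan"]}),
}


def accuracy(predict, params, train, valid):
    """Fraction of validation points classified correctly."""
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)


def select_best(train, valid):
    """Score every (algorithm, hyper-parameter) combination; return the best."""
    results = []
    for name, (predict, grid) in SEARCH_SPACE.items():
        keys = sorted(grid)
        for values in itertools.product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = accuracy(predict, params, train, valid)
            results.append((score, name, params))
    best = max(results, key=lambda result: result[0])
    return best, results


def make_blobs(n_per_class, seed=0):
    """Synthetic two-class data: Gaussian blobs around (0, 0) and (4, 4)."""
    rng = random.Random(seed)
    data = []
    for label, (cx, cy) in enumerate([(0.0, 0.0), (4.0, 4.0)]):
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


data = make_blobs(40)
train, valid = data[:60], data[60:]
best, results = select_best(train, valid)
print("best (accuracy, algorithm, params):", best)
```

A single held-out validation split keeps the sketch short; a real comparison would more likely use cross-validation and several evaluation metrics rather than accuracy alone.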


Cited By

S. Kiani, S. Awan, C. Lan, F. Li, and B. Luo. 2021. Two Souls in an Adversarial Image: Towards Universal Adversarial Example Detection using Multi-view Inconsistency. In Proceedings of the 37th Annual Computer Security Applications Conference. 31--44. Online publication date: 6 December 2021.
https://dl.acm.org/doi/10.1145/3485832.3485904




Published In


HotSoS '20: Proceedings of the 7th Symposium on Hot Topics in the Science of Security

September 2020

189 pages

ISBN: 9781450375610

DOI: 10.1145/3384217

General Chair: Perry Alexander, The University of Kansas

Program Chairs: Drew Davidson, The University of Kansas; Baek-Young Choi, University of Missouri, Kansas City

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

HotSoS '20

HotSoS '20: Hot Topics in the Science of Security

September 21 - 23, 2020

Lawrence, Kansas

Acceptance Rates

Overall Acceptance Rate 34 of 60 submissions, 57%
