WOLF: automated machine learning workflow management framework for malware detection and other applications

Published: 21 September 2020

Abstract

Applying machine learning techniques to solve real-world problems is a highly iterative process. Going from idea to code and then to experiment may require up to thousands of iterations to find the optimum set of hyper-parameters. It is also hard to find the best machine learning technique for a given dataset. The WOLF framework has been designed to automate, simultaneously, the selection of the best algorithm and the search for the optimum hyper-parameters. It can be useful both to novices in machine learning who simply want to find the best algorithm for their dataset and to experts in the field who want to compare their new features or algorithms with state-of-the-art techniques. Incorporating the WOLF framework into their designs makes it easier for novices to apply machine learning techniques to their datasets. With the wide range of evaluation metrics it provides, WOLF also helps data scientists develop better intuition about machine learning techniques and speeds up the process of algorithm development. Another main feature of the WOLF framework is that users can easily integrate new algorithms at any stage of the machine learning pipeline. In this paper, we present the WOLF architecture and demonstrate how it can be used for standard machine learning datasets and for Android malware detection tasks. Experimental results show the flexibility and performance of WOLF.
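To make the idea of jointly selecting an algorithm and its hyper-parameters concrete, the following is a minimal sketch in plain Python. It is not WOLF's actual API or search strategy, which this page does not describe. The toy dataset, the two classifiers, the grid values, and all names (`make_data`, `SEARCH_SPACE`, `select_best`) are hypothetical and chosen only for illustration. The sketch runs an exhaustive grid search over every (algorithm, hyper-parameter) combination and keeps the one with the highest validation accuracy.

```python
import itertools
import random


def make_data(n=200, seed=0):
    """Toy two-class data: two Gaussian blobs in 2D (illustrative stand-in)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = 2.0 if label else -2.0
        data.append(((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label))
    return data


def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda p: (p[0][0] - x[0]) ** 2 + (p[0][1] - x[1]) ** 2,
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def threshold_predict(train, x, threshold=0.0):
    """Predict class 1 when the feature sum exceeds a fixed threshold.

    `train` is unused; it is kept so both classifiers share one signature.
    """
    return 1 if x[0] + x[1] > threshold else 0


# Hypothetical search space: each algorithm paired with its own grid.
SEARCH_SPACE = {
    "knn": (knn_predict, {"k": [1, 3, 5, 7]}),
    "threshold": (threshold_predict, {"threshold": [-1.0, 0.0, 1.0]}),
}


def accuracy(predict, params, train, valid):
    """Fraction of validation points classified correctly."""
    hits = sum(predict(train, x, **params) == y for x, y in valid)
    return hits / len(valid)


def select_best(train, valid, space=SEARCH_SPACE):
    """Try every algorithm with every hyper-parameter combination."""
    best = None
    for name, (predict, grid) in space.items():
        keys = sorted(grid)
        for values in itertools.product(*(grid[key] for key in keys)):
            params = dict(zip(keys, values))
            score = accuracy(predict, params, train, valid)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


data = make_data()
train, valid = data[:150], data[150:]
best_score, best_name, best_params = select_best(train, valid)
print(best_name, best_params, round(best_score, 3))
```

Because every algorithm exposes the same `predict(train, x, **params)` signature, adding a new candidate in this sketch only requires a new entry in `SEARCH_SPACE`. A real framework would likely replace the single holdout split with cross-validation and the exhaustive grid with a more economical search.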


Cited By

  • (2021) Two Souls in an Adversarial Image: Towards Universal Adversarial Example Detection using Multi-view Inconsistency. Proceedings of the 37th Annual Computer Security Applications Conference, 31-44. DOI: 10.1145/3485832.3485904. Online publication date: 6-Dec-2021

    Published In

    HotSoS '20: Proceedings of the 7th Symposium on Hot Topics in the Science of Security
    September 2020
    189 pages
    ISBN:9781450375610
    DOI:10.1145/3384217

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. classifier
    2. malware detection
    3. parameter selection

    Qualifiers

    • Research-article

    Conference

    HotSoS '20
    HotSoS '20: Hot Topics in the Science of Security
    September 21 - 23, 2020
    Lawrence, Kansas

    Acceptance Rates

    Overall Acceptance Rate 34 of 60 submissions, 57%
