
A Two-Stage Methodology Using K-NN and False-Positive Minimizing ELM for Nominal Data Classification

Published in Cognitive Computation

Abstract

This paper addresses decision-making on nominal data under specific constraints. The goal driving the proposed methodology is to build a decision model that classifies as many samples as possible while avoiding false positives at all costs, within the smallest possible computational time. Under such constraints, one of the best-suited models for the final decision step is the cognitively inspired extreme learning machine (ELM). The proposed two-stage decision methodology combines two classifiers, a distance-based one (K-NN) and a cognitive-based one (ELM), to obtain a fast classification decision on each sample, keeping false positives as low as possible while classifying as many samples as possible (high coverage). The methodology has only two parameters, which respectively set the precision of the distance approximation and the final trade-off between false-positive rate and coverage. Experimental results on a dataset provided by F-Secure Corporation show that the methodology delivers rapid decisions on new samples, with direct control over the false positives and thus over the decision capabilities of the model.
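The two-stage idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the toy Gaussian data, the unanimous-vote rule for the K-NN stage, the hidden-layer size, and the margin `tau` are all assumptions chosen for the sketch (the paper itself works on nominal data via set-similarity distances). Stage 1 decides only the easy samples; Stage 2 is a basic ELM (random hidden layer, least-squares output weights) whose decision threshold trades false positives against coverage.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: two Gaussian blobs standing in for feature vectors.
X_train = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y_train = np.array([0] * 100 + [1] * 100)
X_test = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])

def knn_vote(x, X, y, k=5):
    """Fraction of the k nearest training samples that are positive."""
    d = np.linalg.norm(X - x, axis=1)
    idx = np.argsort(d)[:k]
    return y[idx].mean()

# Stage 1: K-NN. Decide only when the neighbourhood is unanimous;
# ambiguous samples are deferred to the ELM.
votes = np.array([knn_vote(x, X_train, y_train) for x in X_test])
decided = (votes == 0.0) | (votes == 1.0)

# Stage 2: a basic ELM — random hidden layer, then output weights by
# least squares via the Moore–Penrose pseudo-inverse.
n_hidden = 20
W = rng.normal(size=(2, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X_train @ W + b)                  # hidden-layer activations
beta = np.linalg.pinv(H) @ (2 * y_train - 1)  # targets mapped to {-1, +1}

scores = np.tanh(X_test @ W + b) @ beta

# False-positive-minimizing threshold: label a sample positive only when
# its ELM score clears a margin `tau`. Raising tau lowers the
# false-positive rate at the cost of coverage — the paper's trade-off knob.
tau = 0.5
final = np.full(len(X_test), -1)              # -1 = "no decision"
final[decided] = votes[decided].astype(int)
undecided = ~decided
final[undecided & (scores > tau)] = 1
final[undecided & (scores < -tau)] = 0

coverage = (final != -1).mean()
```

The design point is that most samples never reach the ELM: the cheap distance-based stage settles the unambiguous ones, and the margin on the ELM score lets the remaining decisions abstain rather than risk a false positive.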


Notes

  1. Details of the implementation are not given in this paper, but can be found in the publications and deliverables of the Finnish ICT SHOK Programme Future Internet: http://www.futureinternet.fi


Author information

Corresponding author: Anton Akusok.


Cite this article

Akusok, A., Miche, Y., Hegedus, J. et al. A Two-Stage Methodology Using K-NN and False-Positive Minimizing ELM for Nominal Data Classification. Cogn Comput 6, 432–445 (2014). https://doi.org/10.1007/s12559-014-9253-4
