research-article

A Novel Classification Technique based on Formal Methods

Authors:

Gerardo Canfora,

Francesco Mercaldo,

Antonella Santone

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 8

Article No.: 113, Pages 1 - 30

https://doi.org/10.1145/3592796

Published: 28 June 2023

Abstract

In recent years, there has been growing interest in applying supervised machine learning techniques across a wide range of fields. A key strength of machine learning is its ability to build models easily, without requiring prior knowledge of the application domain. Formal methods are complementary to machine learning: they intrinsically offer safety checks and mechanisms for reasoning about failures. Given the weaknesses of machine learning, the use of formal methods could represent a new challenge. However, formal methods require domain expertise, knowledge of a modeling language and its semantics, and mathematical rigour to specify properties. In this article, we propose a novel classification technique based on formal methods, in which both the model and the formula are generated automatically. As a result, the proposed method requires no human intervention and can therefore be applied to complex and large datasets. This reduces the effort needed to use formal methods and improves the explainability of, and reasoning about, the obtained results. Through a set of case studies from different real-world domains (driver detection, SCADA attack identification, arrhythmia characterization, mobile malware detection, and radiomics for lung cancer analysis), we demonstrate the usefulness of the proposed method, showing that it outperforms widespread classification algorithms.
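To make the idea of classifying with an automatically generated model and an automatically generated formula more concrete, the Python sketch below shows one possible toy instance. It is not the authors' algorithm: the equal-width discretization, the encoding of each sample as a linear labelled transition system, the per-class formula (a conjunction of "eventually an action a occurs" properties, written in mu-calculus as μX.⟨a⟩tt ∨ ⟨−⟩X), and the scoring rule are all assumptions. The helper names (`fit_bins`, `to_actions`, `build_model`, `eventually`, `learn_formulas`, `classify`) are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Transition = Tuple[int, str, int]  # (source state, action label, target state)


def fit_bins(X: Sequence[Sequence[float]], n_bins: int) -> List[Tuple[float, float]]:
    """Assumed discretization: equal-width bins per feature, stored as (min, width)."""
    bounds = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        width = (hi - lo) / n_bins if hi > lo else 1.0
        bounds.append((lo, width))
    return bounds


def to_actions(row: Sequence[float], bounds, n_bins: int) -> List[str]:
    """Map each feature value to a symbolic action such as 'f0_b1' (feature 0, bin 1)."""
    acts = []
    for j, (v, (lo, width)) in enumerate(zip(row, bounds)):
        b = min(n_bins - 1, max(0, int((v - lo) / width)))  # clamp to a valid bin
        acts.append(f"f{j}_b{b}")
    return acts


def build_model(actions: List[str]) -> List[Transition]:
    """Hypothetical model: a linear LTS where state i --actions[i]--> state i+1."""
    return [(i, a, i + 1) for i, a in enumerate(actions)]


def eventually(model: List[Transition], action: str, init: int = 0) -> bool:
    """Check mu X. <action>tt or <->X at `init`: is an `action` transition reachable?"""
    frontier, seen = [init], {init}
    while frontier:
        s = frontier.pop()
        for src, a, dst in model:
            if src != s:
                continue
            if a == action:
                return True
            if dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return False


def learn_formulas(models, labels, support: float = 1.0) -> Dict[str, Set[str]]:
    """Per class, keep the actions occurring in at least `support` of its models.
    The class formula is the conjunction of 'eventually <a>tt' over these actions."""
    formulas: Dict[str, Set[str]] = {}
    for c in sorted(set(labels)):
        ms = [m for m, y in zip(models, labels) if y == c]
        counts: Dict[str, int] = {}
        for m in ms:
            for a in {lab for _, lab, _ in m}:
                counts[a] = counts.get(a, 0) + 1
        formulas[c] = {a for a, k in counts.items() if k / len(ms) >= support}
    return formulas


def classify(model: List[Transition], formulas: Dict[str, Set[str]]) -> str:
    """Assumed rule: pick the class whose formula has the most satisfied conjuncts
    (as a fraction); ties go to the alphabetically first class."""
    def score(conj: Set[str]) -> float:
        return sum(eventually(model, a) for a in conj) / len(conj) if conj else 0.0
    return max(sorted(formulas), key=lambda c: score(formulas[c]))
```

For example, with training data `X = [[1.0, 10.0], [1.2, 11.0], [5.0, 30.0], [5.5, 29.0]]`, labels `["neg", "neg", "pos", "pos"]` and two bins per feature, each class formula requires its own pair of low-bin or high-bin actions, and a new sample `[1.1, 12.0]` is assigned to `"neg"`. The design choice here is to make every learned formula a readable list of properties, which loosely mirrors the explainability argument in the abstract. A real system would use a proper model checker and a richer logic.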



Index Terms

Computing methodologies → Artificial intelligence → Knowledge representation and reasoning



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 8

September 2023

348 pages

ISSN: 1556-4681

EISSN: 1556-472X

DOI: 10.1145/3596449

Editor: Charu Aggarwal, IBM T. J. Watson Research, USA


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 28 June 2023

Online AM: 14 April 2023

Accepted: 10 April 2023

Revised: 06 April 2023

Received: 12 July 2022

Published in TKDD Volume 17, Issue 8


