research-article

Communication-less Strategies for the Widening of Rule Induction

Author:

Violeta N. Ivanova-Rohling

CompSysTech '18: Proceedings of the 19th International Conference on Computer Systems and Technologies

Pages 33 - 37

https://doi.org/10.1145/3274005.3274033

Published: 13 September 2018

Abstract

In the age of Big Data, and with the ever-increasing availability of parallel compute resources, research on parallel algorithms for data mining has focused strongly on improving the efficiency of existing algorithms. We take a different view: instead of the usual focus on speeding up the algorithm, we invest parallel compute resources to improve the accuracy of the models obtained by existing heuristics, without increasing the overall running time. We look for strategies that invest parallel compute resources in a smart way to improve the exploration of the search space, without requiring communication between the parallel workers. We demonstrate the effectiveness of these strategies on the rule induction algorithm CN2.
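The abstract does not spell out the concrete strategies, so the Python sketch below only illustrates the general idea of communication-less widening under stated assumptions: k workers each run a greedy rule search from a different starting test that is derived solely from the worker's own index, and the best rule is selected once all workers have finished. The rule representation, the Laplace accuracy measure, the diversify-by-index scheme, the toy data, and all names (`widened_search`, `worker`, `refinements`, and so on) are hypothetical choices made for illustration; CN2's beam search, significance test, and covering loop are simplified away. It is not the method evaluated in the paper.

```python
"""Hypothetical sketch of communication-less widening for greedy rule search.

Illustrative assumptions (not the paper's method): a rule is a conjunction of
attribute=value tests for one target class, rule quality is the Laplace
accuracy estimate, and worker i diversifies by committing to the i-th best
first test before continuing greedily. CN2's significance test and its
covering loop are omitted.
"""
from concurrent.futures import ThreadPoolExecutor


def covers(rule, example):
    # A rule covers an example if every attribute=value test holds.
    return all(example[attr] == val for attr, val in rule)


def laplace(rule, data, target, n_classes):
    # Laplace accuracy: (positives covered + 1) / (covered + number of classes).
    covered = [label for x, label in data if covers(rule, x)]
    positives = sum(1 for label in covered if label == target)
    return (positives + 1) / (len(covered) + n_classes)


def refinements(rule, data):
    # All specialisations of `rule` that add one test on an unused attribute.
    used = {attr for attr, _ in rule}
    tests = {(attr, val) for x, _ in data for attr, val in x.items()
             if attr not in used}
    return [rule + (test,) for test in sorted(tests)]


def worker(index, data, target, n_classes, max_len):
    # Each worker needs only its own index; it never talks to the others.
    def score(rule):
        return laplace(rule, data, target, n_classes)

    # Deterministic diversification: worker `index` starts from the
    # index-th best single test (ties keep the sorted test order).
    first = sorted(refinements((), data), key=score, reverse=True)
    if index >= len(first):
        return (), score(())
    rule = first[index]
    # Plain greedy (beam width 1) hill-climbing from that starting point.
    while len(rule) < max_len:
        candidates = refinements(rule, data)
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= score(rule):
            break  # no strict improvement: stop refining
        rule = best
    return rule, score(rule)


def widened_search(data, target, n_classes, k=4, max_len=3):
    # Threads keep the sketch portable; CPU-bound Python code would need a
    # process pool (or another runtime) for a real parallel speed-up.
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(worker, i, data, target, n_classes, max_len)
                   for i in range(k)]
        results = [f.result() for f in futures]
    # The only synchronisation point: keep the best rule found by any worker.
    return max(results, key=lambda result: result[1])


# Hypothetical toy data: (a, b, c, class) rows turned into (example, label).
ROWS = ([(1, 0, 0, "P")] * 4 + [(1, 0, 0, "N")]
        + [(0, 1, 1, "P")] * 4 + [(0, 1, 0, "N")] * 2
        + [(0, 0, 1, "N")] * 4)
TOY = [({"a": a, "b": b, "c": c}, y) for a, b, c, y in ROWS]

if __name__ == "__main__":
    print(widened_search(TOY, "P", 2, k=1))  # plain greedy search
    print(widened_search(TOY, "P", 2, k=4))  # four independent workers
```

On this toy data, plain greedy search (k=1) commits to the best single test a=1 and stops at a Laplace score of 5/7 (about 0.714), whereas with k=4 another worker starts from b=1 and refines it to b=1 and c=1 with a score of 5/6 (about 0.833). In this sketch, diversity comes from the worker index alone, so no messages are exchanged until the final selection.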

Cited By

V. Ivanova-Rohling and N. Rohling. 2020. Evaluating Machine Learning Approaches for Discovering Optimal Sets of Projection Operators for Quantum State Tomography of Qubit Systems. Cybernetics and Information Technologies 20, 6 (2020), 61-73. Online publication date: 31-Dec-2020.
https://doi.org/10.2478/cait-2020-0061


Information

Published In

CompSysTech '18: Proceedings of the 19th International Conference on Computer Systems and Technologies

September 2018

206 pages

ISBN:9781450364256

DOI:10.1145/3274005

Copyright © 2018 ACM.

© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

ERSVB: EURORISC SYSTEMS - Varna, Bulgaria
FOSEUB: FEDERATION OF THE SCIENTIFIC ENGINEERING UNIONS - Bulgaria
UORB: University of Ruse, Bulgaria
TECHUVB: Technical University of Varna, Bulgaria

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CompSysTech'18

CompSysTech'18: 19th International Conference on Computer Systems and Technologies

September 13 - 14, 2018

Ruse, Bulgaria

Acceptance Rates

Overall Acceptance Rate 241 of 492 submissions, 49%

Bibliometrics

Total Citations: 1
Total Downloads: 19
Downloads (Last 12 months): 0
Downloads (Last 6 weeks): 0

Reflects downloads up to 07 Mar 2025
