DOI: 10.1145/3274005.3274033
research-article

Communication-less Strategies for the Widening of Rule Induction

Published: 13 September 2018

Abstract

In the age of Big Data, and with the ever-increasing availability of parallel compute resources, there has been a strong research focus on parallel algorithms for data mining that aim to improve the efficiency of existing algorithms. We take a different view: instead of the usual focus on speeding up the algorithm, we invest parallel compute resources to improve the accuracy of the models obtained by existing heuristics, without increasing the overall running time. We look for strategies that invest parallel compute resources in a smart way to improve the exploration of the search space, without requiring communication between the parallel workers. We demonstrate the effectiveness of these strategies on the rule induction algorithm CN2.
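The abstract does not spell out the strategies themselves, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a toy data set, a Laplace-style accuracy estimate as the rule quality measure, and a hypothetical diversification scheme in which each worker greedily refines a rule while seeing only a private random subset of the candidate conditions. The workers never exchange information; only their final results are compared. All names (`DATA`, `laplace`, `worker`, `widened_search`) are illustrative.

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Toy data: (attribute values, class label). Made up purely for illustration.
DATA = [
    ({"a": 1, "b": 0, "c": 1}, 1),
    ({"a": 1, "b": 1, "c": 1}, 1),
    ({"a": 1, "b": 1, "c": 0}, 0),
    ({"a": 0, "b": 1, "c": 1}, 0),
    ({"a": 0, "b": 0, "c": 1}, 0),
    ({"a": 0, "b": 0, "c": 0}, 1),
]
CONDITIONS = [(attr, val) for attr in "abc" for val in (0, 1)]


def laplace(rule, target=1, n_classes=2):
    """Laplace accuracy estimate of a rule (a frozenset of conditions)."""
    covered = [y for x, y in DATA if all(x[a] == v for a, v in rule)]
    hits = sum(1 for y in covered if y == target)
    return (hits + 1) / (len(covered) + n_classes)


def worker(worker_id, max_len=3, subset_size=3):
    """One independent greedy search; it never talks to other workers."""
    rng = random.Random(worker_id)  # private, reproducible randomness
    rule = frozenset()
    best = laplace(rule)
    for _ in range(max_len):
        used = {a for a, _ in rule}
        candidates = [c for c in CONDITIONS if c[0] not in used]
        if worker_id != 0:
            # Assumed diversification: each extra worker only sees a random
            # subset of refinements. Worker 0 stays the plain greedy baseline.
            candidates = rng.sample(candidates, min(subset_size, len(candidates)))
        if not candidates:
            break
        score, cond = max((laplace(rule | {c}), c) for c in candidates)
        if score <= best:  # stop when no refinement improves the rule
            break
        rule, best = rule | {cond}, score
    return best, sorted(rule)


def widened_search(k):
    """Run k workers with no communication and keep the best final rule."""
    with ThreadPoolExecutor(max_workers=k) as pool:
        results = list(pool.map(worker, range(k)))
    return max(results)
```

Because worker 0 is the unmodified greedy search, the widened result in this sketch can never score worse than the baseline, and the wall-clock time with truly parallel workers is bounded by the slowest single search. Threads are used here only to keep the example self-contained; a process pool or a cluster would be the natural choice in practice.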

Cited By

  • (2020) Evaluating Machine Learning Approaches for Discovering Optimal Sets of Projection Operators for Quantum State Tomography of Qubit Systems. Cybernetics and Information Technologies 20:6 (61-73). DOI: 10.2478/cait-2020-0061. Online publication date: 31-Dec-2020

Published In

CompSysTech '18: Proceedings of the 19th International Conference on Computer Systems and Technologies
September 2018
206 pages
ISBN: 9781450364256
DOI: 10.1145/3274005
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • ERSVB: EURORISC SYSTEMS - Varna, Bulgaria
  • FOSEUB: FEDERATION OF THE SCIENTIFIC ENGINEERING UNIONS - Bulgaria
  • UORB: University of Ruse, Bulgaria
  • TECHUVB: Technical University of Varna, Bulgaria

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CompSysTech'18

Acceptance Rates

Overall Acceptance Rate 241 of 492 submissions, 49%
