DOI: 10.1145/2983323.2983327
demonstration

Ease the Process of Machine Learning with Dataflow

Published: 24 October 2016

Abstract

Machine learning algorithms have become key components of many big data applications. However, the full potential of machine learning is still far from being realized, because using machine learning algorithms is hard, especially on distributed platforms such as Hadoop and Spark. The key barriers come not only from implementing the algorithms themselves but also from the process of applying them to real applications, which often involves multiple steps and different algorithms. In this demo we present a general-purpose, dataflow-based system that eases the process of applying machine learning algorithms to real-world tasks. In the system, a learning task is formulated as a directed acyclic graph (DAG) in which each node represents an operation (e.g., a machine learning algorithm) and each edge represents the flow of data from one node to its descendants. A graphical user interface lets users create, configure, submit, and monitor a task in a drag-and-drop manner. Advantages of the system include 1) lowering the barriers to defining and executing machine learning tasks; 2) sharing and reusing the implementations of the algorithms, the task dataflow DAGs, and the (intermediate) experimental results; and 3) seamlessly integrating stand-alone and distributed algorithms in one task. The system has been deployed as a machine learning service and can be accessed from the Internet.
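To make the DAG formulation concrete, here is a minimal Python sketch of a learning task whose nodes are operations and whose edges carry each node's output to its descendants. It is an illustration under stated assumptions, not the demonstrated system's implementation: the class `DataflowTask`, its `add_node`/`run` methods, and the toy `scale`/`train`/`evaluate` operations are hypothetical. Nodes are executed in topological order using the standard-library `graphlib` module (Python 3.9+), and every node's output is retained, loosely mirroring the abstract's point about keeping intermediate results available for reuse.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Sequence


class DataflowTask:
    """Hypothetical sketch (not the system's API): a task as a DAG of operations."""

    def __init__(self) -> None:
        self.ops: Dict[str, Callable[..., Any]] = {}
        self.inputs: Dict[str, List[str]] = {}

    def add_node(self, name: str, op: Callable[..., Any],
                 inputs: Sequence[str] = ()) -> None:
        # A node stores its operation plus the upstream nodes (incoming edges)
        # whose outputs are passed to it as positional arguments, in order.
        self.ops[name] = op
        self.inputs[name] = list(inputs)

    def run(self) -> Dict[str, Any]:
        # static_order() yields every node after all of its predecessors and
        # raises graphlib.CycleError (a ValueError) if the graph has a cycle.
        order = TopologicalSorter(self.inputs).static_order()
        results: Dict[str, Any] = {}
        for name in order:
            args = [results[src] for src in self.inputs[name]]
            # Every node's output is kept, so intermediate results stay available.
            results[name] = self.ops[name](*args)
        return results


def scale(data):
    # Toy preprocessing: divide each feature by the largest absolute value.
    m = max(abs(x) for x, _ in data)
    return [(x / m, y) for x, y in data]


def train(data):
    # Toy "model": a threshold halfway between the two class means.
    def class_mean(label):
        xs = [x for x, y in data if y == label]
        return sum(xs) / len(xs)
    return (class_mean(0) + class_mean(1)) / 2


def evaluate(data, threshold):
    # Fraction of examples where (x > threshold) matches the label.
    correct = sum((x > threshold) == bool(y) for x, y in data)
    return correct / len(data)


task = DataflowTask()
task.add_node("load", lambda: [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)])
task.add_node("scale", scale, ["load"])
task.add_node("train", train, ["scale"])
task.add_node("evaluate", evaluate, ["scale", "train"])
results = task.run()
print(results["evaluate"])  # prints 1.0
```

Because `evaluate` consumes the outputs of both `scale` and `train`, the example exercises fan-in as well as a simple chain. In a system like the one described, a node might instead launch a stand-alone or distributed job; this sketch only calls local Python functions.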


Cited By

  • (2022) Classification Algorithms and Dataflow Implementation. In Implementation of Machine Learning Algorithms Using Control-Flow and Dataflow Paradigms, pp. 46-77. DOI: 10.4018/978-1-7998-8350-0.ch003. Online publication date: 11-Mar-2022.
  • (2018) Distributed Big Data Mining Platform for Smart Grid. In 2018 IEEE International Conference on Big Data (Big Data), pp. 2345-2354. DOI: 10.1109/BigData.2018.8622163. Online publication date: Dec-2018.
  • (2017) An Online-Offline Combined Big Data Mining Platform. In 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), pp. 1220-1225. DOI: 10.1109/DASC-PICom-DataCom-CyberSciTec.2017.195. Online publication date: Nov-2017.


Information

Published In

CIKM '16: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management
October 2016
2566 pages
ISBN:9781450340731
DOI:10.1145/2983323
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. dataflow
  2. directed acyclic graph
  3. machine learning process

Qualifiers

  • Demonstration

Funding Sources

  • National Natural Science Foundation of China
  • Youth Innovation Promotion Association CAS
  • 863 Program of China
  • Key Research Program of the Chinese Academy of Sciences
  • 973 Program of China

Conference

CIKM'16: ACM Conference on Information and Knowledge Management
October 24-28, 2016
Indianapolis, Indiana, USA

Acceptance Rates

CIKM '16 Paper Acceptance Rate 160 of 701 submissions, 23%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%


Article Metrics

  • Downloads (Last 12 months)16
  • Downloads (Last 6 weeks)0
Reflects downloads up to 27 Jan 2025
