Poster

DOI: 10.1145/3318216.3363331

Fast inference services for alternative deep learning structures

Published: 07 November 2019

Abstract

AI inference services receive requests, classify data and respond quickly. These services underlie AI-driven Internet of Things applications, recommendation engines and video analytics. Neural networks are widely used because they provide accurate results and fast inference, but their classifications are hard to explain. Tree-based deep learning models can match that accuracy and are innately explainable. However, high inference rates are hard to achieve because branch mispredictions and cache misses make their execution inefficient. My research seeks to produce low-latency inference services built on tree-based models. I will exploit the emergence of large L3 caches to convert tree-based model inference from sequential branching to fast, in-cache lookups. Our approach starts from fully trained, accurate tree-based models, compiles them for inference on target processors and executes that inference efficiently. If successful, it will enable qualitative advances in AI services. Tree-based models can report the most significant features behind a classification in a single pass; in contrast, neural networks require iterative approaches to explain their results. Consider interactive AI recommendation services where users explicitly order their instantaneous preferences to attract preferred content: tree-based models can provide such feedback much more quickly than neural networks. Tree-based models also exhibit less prediction variance than neural networks. Given the same training data, neural networks require many inferences to quantify the variance of borderline classifications, whereas fast tree-based inference can explain variance in seconds rather than minutes. Our approach shows that competing machine learning approaches can provide comparable accuracy yet demand wholly different architectural and platform support.
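The conversion the abstract describes, from sequential branching to in-cache lookups, can be illustrated with a toy sketch. The tree, thresholds and function names below are hypothetical examples, not taken from the poster: the comparison outcomes of a small decision tree are bit-packed into an index into a precomputed table, so inference performs no data-dependent branches and the table can sit entirely in cache.

```python
# Hedged sketch: "compile" a tiny decision tree into a flat lookup table.
# Toy tree over two features (all thresholds and classes are illustrative):
#   if x0 < 0.5: if x1 < 0.3 -> class 0 else -> class 1
#   else:        if x1 < 0.7 -> class 1 else -> class 2

def branching_inference(x):
    """Baseline: sequential branching (branch mispredictions likely)."""
    if x[0] < 0.5:
        return 0 if x[1] < 0.3 else 1
    else:
        return 1 if x[1] < 0.7 else 2

# Compile step: enumerate every combination of comparison outcomes and
# record which leaf the tree reaches. The table has 2^3 = 8 entries.
TABLE = [0] * 8
for bits in range(8):
    b0 = bits & 1         # outcome of x[0] < 0.5
    b1 = (bits >> 1) & 1  # outcome of x[1] < 0.3
    b2 = (bits >> 2) & 1  # outcome of x[1] < 0.7
    if b0:
        TABLE[bits] = 0 if b1 else 1
    else:
        TABLE[bits] = 1 if b2 else 2

def table_inference(x):
    """Branch-free: evaluate all comparisons, then do one table lookup."""
    bits = (x[0] < 0.5) | ((x[1] < 0.3) << 1) | ((x[1] < 0.7) << 2)
    return TABLE[bits]
```

A real compiler for deep forests would partition larger trees into subtrees whose tables fit within the L3 cache; the point of the sketch is only that the data-dependent control flow disappears, replaced by predictable comparisons and a single indexed load.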


Published In

SEC '19: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing, November 2019, 455 pages. ISBN 9781450367332. DOI: 10.1145/3318216.

In-Cooperation: IEEE-CS/DATC: IEEE Computer Society

Publisher: Association for Computing Machinery, New York, NY, United States


Conference

SEC '19: The Fourth ACM/IEEE Symposium on Edge Computing, November 7-9, 2019, Arlington, Virginia.

Acceptance Rates

SEC '19 paper acceptance rate: 20 of 59 submissions (34%). Overall acceptance rate: 40 of 100 submissions (40%).
