Bridging the Gap Between Research and Production with CODE

Jin, Yiping; Wanvarie, Dittaya; Le, Phu T. V.

doi:10.1007/978-3-030-16142-2_22

Yiping Jin, Dittaya Wanvarie & Phu T. V. Le

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11441)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

Despite ever-increasing enthusiasm from industry, artificial intelligence (AI) and machine learning (ML) remain much-hyped areas where results tend to be exaggerated or misunderstood. Many novel models proposed in research papers never end up being deployed to production. The goal of this paper is to highlight four important aspects that are often neglected in real-world machine learning projects, namely Communication, Objectives, Deliverables and Evaluations (CODE). By carefully considering these aspects, we can avoid common pitfalls and carry out a smoother technology transfer to real-world applications. We draw on our own prior experiences and mistakes while building a real-world online advertising platform powered by machine learning technology, aiming to provide general guidelines for translating ML research results into successful industry projects.

Notes

1. https://cloud.google.com/translate/
2. https://github.com/optimaize/language-detector
3. Adding more languages will actually inflate the average accuracy, because most other languages (e.g. Chinese, Korean) can be easily identified from their characters alone and have an accuracy close to 1.
4. https://vwo.com/ab-testing/
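The averaging effect described in Note 3 can be illustrated with a minimal Python sketch. It assumes the reported figure is an unweighted (macro) average of per-language accuracies; the language keys and accuracy values below are hypothetical placeholders chosen for illustration, not results from the paper.

```python
from typing import Dict


def macro_average(accuracies: Dict[str, float]) -> float:
    """Unweighted mean of per-language accuracies (assumed averaging scheme)."""
    return sum(accuracies.values()) / len(accuracies)


# Hypothetical accuracies for languages that are hard to tell apart.
hard_pairs = {"lang_a": 0.80, "lang_b": 0.70}

# Hypothetically adding script-distinct languages whose accuracy is close to 1.
extended = {**hard_pairs, "zh": 1.0, "ko": 1.0}

print(macro_average(hard_pairs))  # average over the difficult languages only
print(macro_average(extended))    # higher average, although no hard case improved
```

The extended average rises only because easy languages were added to the denominator, which is why a per-language breakdown is more informative than a single averaged number.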

Acknowledgement

The first author is supported by the scholarship from “The 100th Anniversary Chulalongkorn University Fund for Doctoral Scholarship” and also by “The 90th Anniversary Chulalongkorn University Fund (Ratchadaphiseksomphot Endowment Fund)”. We would like to thank Assoc. Prof. Peraphon Sophatsathit and the anonymous reviewers for their careful reading and their insightful suggestions.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, Chulalongkorn University, Bangkok, 10300, Thailand
Yiping Jin & Dittaya Wanvarie
Knorex Pte. Ltd., 8 Cross St, Singapore, 048424, Singapore
Phu T. V. Le

Corresponding author

Correspondence to Yiping Jin.

Editor information

Editors and Affiliations

Hong Kong University of Science and Technology, Hong Kong, China
Qiang Yang
Nanjing University, Nanjing, China
Zhi-Hua Zhou
University of Macau, Taipa, Macau, China
Zhiguo Gong
Southeast University, Nanjing, China
Min-Ling Zhang
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Sheng-Jun Huang

About this paper

Cite this paper

Jin, Y., Wanvarie, D., Le, P.T.V. (2019). Bridging the Gap Between Research and Production with CODE. In: Yang, Q., Zhou, ZH., Gong, Z., Zhang, ML., Huang, SJ. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2019. Lecture Notes in Computer Science, vol 11441. Springer, Cham. https://doi.org/10.1007/978-3-030-16142-2_22

DOI: https://doi.org/10.1007/978-3-030-16142-2_22
Published: 20 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16141-5
Online ISBN: 978-3-030-16142-2
eBook Packages: Computer Science, Computer Science (R0)