
Asymmetric Coded Distributed Computation for Resilient Prediction Serving Systems

Conference paper in Euro-Par 2024: Parallel Processing (Euro-Par 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14802)

Abstract

With the surge of AI services, prediction serving systems (PSSes) have been widely deployed. PSSes typically run on many workers and are therefore prone to stragglers (slowdowns or failures), so designing straggler-resilient PSSes is critical for low prediction latency. The traditional approach is replication, which assigns the same prediction job to multiple workers but incurs significant resource overhead due to the redundant jobs. Recently, coded distributed computation (CDC) has emerged as a more resource-efficient alternative: it encodes the prediction job into parity units, from which predictions can be reconstructed via decoding. However, we find that state-of-the-art CDC methods either trade accuracy for low latency (with both the encoder and decoder simple) or trade latency for high accuracy (with both complicated), leading to an imbalance between accuracy and latency caused by this symmetry between the encoder and decoder.
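The simple-encoder/simple-decoder regime can be illustrated with a toy additive parity code (a minimal sketch with hypothetical names, not the paper's method). Reconstruction is exact here only because the served model is linear, which is precisely why simple codes lose accuracy on nonlinear models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "model" served by each worker.
W = rng.standard_normal((4, 3))
def model(x):
    return x @ W

# Two data inputs; a simple additive encoder produces one parity input.
x1, x2 = rng.standard_normal(4), rng.standard_normal(4)
parity = x1 + x2                      # fast, simple encoder

# Each worker runs the model; suppose the worker holding x2 straggles.
p1, p_parity = model(x1), model(parity)

# Simple decoder: subtract the surviving prediction from the parity
# prediction. Exact for a linear model, approximate otherwise.
p2_recovered = p_parity - p1
assert np.allclose(p2_recovered, model(x2))
```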

Our insight is that the encoder is always used in CDC-based prediction, while the decoder is used only when stragglers occur. Based on this insight, we first propose a new asymmetric CDC framework, called AsymCDC, that pairs a simple encoder with a complicated decoder: the encoder's simplicity keeps encoding time low, which largely reduces latency, while the decoder's complexity benefits accuracy. We further design the decoder in two steps: i) an exact decoding method that leverages the invertibility of an invertible neural network (INN) so that decoding incurs no accuracy loss, and ii) a decoder compacting method that reshapes the INN outputs so that knowledge distillation can effectively compact the decoder for low decoding time. We prototype AsymCDC atop Clipper; experiments show that AsymCDC's prediction accuracy is approximately the same as that of state-of-the-art methods with both the encoder and decoder complicated, while its latency exceeds that of state-of-the-art methods with both simple by no more than \(2.6\%\).
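The exact-decoding idea rests on invertibility: if the decoder's network is invertible by construction, decoding can undo it with no accuracy loss. Below is a minimal numpy sketch of an additive coupling layer, the building block of many INNs; it is illustrative only and not the paper's architecture:

```python
import numpy as np

# Additive coupling: split the input, shift one half by a function of
# the other. The forward map is invertible no matter what function is
# used, because the conditioning half passes through unchanged.
rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))

def forward(x):
    x1, x2 = x[:2], x[2:]
    y2 = x2 + np.tanh(x1 @ A)        # any function of x1 works here
    return np.concatenate([x1, y2])

def inverse(y):
    y1, y2 = y[:2], y[2:]
    x2 = y2 - np.tanh(y1 @ A)        # exactly undoes the forward pass
    return np.concatenate([y1, x2])

x = rng.standard_normal(4)
assert np.allclose(inverse(forward(x)), x)   # invertibility => lossless decoding
```

Since the inverse is recomputed analytically rather than learned, no approximation error is introduced at decode time; the cost is a heavier decoder, which AsymCDC tolerates because decoding runs only when stragglers occur.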



Acknowledgments.

This work was supported by the Development Program of China for Young Scholars (No. 2021YFB0301400), and Key Laboratory of Information Storage System Ministry of Education of China.

Corresponding author: Yuchong Hu.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Wang, L., Hu, Y., Liu, Y., Xiao, R., Feng, D. (2024). Asymmetric Coded Distributed Computation for Resilient Prediction Serving Systems. In: Carretero, J., Shende, S., Garcia-Blas, J., Brandic, I., Olcoz, K., Schreiber, M. (eds) Euro-Par 2024: Parallel Processing. Euro-Par 2024. Lecture Notes in Computer Science, vol 14802. Springer, Cham. https://doi.org/10.1007/978-3-031-69766-1_33


  • DOI: https://doi.org/10.1007/978-3-031-69766-1_33


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-69765-4

  • Online ISBN: 978-3-031-69766-1

  • eBook Packages: Computer Science; Computer Science (R0)
