Elsevier

Computer Communications

Volume 163, 1 November 2020, Pages 195-201
Accurate mathematical modeling and solution of TCP congestion window size distribution

https://doi.org/10.1016/j.comcom.2020.09.010

Abstract

The congestion window size distribution is important in TCP as it reveals the statistical variation of the sending rate. It also quantifies the network congestion conditions. A semi-Markov model is used to develop an accurate mathematical framework for solving for the TCP congestion window size distribution. The new model improves on the accuracy of a widely accepted mathematical model in the literature for the congestion window size distribution. The correct statistics of a sum of weighted exponential random variables are identified and incorporated into the new mathematical model. Simulation results from both Matlab and NS2 demonstrate the accuracy of the new model. In one example, the new model predicts that 97.5% of congestion windows will have sizes less than 82, whereas the previously accepted model predicts that 97.5% of the windows will have sizes less than 72. Hence, the previous model can lead to buffer and caching requirement estimates that are too small when designing practical networks.

Introduction

The problem of finding the probability distribution of the window sizes in the transmission control protocol (TCP) is important, as the performance of many internet applications depends strongly on the dynamics of TCP congestion window sizes [1], [2], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]. To the best of the authors’ knowledge, the only solutions published in the past are the solution of Bohacek and Shah [1] and the solution of Yan and Plattner [2], and both of these published solutions have limitations.

The former solution [1] is based on differential equation models and necessarily makes a number of assumptions, namely 1) the authors used the differential equation and assumptions suggested in [5] and [6]; 2) a negative binomial distribution is taken as an approximate solution to the differential equations; 3) parameters in the solution are given certain values without further analytical justification. The latter solution [2] proposed a novel simplified model of TCP congestion window sizes by considering only the additive-increase multiplicative-decrease (AIMD) phase of the TCP congestion window. The resulting model is amenable to a simple closed-form solution for the cumulative distribution function (CDF) of the congestion window, and the resulting equation for the CDF is much simpler than the solution in [1]. However, the simple analytical solution derived in [2] rests on several simplifying assumptions and, in two instances, on approximating assumptions that are not identified as such in [2] but are instead stated as accurate. Hence, the solution in [2] can be deemed an approximation. We will use some insights from Ref. [2] in the present paper to propose a better approximation for congestion window size distributions, so we discuss [2] further in the following paragraphs.

First, the analysis assumes that random variable Yi, which represents the total number of packets sent during the ith AIMD phase, is an exponential random variable, when, in fact, Yi is a geometric random variable; this occurs four lines above [2, eq. (2)]. Note that the former is a continuous random variable while the latter is a discrete random variable. While it is true that the distribution of a sum of geometric random variables is often well approximated by the distribution of a sum of exponential random variables, this simplifying assumption is neither stated nor identified, and the possible approximation discrepancy is not discussed.
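The size of the discrepancy between a geometric distribution and an exponential distribution can be checked numerically. The following is a minimal sketch, not taken from [2] or from this paper: it assumes a geometric variable on {1, 2, ...} with success probability p (standing in for the packet loss probability) and an exponential variable with the same mean 1/p, and reports the largest CDF difference at integer arguments. The function names and the cutoff k_max are illustrative choices.

```python
import math


def geometric_cdf(k, p):
    """P(Y <= k) for a geometric variable on {1, 2, ...} with success probability p."""
    return 1.0 - (1.0 - p) ** k if k >= 1 else 0.0


def exponential_cdf(x, rate):
    """P(X <= x) for an exponential variable with the given rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


def max_cdf_gap(p, k_max=10_000):
    """Largest |geometric CDF - exponential CDF| over k = 1..k_max.

    The exponential rate is set to p so that both variables have mean 1/p
    (an assumption of this sketch, not a statement about the model in [2]).
    """
    return max(
        abs(geometric_cdf(k, p) - exponential_cdf(k, p))
        for k in range(1, k_max + 1)
    )


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.1, 0.2):
        print(f"p = {p}: max CDF gap = {max_cdf_gap(p):.5f}")
```

In this sketch the gap shrinks as p decreases, which is consistent with the remark that the approximation is often good, while also showing that it is an approximation whose error depends on the loss probability.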

Second, the analysis assumes that the distribution of a weighted sum of exponential random variables is a gamma distribution, whereas the gamma distribution is only correct as the distribution of an unweighted sum of exponential random variables having equal rate parameters. The assumption occurs in the text and analysis below [2, eq. (2)]. The correct distribution of a sum of weighted exponential random variables has two forms, one form for the case where no two rate parameters are equal, and another form for the case where some of the rate parameters are equal and some of the rate parameters are not equal [17]. The analysis in [2] requires only the first case. The distributions for both cases have recently been published in the mathematics literature in a 2008 paper [17], and seem at this point to only be known by a small number of statisticians. A formula for the distribution of a sum of weighted exponential random variables, without conditions on the rate parameters, was only first published in 2003 [18], just prior to the publication of [1] in 2004.
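For the case where no two rate parameters are equal, the standard textbook form of the distribution (often called the hypoexponential distribution) can be evaluated directly. The sketch below uses that well-known form and is not claimed to reproduce the notation or the exact expressions of [17] or [18]. It relies on the fact that a weight w applied to an exponential variable with rate r gives an exponential variable with rate r/w; the function names are hypothetical.

```python
import math


def hypoexponential_cdf(x, rates):
    """CDF of a sum of independent exponentials with pairwise distinct rates.

    Uses F(x) = 1 - sum_i c_i * exp(-rate_i * x), where
    c_i = prod_{j != i} rate_j / (rate_j - rate_i).
    """
    if len(set(rates)) != len(rates):
        raise ValueError("rates must be pairwise distinct")
    if x <= 0:
        return 0.0
    tail = 0.0
    for i, rate_i in enumerate(rates):
        coeff = 1.0
        for j, rate_j in enumerate(rates):
            if j != i:
                coeff *= rate_j / (rate_j - rate_i)
        tail += coeff * math.exp(-rate_i * x)
    return 1.0 - tail


def weighted_exponential_sum_cdf(x, weights, rate):
    """CDF of sum_i w_i * X_i with X_i i.i.d. Exponential(rate).

    Assumes distinct positive weights, so that the scaled rates rate / w_i
    are distinct and the formula above applies.
    """
    return hypoexponential_cdf(x, [rate / w for w in weights])


if __name__ == "__main__":
    # Illustrative weights only; they are not the weights used in the paper.
    for x in (0.5, 1.0, 2.0, 4.0):
        print(x, weighted_exponential_sum_cdf(x, [1.0, 0.5, 0.25], rate=1.0))
```

When some weights coincide, the coefficients above are undefined (division by zero), which is why the equal-rate case needs its own form. The closed form can also become numerically delicate when rates are very close together.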

It is important to clarify for the readership the assumptions behind the distribution derived in [2], and to clarify that a sum of weighted exponential random variables is not exactly gamma distributed. These results are being used in first generation citations and also in second generation citations. Ref. [4] cites the result in [2] directly. The result in [2] for the window size distribution was used by the authors in [4] to size a buffer in order to improve the quality of experience of streaming applications. This second generation paper, which expressly uses the distribution in [2], perhaps believing it to be an exact distribution, has been cited 35 times recently.

A compelling reason to correct the misbelief that the gamma distribution is the exact distribution of the window size is that the response of the AIMD control law, which is used in almost all contemporary transport protocols to converge to a fair and efficient bandwidth allocation, can show strong dependence on this distribution. As stated in [19], “The AIMD rule will take a very long time to reach a good operating point on fast networks if the congestion window is started from a small size”. This motivates applying a statistically exact solution for the sum of weighted exponential random variables to the problem of determining the congestion window size distribution.
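To make the AIMD rule concrete, the following toy simulation may help. It is a simplified sketch and not the semi-Markov model developed in this paper: it assumes one window update per round, independent packet losses with a fixed probability, halving on any loss in a round, and no slow start or timeouts. The function names, the seed, and the quantile helper are illustrative assumptions.

```python
import random


def simulate_aimd(loss_prob, rounds, seed=0):
    """Return the congestion window (in packets) at the start of each round."""
    rng = random.Random(seed)
    window = 1.0
    history = []
    for _ in range(rounds):
        history.append(window)
        packets = max(1, int(window))
        # The round suffers a loss if any of its packets is dropped.
        lost = any(rng.random() < loss_prob for _ in range(packets))
        if lost:
            window = max(1.0, window / 2.0)  # multiplicative decrease
        else:
            window += 1.0  # additive increase
    return history


def empirical_quantile(samples, q):
    """Simple order-statistic quantile (no interpolation)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(q * len(ordered)))
    return ordered[index]


if __name__ == "__main__":
    windows = simulate_aimd(loss_prob=0.001, rounds=200_000, seed=1)
    print("empirical 97.5% quantile:", empirical_quantile(windows, 0.975))
```

An upper quantile of the simulated window sizes, such as the 97.5% point, is the kind of statistic that a buffer-sizing argument would draw from the window size distribution.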

In this paper, we develop a new and improved mathematical model for the distribution of TCP congestion window size. A novel analytical solution for the CDF of the window size is derived without using the approximating assumption that a sum of weighted exponential random variables is gamma distributed, which was used in previous research [2]. A better approximation for the CDF of the congestion window size is derived, which despite its precision, still offers simplicity. The congestion control window size distribution analysis of our paper assumes long-lived TCP flows. Time-varying cross-traffic and short-lived connections are not considered in this paper. The results are compared to the results in [2] to assess the ranges of accuracy of the simple approximate solution in [2] for different values of packet loss probability. Section 2 presents the derivation of the improved accuracy CDF of the additive-increase, multiplicative-decrease phase of the TCP congestion window size. The distribution of the window size at the end of triple-duplicate periods (TDPs) is derived first in Section 2.1, and then this result is used to derive the distribution of all AIMD phase congestion window sizes in Section 2.2. Finally, the effects of timeouts are analyzed and included in the final expression for the window distribution in Section 2.3. In Section 3, some example results are given and the accuracy of the approximate CDF in [2] is compared to our theoretical CDF for different values of packet error loss, and different values of the number of packets successfully sent in a timeout TDP, as well as different ranges of the values of the weights. Some conclusions are drawn in Section 4.

The contributions of this paper are the following. A more rigorous mathematical model for analyzing the statistics of the multiplicative-decrease phase of the TCP congestion window size is developed. Some previously unidentified assumptions and approximations that have appeared in the literature on solving for the CDF of the congestion window size are identified, clarified and corrected. The correct statistics of a sum of weighted exponential random variables are identified, elucidated, and incorporated into a mathematical framework for solving for the congestion window size distribution. The results are compared to a recent solution published in the literature.

Precise statistical analysis of the CDF of congestion window size

In [2], the sum of the packets sent during successive AIMD phases, which is denoted by Yi, is treated as a sum of weighted exponential random variables, and this is done without comment, although it is clear from the derivation of the Yi that they are weighted geometric random variables. There is no discussion of the validity of approximating sums of weighted geometric random variables by sums of weighted exponential random variables, and no discussion of the accuracy of such an approximation.

Example results

In this section, we present some examples to illustrate the convenience and accuracy of our new solution for the distribution of window size. We compare the accuracy of our solution to the accuracy of the approximation in Ref. [2]. To do so, we first reproduce the theoretical and simulation results from [2, Fig. 3] in Figs. 3, 4, and 5 together with results obtained from the precise analysis, results obtained from Matlab simulations, and results obtained from NS2 simulations. We use Matlab

Conclusion

In this paper, inroads into improved mathematical modeling of internet congestion phenomena were made by developing a rigorous mathematical model of the AIMD phase of the TCP congestion window size. Whereas all previous analyses have used approximate probability distributions for the study of the AIMD phase, we have identified that the correct distributions for the problem exist in the recently published mathematics literature and imported them into internet congestion applications, particularly the

CRediT authorship contribution statement

Gan Luan: Conceptualization, Methodology, Writing - original draft, Visualization, Investigation, Validation, Writing - review & editing. Norman C. Beaulieu: Conceptualization, Methodology, Writing - original draft, Supervision, Validation, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is supported by the China Ministry of Science and Technology (MOST), State Administration of Foreign Expert Affairs (SAFEA), China funding to Foreign Expert Professor Norman C. Beaulieu.

References (28)

  • BohacekS. et al.

    TCP throughput and timeout–steady state and time-varying dynamics

  • YanJ. et al.

    A simple solution to find the distribution of TCP window sizes

    IEEE Commun. Lett.

    (2013)
  • PadhyeJ. et al.

    Modeling TCP reno performance: A simple model and its empirical validation

    IEEE/ACM Trans. Netw.

    (2000)
  • YanJ. et al.

    Analytical framework for improving the quality of streaming over TCP

    IEEE Trans. Multimedia

    (2012)
  • V. Misra, W. Gong, D. Towsley, Stochastic differential equation modeling and analysis of TCP-window-size behavior, in:...
  • BohacekS.

    A stochastic model of TCP and fair video transmission

  • KimM. et al.

    Modeling network coded TCP: Analysis of throughput and energy cost

    Mob. Netw. Appl.

    (2014)
  • PokhrelS.R. et al.

    TCP performance over Wi-Fi: Joint impact of buffer and channel losses

    IEEE Trans. Mob. Comput.

    (2016)
  • DumasV. et al.

    A Markovian analysis of additive-increase multiplicative-decrease algorithms

    Adv. Appl. Probab.

    (2002)
  • Al-SaadiR. et al.

    A survey of delay-based and hybrid TCP congestion control algorithms

    IEEE Commun. Surveys Tuts.

    (2019)
  • MishraA. et al.

    The great internet TCP congestion control census

    Proc. ACM Meas. Anal. Comput. Syst.

    (2019)
  • LinaJ. et al.

    Extensive evaluation on the performance and behaviour of TCP congestion control protocols under varied network scenarios

    Comput. Netw.

    (2019)
  • LiW. et al.

    SmartCC: A reinforcement learning approach for multipath TCP congestion control in heterogeneous networks

    IEEE J. Sel. Areas Commun.

    (2019)
  • XuZ. et al.

    Experience-driven congestion control: When multi-path TCP meets deep reinforcement learning

    IEEE J. Sel. Areas Commun.

    (2019)