TCP Measurements with Kernel Delay
Introduction
This document presents the results of TCP measurements performed over the
Lambda between SURFnet, Amsterdam, and StarLight, Chicago, as part of the
Netherlight project. Previous TCP bulk transfer tests showed that, especially
with a single stream or only a few streams, the obtained throughput was far
below what could be expected. The corresponding UDP bulk transfer tests, on the
other hand, gave the expected results.
A possible explanation for this remarkable difference between the two kinds of
traffic is the bursty character of TCP compared with the more evenly shaped
behaviour of UDP. Especially in a Lambda topology with little network buffer
memory available, this lack of buffer capacity might well cause the problems
observed with TCP traffic.
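The buffer argument can be made concrete with a simple fluid model; this is our own illustration, not a model used in the measurements. If a window-sized burst of W bytes leaves the sender back-to-back at rate R_in and reaches a bottleneck that drains at R_out < R_in, the queue grows at R_in - R_out for the burst duration W/R_in, so the bottleneck needs about W(1 - R_out/R_in) bytes of buffer. A sender whose packets are spaced so that it never exceeds R_out needs no buffer at all. The function name and the example rates (a 1 Gbit/s sender feeding one 622.08 Mbit/s STS-12c channel) are hypothetical.

```python
def peak_queue_bytes(burst_bytes: float, in_rate_bps: float, out_rate_bps: float) -> float:
    """Peak bottleneck queue (bytes) for one back-to-back burst (fluid model).

    burst_bytes:  size of the burst, e.g. one TCP window
    in_rate_bps:  rate at which the burst arrives at the bottleneck
    out_rate_bps: rate at which the bottleneck drains
    """
    if out_rate_bps >= in_rate_bps:
        # The bottleneck drains at least as fast as data arrives: no queue.
        return 0.0
    burst_duration_s = burst_bytes * 8 / in_rate_bps
    # The queue grows at (in - out) bit/s for the whole burst; convert to bytes.
    return (in_rate_bps - out_rate_bps) * burst_duration_s / 8


# Hypothetical rates: 1 Gbit/s sender, one STS-12c channel as bottleneck.
print(peak_queue_bytes(1_000_000, 1e9, 622.08e6))
```

Under these assumed rates, a 1 Mbyte burst needs roughly 378 kbyte of bottleneck buffer, more than a third of its own size; the same data paced below 622.08 Mbit/s needs none.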
To test this assumption, TCP throughput tests were performed between two Linux
hosts, with blocking delays introduced in the kernel of the sending host. To
observe the influence of the network, the tests between Amsterdam and Chicago
were performed both over the Lambda and over the regular Internet using
SURFnet5.
Topology
The tests were performed between the sending host keeshond, located at
NIKHEF, whose kernel could introduce a blocking delay, and the receiving host
prusin, located at EVL. The topology used is shown in the following figure.
                   SARA
NIKHEF            +------+    +-------+             +-------+    +------+         EVL
+----------+      |      |----|       |- ......... -|       |----|      |    +--------+
| keeshond |------| SSR  |    |  ONS  |   STS 12C   |  ONS  |    | LSD  |----| prusin |
+----------+      | 8000 |----| 15454 |- ......... -| 15454 |----| 6509 |    +--------+
                  +------+    +-------+             +-------+    +------+
                  <- - - - - - - - - 2 Channel Trunk - - - - - - - - ->
Topology of the connection scheme used. Host keeshond, located at NIKHEF, is
connected to the SSR 8000 in switch mode located at SARA. A two-channel trunk
has been configured between the SSR 8000 and the LSD 6509 router located in
Chicago. Both ONS 15454s and their two-channel STS 12C interconnection are
transparent to it. In Chicago, host prusin at EVL is connected to the
LSD 6509.
Setup
The NIKHEF sending host runs Debian Linux 3.0 with a 2.4.16-web100 kernel. In
this kernel, a blocking delay of variable duration has been introduced with the
udelay (sysctl_tcp_delay); call that appears in
/usr/src/linux/net/core/dev.c. The delay value can be set dynamically through
the /proc pseudo file system via the /proc/sys/net/core/tcp_delay entry. The
maximum delay that can be set with this call is about 30 µs. The EVL receiving
host runs Red Hat Linux 7.1 with an unmodified 2.4.2-2 kernel.
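Setting the delay amounts to writing a number to the /proc entry named above, much like `echo 5 > /proc/sys/net/core/tcp_delay` run as root. The Python sketch below assumes that the entry accepts a plain integer in microseconds (udelay() takes its argument in microseconds) and enforces the approximate 30 µs limit mentioned in the text. The helper names are hypothetical, and the entry exists only on the patched sending kernel.

```python
from pathlib import Path

# Sysctl entry of the patched sending kernel (absent on stock kernels).
TCP_DELAY_PATH = Path("/proc/sys/net/core/tcp_delay")
# Approximate upper limit for the delay, as stated in the text.
MAX_DELAY_US = 30


def set_tcp_delay(delay_us: int, path: Path = TCP_DELAY_PATH) -> None:
    """Write the blocking delay in microseconds (assumed integer format)."""
    if not 0 <= delay_us <= MAX_DELAY_US:
        raise ValueError(f"delay must be 0..{MAX_DELAY_US} us, got {delay_us}")
    path.write_text(f"{delay_us}\n")


def get_tcp_delay(path: Path = TCP_DELAY_PATH) -> int:
    """Read back the currently configured delay in microseconds."""
    return int(path.read_text().strip())
```

The path argument lets the helpers be exercised against an ordinary file when no patched kernel is at hand; writing to the real entry requires root privileges.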
TCP throughput measurements were performed between the two hosts in the
direction keeshond => prusin as a function of the delay (0 - 20 µs), the TCP
window size (1 - 20 Mbyte) and the number of streams (1 - 4). Because we were
mainly interested in the influence of the delay on the startup behaviour of the
TCP streams, relatively short test times of 5 s and 10 s were used. The
performance tests were executed over the Lambda and also via SURFnet5 over the
regular Internet, to check for the influence of the massive buffer memory
available in the routing equipment of the global providers.
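The measurement plan above spans four parameters. The sketch below enumerates it as a Cartesian product; the step sizes of 1 µs and 1 Mbyte are assumptions, since the text gives only the ranges, and run_matrix is a hypothetical name. Not every combination is necessarily reported in the results.

```python
from itertools import product

# Ranges from the text; the 1 us and 1 Mbyte steps are assumed.
DELAYS_US = range(0, 21)       # 0 - 20 us
WINDOWS_MBYTE = range(1, 21)   # 1 - 20 Mbyte
STREAMS = range(1, 5)          # 1 - 4 parallel streams
DURATIONS_S = (5, 10)          # short tests to emphasise TCP startup


def run_matrix():
    """Yield one (streams, duration_s, window_mbyte, delay_us) tuple per run."""
    yield from product(STREAMS, DURATIONS_S, WINDOWS_MBYTE, DELAYS_US)
```

With these assumed steps the grid contains 4 × 2 × 20 × 21 = 3360 runs, or 25 200 s (7 hours) of transfer time, excluding any set-up between runs.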
Lambda and SURFnet5 Tests
This section presents the results of the TCP throughput tests executed over
the Lambda and over the regular Internet using SURFnet5.
Single stream
Results
The following figures present the TCP throughput as a function of the delay
time in the direction keeshond => prusin. In each sub-figure, the plot traces
correspond to four successive TCP window sizes. The first set of figures shows
the results for tests over the Lambda with a test time of 5 s, while the
second set shows the results for the regular Internet with the same test time.
Because more fluctuations could be expected there, the results displayed are
the best from a series of five identical tests. The third set of figures again
shows Lambda results, but with a test duration of 10 s.
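For the regular Internet results, only the best of five identical runs is kept per point. A minimal sketch of that reduction, assuming each run is recorded as a (window, delay, throughput) tuple; the record layout and the function name are hypothetical.

```python
from collections import defaultdict


def best_of_runs(runs):
    """Keep the highest throughput for each (window, delay) point.

    runs: iterable of (window_mbyte, delay_us, throughput_mbps) tuples,
          normally five repetitions per point.
    """
    best = defaultdict(float)  # missing points start at 0.0 Mbit/s
    for window, delay, mbps in runs:
        best[(window, delay)] = max(best[(window, delay)], mbps)
    return dict(best)
```

Taking the maximum rather than the mean suppresses runs hit by transient cross traffic, at the price of hiding how large the run-to-run spread was.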
I.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the Lambda. The plot traces represent TCP window
sizes of 1, ..., 4 Mbyte. The test duration was 5 s.
II.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the Lambda. The plot traces represent TCP window
sizes of 5, ..., 8 Mbyte. The test duration was 5 s.
III.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the Lambda. The plot traces represent TCP window
sizes of 9, ..., 12 Mbyte. The test duration was 5 s.
IV.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the Lambda. The plot traces represent TCP window
sizes of 13, ..., 16 Mbyte. The test duration was 5 s.
I.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the regular Internet. The plot traces represent TCP
window sizes of 1, ..., 4 Mbyte. The test duration was 5 s. The best results
from a series of five tests are displayed.
II.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the regular Internet. The plot traces represent TCP
window sizes of 5, ..., 8 Mbyte. The test duration was 5 s. The best results
from a series of five tests are displayed.
III.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the regular Internet. The plot traces represent TCP
window sizes of 9, ..., 12 Mbyte. The test duration was 5 s. The best results
from a series of five tests are displayed.
IV.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the regular Internet. The plot traces represent TCP
window sizes of 13, ..., 16 Mbyte. The test duration was 5 s. The best results
from a series of five tests are displayed.
I.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the Lambda. The plot traces represent TCP window
sizes of 1, ..., 4 Mbyte. The test duration was 10 s.
II.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the Lambda. The plot traces represent TCP window
sizes of 5, ..., 8 Mbyte. The test duration was 10 s.
III.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the Lambda. The plot traces represent TCP window
sizes of 9, ..., 12 Mbyte. The test duration was 10 s.
IV.
TCP throughput as a function of the delay time for a single stream
keeshond => prusin over the Lambda. The plot traces represent TCP window
sizes of 13, ..., 16 Mbyte. The test duration was 10 s.
Conclusions
From the single stream figures above, the following conclusions can be drawn:
-
In the TCP throughput tests over the Lambda with a duration of 5 s, a delay
of about 5 - 6 µs causes a considerable performance improvement for window
sizes from 4 to 12 Mbyte. Below this range, the window size itself is the
limiting factor, given the round-trip time of nearly 100 ms. Above this range,
the chance of packet loss increases rapidly because of the larger amount of
data sent.
-
The performance decrease for delays larger than 5 µs is probably caused by
the blocking behaviour of the delay mechanism.
-
The delay times had little influence in the tests over the regular Internet,
probably because sufficient shaping and/or network memory is already
intrinsically available there.
-
Over the regular Internet connection, too, the performance decreases with
increasing delay time; here, too, this is probably caused by the blocking
delay.
-
For almost all window sizes, the tests over the regular Internet connection
show performance decreases for delays between 6 and 10 µs. The reason for this
is unclear.
-
Comparing the performance tests over the Lambda with a duration of 5 s with
those with a duration of 10 s shows that the latter contain more performance
dips, probably because the longer test duration increases the chance of such
dips.
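Two simple upper bounds help to read these conclusions. A single TCP stream can have at most one window in flight per round trip, so its throughput is at most W/RTT. If, as we assume here, the kernel executes the blocking udelay() once per transmitted packet of P bytes, the sender also cannot exceed P/d for a delay d. The sketch below takes the smaller of the two. The per-packet assumption, the 1500-byte packet size and the reading of 1 Mbyte as 2^20 bytes are ours, not taken from the measurements, and the helper names are hypothetical.

```python
def window_limit_mbps(window_bytes: float, rtt_s: float) -> float:
    """At most one window of data can be in flight per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def delay_limit_mbps(delay_us: float, packet_bytes: int = 1500) -> float:
    """Sender ceiling if every packet is held for delay_us (assumed model).

    Bits per microsecond equal Mbit/s, so no further scaling is needed.
    All other per-packet costs are ignored, so this is an upper bound only.
    """
    if delay_us <= 0:
        return float("inf")  # no delay configured: this bound does not apply
    return packet_bytes * 8 / delay_us


def throughput_bound_mbps(window_bytes, rtt_s, delay_us, packet_bytes=1500):
    """The tighter of the window bound and the delay bound."""
    return min(window_limit_mbps(window_bytes, rtt_s),
               delay_limit_mbps(delay_us, packet_bytes))


MBYTE = 2**20  # assumed meaning of "Mbyte"
print(window_limit_mbps(4 * MBYTE, 0.1), delay_limit_mbps(20))
```

With an RTT of 100 ms, a 4 Mbyte window limits a stream to about 336 Mbit/s and a 12 Mbyte window to about 1 Gbit/s, which is consistent with the window being the limiting factor below 4 Mbyte. Under the per-packet assumption, a 5 µs delay caps the sender at 2400 Mbit/s and a 20 µs delay at 600 Mbit/s, so larger delays start to cut into the rates a Gigabit path could carry.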
Dual streams
Results
Besides the measurements presented in the "Single stream" subsection, multiple
stream TCP throughput measurements were also executed. This subsection
presents the results of dual stream performance tests over the Lambda. No
regular Internet results are shown here, because the previous subsection showed
that the delay times had little influence there. The following figures display
the dual stream results, obtained with a measurement time of 5 s.
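In the dual stream figures, both the throughput and the window size are sums over the two streams. The sketch below shows that bookkeeping; it assumes that the summed window was split evenly over the streams, which the text does not state explicitly, and the function name is hypothetical.

```python
def aggregate_streams(per_stream_mbps, total_window_mbyte):
    """Combine parallel-stream results the way the dual stream figures plot them."""
    n = len(per_stream_mbps)
    if n == 0:
        raise ValueError("need at least one stream")
    return {
        "streams": n,
        # Assumed even split of the summed window over the streams.
        "window_per_stream_mbyte": total_window_mbyte / n,
        # The plotted value: sum of the per-stream throughputs.
        "sum_throughput_mbps": sum(per_stream_mbps),
    }
```

Under this assumption, a trace labelled 4 Mbyte corresponds to two streams with a 2 Mbyte window each.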
I.
Sum of the TCP throughput as a function of the delay time for dual streams
keeshond => prusin over the Lambda. The plot traces represent the sum of the
TCP window sizes, in the range 1, ..., 4 Mbyte. The test duration was 5 s.
II.
Sum of the TCP throughput as a function of the delay time for dual streams
keeshond => prusin over the Lambda. The plot traces represent the sum of the
TCP window sizes, in the range 5, ..., 8 Mbyte. The test duration was 5 s.
III.
Sum of the TCP throughput as a function of the delay time for dual streams
keeshond => prusin over the Lambda. The plot traces represent the sum of the
TCP window sizes, in the range 9, ..., 12 Mbyte. The test duration was 5 s.
IV.
Sum of the TCP throughput as a function of the delay time for dual streams
keeshond => prusin over the Lambda. The plot traces represent the sum of the
TCP window sizes, in the range 13, ..., 16 Mbyte. The test duration was 5 s.
Conclusions
From the dual stream figures above, the following conclusions can be drawn:
-
The effect of the introduced delay times is much smaller than in the single
stream tests. There are probably two reasons for this:
  -
  With multiple streams there already is an intrinsic shaping effect. As a
  result, the throughput with zero delay is already larger than in the
  single stream case.
  -
  The maximum sum throughput values here are also lower than in the single
  stream situation.
-
The performance decrease with increasing delay time is also found here, but it
is less pronounced than in the single stream case.
General Conclusions
From the presented measurements the following general
conclusions can be drawn: