Workpackage 3
Bulk data transfer validations and application performance monitoring
DataTag Work package 3 | DataTAG task 2.1 | TCP Measurements with Kernel Delay

TCP Measurements with Kernel Delay

Introduction

This document presents the results of TCP measurements executed over the Lambda between SURFnet, Amsterdam, and StarLight, Chicago, as part of the Netherlight project. Previous TCP bulk transfer tests showed that, especially with a single stream or only a few streams, the obtained throughput was far below the expected values. The corresponding UDP bulk transfer tests, on the other hand, gave the expected results.

A possible explanation for this remarkable difference between the two kinds of traffic is the bursty character of TCP compared with the more shaped behaviour of UDP. Especially in a Lambda topology, where little network memory is available, this lack of buffer capacity might cause the problems observed with TCP traffic.

To test this assumption, TCP throughput tests were performed between two Linux hosts, with blocking delays introduced in the kernel of the sending host. To observe the influence of the network, the tests between Amsterdam and Chicago were performed both over the Lambda and over the regular Internet via SURFnet5.

Topology

The tests were performed between the sending host keeshond, located at NIKHEF, whose kernel can introduce a blocking delay, and the receiving host prusin, located at EVL. The topology used is shown in the figure below.

                     SARA

   NIKHEF       +---+    +---+             +---+    +---+       EVL
                | 8 |    | 1 |   STS 12C   | 1 |    | 6 |
+----------+    | 0 |----| 5 |- ......... -| 5 |----| 5 |    +--------+
| keeshond |----| 0 |    | 4 |             | 4 |    | 0 |----| prusin |
+----------+    | 0 |----| 5 |- ......... -| 5 |----| 9 |    +--------+
                +---+    | 4 |             | 4 |    +---+
                         +---+             +---+

                     <- - -  2 Channel Trunk  - - ->
Figure: Topology of the connection scheme used. Host keeshond, located at NIKHEF, is connected to the SSR 8000 (in switch mode) at SARA. A two-channel trunk has been configured between the SSR 8000 and the LSD 6509 router located in Chicago; both ONS 15454s and their two-channel STS-12c interconnection are transparent to this trunk. In Chicago, host prusin at EVL is connected to the LSD 6509.

Setup

The sending host at NIKHEF runs Debian Linux 3.0 with a 2.4.16-web100 kernel, in which a blocking delay of variable duration is introduced by the statement udelay(sysctl_tcp_delay); in /usr/src/linux/net/core/dev.c. The delay value can be set dynamically through the /proc pseudo filesystem, using the item /proc/sys/net/core/tcp_delay. The maximum delay that can be set this way is about 30 µs. The receiving host at EVL runs Red Hat Linux 7.1 with an unmodified 2.4.2-2 kernel.
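The delay is changed by writing to the /proc item mentioned above. The following is a minimal Python sketch of that step, assuming that tcp_delay accepts a single integer number of microseconds (udelay() takes its argument in microseconds) and treating the "about 30 µs" limit as a hard bound. The function names are hypothetical, and because the item only exists on the patched web100 kernel, the path is a parameter.

```python
from pathlib import Path

# /proc item described in the text; it only exists on the patched
# 2.4.16-web100 kernel of the sending host.
TCP_DELAY_PROC = Path("/proc/sys/net/core/tcp_delay")

# Assumption: the "about 30 us" limit from the text is used as a hard bound.
MAX_DELAY_US = 30


def set_tcp_delay(delay_us: int, proc_file: Path = TCP_DELAY_PROC) -> None:
    """Write a blocking delay in microseconds (assumed integer format)."""
    if not 0 <= delay_us <= MAX_DELAY_US:
        raise ValueError(f"delay must be between 0 and {MAX_DELAY_US} us")
    proc_file.write_text(f"{delay_us}\n")


def get_tcp_delay(proc_file: Path = TCP_DELAY_PROC) -> int:
    """Read back the current delay setting in microseconds."""
    return int(proc_file.read_text().strip())
```

On the sending host (as root), set_tcp_delay(5) corresponds to echo 5 > /proc/sys/net/core/tcp_delay.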

TCP throughput measurements were executed between both hosts in the direction keeshond => prusin as a function of the delay (0-20 µs), the TCP window size (1-20 Mbyte) and the number of streams (1-4). Because we were mainly interested in the influence of the delay on the start-up behaviour of the TCP streams, relatively short test times of 5 s and 10 s were used. The tests were executed over the Lambda and also via SURFnet5 over the regular Internet, to check the influence of the large amount of buffer memory available in the routing equipment of the global providers.
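The measurement plan spans the number of streams, the window size, the test time and the delay. As a hypothetical sketch, the full parameter matrix can be enumerated as below; the 1 µs and 1 Mbyte step sizes and the ordering of the runs are assumptions, since the text only gives the ranges. The delay is varied fastest, so that the points of one plot trace are measured consecutively.

```python
from itertools import product

# Ranges from the text; the step sizes are assumptions.
DELAYS_US = range(0, 21)        # 0 - 20 us, assumed 1 us steps
WINDOWS_MBYTE = range(1, 21)    # 1 - 20 Mbyte, assumed 1 Mbyte steps
STREAMS = range(1, 5)           # 1 - 4 streams
DURATIONS_S = (5, 10)           # test times of 5 s and 10 s


def measurement_plan():
    """Return an iterator of (streams, window_mbyte, duration_s, delay_us).

    itertools.product varies its last argument fastest, so the delay is
    swept completely before the next window size or test time is used."""
    return product(STREAMS, WINDOWS_MBYTE, DURATIONS_S, DELAYS_US)


runs = list(measurement_plan())
print(len(runs), "runs, first:", runs[0])
```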

Lambda and SURFnet5 Tests

In this section the results are presented from the TCP throughput tests that were executed over the Lambda or over the regular Internet using SURFnet5.

Single stream

Results

The following figures present the TCP throughput in the direction keeshond => prusin as a function of the delay time. In each sub-figure, the plot traces represent four successive TCP window sizes. The first set of figures shows the results of the tests over the Lambda with a test time of 5 s; the second set shows the results over the regular Internet with the same test time. Because larger fluctuations were expected there, the displayed Internet results are the best of a series of five identical tests. The third set again shows Lambda results, but with a test duration of 10 s.

Tput Lambda keeshond => prusin; 1 str.; t: 5s.; win: 1-4MB
(I) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the Lambda. The plot traces represent TCP window sizes of 1, ..., 4 Mbyte. The test duration was 5 s.

Tput Lambda keeshond => prusin; 1 str.; t: 5s.; win: 5-8MB
(II) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the Lambda. The plot traces represent TCP window sizes of 5, ..., 8 Mbyte. The test duration was 5 s.

Tput Lambda keeshond => prusin; 1 str.; t: 5s.; win: 9-12MB
(III) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the Lambda. The plot traces represent TCP window sizes of 9, ..., 12 Mbyte. The test duration was 5 s.

Tput Lambda keeshond => prusin; 1 str.; t: 5s.; win: 13-16MB
(IV) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the Lambda. The plot traces represent TCP window sizes of 13, ..., 16 Mbyte. The test duration was 5 s.

Tput Inet keeshond => prusin; 1 str.; t: 5s.; win: 1-4MB
(I) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the regular Internet. The plot traces represent TCP window sizes of 1, ..., 4 Mbyte. The test duration was 5 s. The best results from a series of five tests are displayed.

Tput Inet keeshond => prusin; 1 str.; t: 5s.; win: 5-8MB
(II) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the regular Internet. The plot traces represent TCP window sizes of 5, ..., 8 Mbyte. The test duration was 5 s. The best results from a series of five tests are displayed.

Tput Inet keeshond => prusin; 1 str.; t: 5s.; win: 9-12MB
(III) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the regular Internet. The plot traces represent TCP window sizes of 9, ..., 12 Mbyte. The test duration was 5 s. The best results from a series of five tests are displayed.

Tput Inet keeshond => prusin; 1 str.; t: 5s.; win: 12-16MB
(IV) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the regular Internet. The plot traces represent TCP window sizes of 12, ..., 16 Mbyte. The test duration was 5 s. The best results from a series of five tests are displayed.

Tput Lambda keeshond => prusin; 1 str.; t: 10s.; w: 1-4MB
(I) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the Lambda. The plot traces represent TCP window sizes of 1, ..., 4 Mbyte. The test duration was 10 s.

Tput Lambda keeshond => prusin; 1 str.; t: 10s.; w: 5-8MB
(II) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the Lambda. The plot traces represent TCP window sizes of 5, ..., 8 Mbyte. The test duration was 10 s.

Tput Lambda keeshond => prusin; 1 str.; t: 10s.; w: 9-12MB
(III) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the Lambda. The plot traces represent TCP window sizes of 9, ..., 12 Mbyte. The test duration was 10 s.

Tput Lambda keeshond => prusin; 1 str.; t: 10s.; w: 13-16MB
(IV) TCP throughput as a function of the delay time for a single stream keeshond => prusin over the Lambda. The plot traces represent TCP window sizes of 13, ..., 16 Mbyte. The test duration was 10 s.

Conclusions

From these figures the following conclusions can be drawn:

  • In the TCP throughput tests over the Lambda with a duration of 5 s, a delay of about 5-6 µs gives a considerable performance improvement for window sizes from 4 to 12 Mbyte. Below this range the window size itself is the limiting factor, given the round-trip time of nearly 100 ms. Above this range the chance of packet loss increases rapidly because of the larger amount of data sent.
  • The performance decrease for delays larger than about 5 µs is probably caused by the blocking behaviour of the delay mechanism.
  • The delay time has little influence in the tests over the regular Internet, probably because sufficient shaping and/or network memory is already intrinsically available there.
  • Over the regular Internet connection, performance also decreases with increasing delay time; here too this is probably caused by the blocking delay.
  • For almost all window sizes, the tests over the regular Internet connection show a performance decrease for delays between 6 and 10 µs. The reason for this is unclear.
  • Comparing the Lambda tests with a duration of 5 s with those of 10 s shows more performance dips in the latter, probably because a longer test gives more opportunity for such dips to occur.
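The first conclusion rests on the fact that a single TCP stream can deliver at most one window of data per round trip, so its throughput is bounded by W / RTT. A minimal sketch of this bound, assuming 1 Mbyte = 10^6 bytes and the round-trip time of about 100 ms quoted above; it deliberately ignores slow start and packet loss, which can only lower the result.

```python
MBYTE = 1_000_000   # assumption: 1 Mbyte = 10**6 bytes
RTT_S = 0.100       # round-trip time of nearly 100 ms, as stated in the text


def window_limited_mbps(window_mbyte: float, rtt_s: float = RTT_S) -> float:
    """Upper bound (Mbit/s) on single-stream TCP throughput when the window
    is the bottleneck: at most one full window is delivered per round trip."""
    return window_mbyte * MBYTE * 8 / rtt_s / 1e6


for window in (1, 4, 8, 12, 16):
    print(f"{window:2d} Mbyte window -> at most "
          f"{window_limited_mbps(window):5.0f} Mbit/s")
```

For example, a 1 Mbyte window limits a single stream to 80 Mbit/s at this round-trip time, while a 12 Mbyte window allows up to 960 Mbit/s.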

Dual streams

Results

In addition to the single stream measurements, multiple stream TCP throughput measurements were executed. This subsection presents the results of dual stream performance tests over the Lambda. No regular Internet results are shown here, because the single stream tests showed only a small influence of the delay time over the regular Internet. The following figures show the dual stream results obtained with a measurement time of 5 s.

Tput Lambda keeshond => prusin; 2 str.; t: 5s.; win: 1-4MB
(I) Sum of the TCP throughput as a function of the delay time for dual streams keeshond => prusin over the Lambda. The plot traces represent the sum of the TCP window sizes in the range 1, ..., 4 Mbyte. The test duration was 5 s.

Tput Lambda keeshond => prusin; 2 str.; t: 5s.; win: 5-8MB
(II) Sum of the TCP throughput as a function of the delay time for dual streams keeshond => prusin over the Lambda. The plot traces represent the sum of the TCP window sizes in the range 5, ..., 8 Mbyte. The test duration was 5 s.

Tput Lambda keeshond => prusin; 2 str.; t: 5s.; win: 9-12MB
(III) Sum of the TCP throughput as a function of the delay time for dual streams keeshond => prusin over the Lambda. The plot traces represent the sum of the TCP window sizes in the range 9, ..., 12 Mbyte. The test duration was 5 s.

Tput Lambda keeshond => prusin; 2 str.; t: 5s.; win: 13-16MB
(IV) Sum of the TCP throughput as a function of the delay time for dual streams keeshond => prusin over the Lambda. The plot traces represent the sum of the TCP window sizes in the range 13, ..., 16 Mbyte. The test duration was 5 s.

Conclusions

From these figures the following conclusions can be drawn:

  • The effect of the introduced delay times is much smaller than in the single stream tests. There are probably two reasons for this:
    1. Multiple streams already have an intrinsic shaping effect, so the throughput with zero delay is already larger than in the single stream case.
    2. The maximum sum throughput values are also lower than in the single stream situation.
  • The performance decrease with increasing delay time is also found here, but it is less pronounced than in the single stream case.

General Conclusions

From the presented measurements the following general conclusions can be drawn:

  • Introducing blocking delays only has a significant effect for single TCP streams, with an optimum at a delay of about 5 µs. With larger delay values the performance decreases because of the blocking behaviour of the delay.
  • Little influence was found when the regular Internet was used, probably because sufficient shaping and network memory are already available there.
  • The delay mechanism alone cannot fully explain the performance decrease observed on the Lambda. The optimal TCP window size is given by the bandwidth-delay product

    $$ W_{\mathrm{opt}} = B_{\mathrm{link}} \cdot T_{\mathrm{RTT}} $$

    with:

    $W_{\mathrm{opt}}$ : the optimal TCP window size.
    $B_{\mathrm{link}}$ : the bandwidth provided by the link.
    $T_{\mathrm{RTT}}$ : the round-trip time.

    With the observed round-trip time of about 100 ms, it follows that the optimal TCP window size would be about 12 Mbyte. However, the maximum throughput hardly increases for window sizes larger than about 5 Mbyte.
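As a worked check of the bandwidth-delay product, the sketch below evaluates W_opt = B_link · T_RTT in Mbyte. The bottleneck bandwidth of 1 Gbit/s is an assumption (the text does not state B_link); together with the observed round-trip time of about 100 ms it reproduces the order of magnitude of the "about 12 Mbyte" quoted above.

```python
def optimal_window_mbyte(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product W_opt = B_link * T_RTT, converted from bits
    to Mbyte (10**6 bytes)."""
    return link_bps * rtt_s / 8 / 1e6


# Assumed 1 Gbit/s bottleneck and the observed RTT of about 100 ms.
w_opt = optimal_window_mbyte(1e9, 0.100)
print(f"W_opt = {w_opt:.1f} Mbyte")
```

With these assumed values the result is 12.5 Mbyte; a different B_link scales the optimal window proportionally.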

DataTAG is a project sponsored by the European Commission - EU Grant IST-2001-32459