Workpackage 3
Bulk data transfer validations and application performance monitoring

TCP Measurements with TXQueue Length Variation

This document presents the results of TCP measurements between the following hosts:

  • Source hosts located at SARA and NIKHEF in Amsterdam, Netherlands.
    • The SARA hosts are connected through an SSR 8000 switch, via a 2 Gbit/s trunk, to a Cisco 6509 switch, which in turn is connected to the Netherlight Lambda from SURFnet.
    • The NIKHEF hosts are directly connected to the Cisco 6509.
  • Destination hosts located at EVL, Chicago IL, and Caltech / SLAC, Sunnyvale CA.
Single-stream throughput tests were performed from the Dutch hosts to the US hosts, each with a duration of 16 s and a TCP window size of 12 Mbyte. The round-trip time to the EVL hosts is about 105 ms, and about 160 ms to the Caltech / SLAC hosts.
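As a rough sanity check on these parameters, the sketch below computes the throughput ceiling that a fixed window imposes, since a sender can have at most one window of data in flight per round trip. It assumes that "12 Mbyte" means 12 × 2^20 bytes, which the text does not state; with decimal megabytes the figures would be a few percent lower.

```python
# Window-limited TCP throughput: at most one window per round-trip time.
# Assumption: "12 Mbyte" is read as 12 * 2**20 bytes; the text does not say
# whether binary or decimal megabytes are meant.
WINDOW_BYTES = 12 * 2**20


def window_limited_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Return the throughput ceiling in Mbit/s for a given window and RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6


# RTT values as reported for the two destinations.
for name, rtt in (("EVL", 0.105), ("Caltech / SLAC", 0.160)):
    print(f"{name}: {window_limited_mbps(WINDOW_BYTES, rtt):.0f} Mbit/s")
```

Under that assumption the ceiling is roughly 960 Mbit/s towards EVL and roughly 630 Mbit/s towards Caltech / SLAC, so on the longer path the window itself limits the achievable throughput.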

These tests were performed as a function of the length of the transmit queue (TXQ) of the sending device, which had a marked effect on throughput over these high-speed, long-distance paths. The TXQ length can be adjusted using the Unix command  ifconfig interface txqueuelen length , which sets the transmit queue length of a device to any value, though one should keep in mind that the device is limited by the amount of memory on it. We found that although tuning this variable can improve TCP performance by a factor of several, the result is not very predictable, and there does not seem to be an easy way to determine what the length should be.
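For scripted sweeps over many queue lengths, the ifconfig command above can be wrapped in a small helper. The following is only a minimal sketch, not the tooling used for these tests: the function names, the interface name eth0, and the dry-run default are illustrative assumptions. Actually changing the queue length requires root privileges.

```python
import subprocess
from typing import List


def txqueuelen_command(interface: str, length: int) -> List[str]:
    """Build the argument list for: ifconfig <interface> txqueuelen <length>."""
    if length < 0:
        raise ValueError("queue length must not be negative")
    return ["ifconfig", interface, "txqueuelen", str(length)]


def set_txqueuelen(interface: str, length: int, dry_run: bool = True) -> List[str]:
    """Return the command; run it only when dry_run is False (needs root)."""
    command = txqueuelen_command(interface, length)
    if not dry_run:
        subprocess.run(command, check=True)
    return command


# "eth0" is a hypothetical interface name; 1500 lies in the range explored here.
print(" ".join(set_txqueuelen("eth0", 1500)))
```

Keeping the command construction separate from its execution makes it possible to inspect or log each setting of a sweep before applying it.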

The figure below shows the results of throughput tests over the following Long Fat Network (LFN) routes: NIKHEF -> EVL, SARA -> EVL, and SARA -> SLAC. For each route, two series consisting of several hundred different queue lengths are displayed. The default queue length is 100 packets, whereas the graphs show performance increasing up to about 1500 packets for the tests to EVL and about 3500 packets for the tests to SLAC. A large transmit queue is very helpful during the TCP bandwidth discovery phase in the absence of congestion events, since it allows the flow to reach maximum throughput very quickly. If a congestion event does occur, the flow falls back into the TCP congestion avoidance phase, just as it would in steady state; it then behaves like normal TCP over a long-latency link and takes several RTTs to recover.
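To give a feel for why leaving the bandwidth discovery phase early is so costly on these paths, the sketch below compares the two growth modes under textbook assumptions: the window doubles every RTT during bandwidth discovery and grows by one segment per RTT during congestion avoidance, restarting from half the target after a congestion event. The MSS of 1460 bytes, the initial window of 2 segments, and the binary reading of "12 Mbyte" are assumptions, and the Net100 kernel used in these tests may behave differently.

```python
import math

MSS_BYTES = 1460            # assumption: typical Ethernet MSS, not stated in the text
WINDOW_BYTES = 12 * 2**20   # 12 Mbyte test window, read as binary megabytes (assumption)
INITIAL_SEGMENTS = 2        # assumption: initial congestion window in segments


def slow_start_rtts(target_segments: int, initial_segments: int = INITIAL_SEGMENTS) -> int:
    """RTTs needed to reach the target window when the window doubles every RTT."""
    return math.ceil(math.log2(target_segments / initial_segments))


def avoidance_recovery_rtts(target_segments: int) -> int:
    """RTTs needed to grow from half the target back to it at one segment per RTT."""
    return target_segments - target_segments // 2


target_segments = WINDOW_BYTES // MSS_BYTES
for name, rtt in (("EVL", 0.105), ("Caltech / SLAC", 0.160)):
    discovery = slow_start_rtts(target_segments) * rtt
    recovery = avoidance_recovery_rtts(target_segments) * rtt
    print(f"{name}: discovery about {discovery:.1f} s, recovery about {recovery:.0f} s")
```

Under these assumptions bandwidth discovery needs on the order of a dozen round trips, while linear recovery from half the window needs several thousand, which is far longer than a 16 s test.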

Figure: Throughput (Tput) in Mbit/s versus transmit queue length (TXQ length) in Maximum Segment Sizes (MSS). The TXQ length step was 32 MSS. A Linux Net100 kernel capable of AIMD was used at the sender host.

The figure shows that adjusting the transmit queue clearly improves throughput, but that the throughput is not always predictable. During the tests, packet loss was monitored through the Web100 variable BytesRetrans, to try to identify whether a failure of the bandwidth discovery phase was occurring. This was done to indicate whether the oscillations in throughput were due to packet retransmissions, but there were no retransmissions during the tests. We now assume that the variations may be due to the dynamics of the TXQ, which cause an early congestion event (i.e. a premature end of the bandwidth discovery phase) and hence low throughput.
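The retransmission check described above amounts to comparing a cumulative byte counter at the start and at the end of a test. A minimal sketch of that comparison follows; it deliberately leaves out how BytesRetrans is read from Web100, and the sample values are hypothetical, not measurements from these tests.

```python
from typing import Sequence


def bytes_retransmitted(samples: Sequence[int]) -> int:
    """Bytes retransmitted between the first and last sample of a cumulative counter."""
    if len(samples) < 2:
        raise ValueError("need at least two counter samples")
    return samples[-1] - samples[0]


def had_retransmissions(samples: Sequence[int]) -> bool:
    """True if the counter increased at any point during the test."""
    return bytes_retransmitted(samples) > 0


# Hypothetical counter samples taken during two runs (illustrative values only).
clean_run = [0, 0, 0, 0]
lossy_run = [0, 0, 2920, 2920]
print(had_retransmissions(clean_run), had_retransmissions(lossy_run))
```

A run whose counter never increases, like the first one, matches the situation reported here: throughput oscillations without any retransmitted bytes.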

Authors:    Antony Antony
Hans Blom

DataTAG is a project sponsored by the European Commission - EU Grant IST-2001-32459