
Slow start implications


Last update: 20.09.98

With Solaris 2.5.1 and patch 103582-15 or later applied, or starting with Solaris 2.6, a new TCP bug/feature is available to the user. This page shows that activating the bug on web servers speeds up data transfers to buggy client implementations, e.g. Irix 6.2 or Windows. Since Solaris itself recognizes the initial phase of a data transfer, you will not see the same speedup with Solaris clients.

1. Finding out about the test object

First, the delayed ACK interval of the Irix client (or possibly a Windows client) has to be determined. As I do not know the tuning knobs of an Irix client, a modified version of the sock utility described in W. Richard Stevens' books comes in very handy. The modifications allow it to pause for fractions of a second between data transmissions. In hindsight, the results show that finding the exact delayed ACK interval this way is not possible; you can only verify how BSDish your test object is.

It suffices to connect sock to the discard port of the Irix host and trace the connection. Connecting to the echo port is not a good idea, as the echo server's reply will be ready to send before the delayed ACK timer fires, and the ACK will simply be piggybacked on it. You can run the same test against your Windows systems, but vanilla Windows 95 does not supply an inetd and therefore has no server listening on the discard port. Nevertheless, the procedure is the same.

client $ sock -i -p 0.75 -n 4 server discard
server # tcpdump -ttNl | some_filter
  1: 0.000000 client.30957 > server.discard: S ...:...(0) win 33580 <mss 1460>
  2: 0.000984 server.discard > client.30957: S ...:...(0) ack ... win 61320 <mss 1460>
  3: 0.001057 client.30957 > server.discard: . ack 1 win 33580

  4: 0.001277 client.30957 > server.discard: P 1:1025(1024) ack 1 win 33580
  5: 0.148547 server.discard > client.30957: . ack 1025 win 61320
  6: 0.746173 client.30957 > server.discard: P 1025:2049(1024) ack 1 win 33580
  7: 0.748428 server.discard > client.30957: . ack 2049 win 61320
  8: 1.496185 client.30957 > server.discard: P 2049:3073(1024) ack 1 win 33580
  9: 1.548272 server.discard > client.30957: . ack 3073 win 61320
 10: 2.246127 client.30957 > server.discard: P 3073:4097(1024) ack 1 win 33580
 11: 2.348140 server.discard > client.30957: . ack 4097 win 61320

 12: 2.996492 client.30957 > server.discard: F 4097:4097(0) ack 1 win 33580
 13: 2.997147 server.discard > client.30957: . ack 4098 win 61320
 14: 2.997974 server.discard > client.30957: F 1:1(0) ack 4098 win 61320
 15: 2.998017 client.30957 > server.discard: . ack 2 win 33580
Figure 1: Trying to find out about the delayed ACK interval of a BSDish host.

The tcpdump output shown is slightly modified. Line numbers were added and relative timestamps are used. The client and server names are abstracted. Note that in this case the server is the Irix box and the client is the Solaris machine. The initial sequence numbers are replaced with three dots, as they carry no relevant information. Finally, the trailing (DF) is cut off, as both hosts are doing path MTU discovery. The connection setup and teardown are separated from the data transfer phase.

From the output, segment 4 is the client's data and segment 5 the server's (Irix) acknowledgment, delayed by about 150 ms. The second pair shows hardly any delay (about 2 ms), the third about 50 ms and the fourth about 100 ms. This seemingly erratic behaviour of the Irix host can be ascribed to the BSD networking code. The delayed ACK timer of TCP goes off every 200 ms, but at fixed points in time, that is, every 200 ms relative to the last boot rather than relative to the arrival of the data [compare: Stevens, TCPIPV1, 19.3].
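The fixed-tick explanation can be checked against Figure 1 with a little arithmetic: if the ACKs are released by a free-running 200 ms timer, every ACK timestamp should leave the same remainder modulo 200 ms, however long the individual delay was. The sketch below assumes the 200 ms period; since the timestamps are relative to the SYN, only the constancy of the remainder matters, not its value. The names PAIRS and analyse are illustrative.

```python
# Timestamps (seconds) of each client data segment and the Irix ACK
# for it, copied from segments 4-11 of Figure 1.
PAIRS = [
    (0.001277, 0.148547),
    (0.746173, 0.748428),
    (1.496185, 1.548272),
    (2.246127, 2.348140),
]

TICK = 0.200  # assumed period of the BSD delayed ACK timer, in seconds


def analyse(pairs, tick=TICK):
    """Return (delay, phase) for each (data, ack) pair.

    delay: how long the ACK was held back.
    phase: ACK time modulo the timer period; a constant phase means the
    ACKs are released by a free-running timer, not a per-segment one.
    """
    return [(ack - data, ack % tick) for data, ack in pairs]


for delay, phase in analyse(PAIRS):
    print(f"delay {delay * 1000:6.1f} ms  phase {phase * 1000:6.1f} ms")
```

All four remainders come out at about 148 ms, including the second pair, whose data merely happened to arrive about 2 ms before a tick.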

This first experiment did not show us the delayed ACK interval. It only verified that Irix is behaving very BSDish.
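To repeat the experiment of Figure 1 without Stevens' sock, the client side (four 1024-byte writes to the discard port, 0.75 s apart) can be approximated in Python. This is a minimal sketch, not the modified sock used for the figure; the name discard_probe and its defaults are illustrative. Run tcpdump alongside it as shown above.

```python
import socket
import time


def discard_probe(host, port=9, writes=4, size=1024, pause=0.75):
    """Write `writes` blocks of `size` bytes to a discard server.

    Like the sock run in Figure 1, it pauses after every write, including
    the last one, so the FIN follows the final data by `pause` seconds.
    Returns the number of bytes written.
    """
    block = b"\0" * size
    sent = 0
    with socket.create_connection((host, port)) as conn:
        for _ in range(writes):
            conn.sendall(block)
            sent += len(block)
            time.sleep(pause)  # sub-second pause between transmissions
    return sent  # leaving the with block closes the socket (FIN)


# Example (port 9 is the standard discard port):
# discard_probe("server")
```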

2. Scrutinizing slow start with Solaris servers and BSDish clients

The major performance gain comes from eliciting immediate ACKs instead of running into the delayed ACK timeout. As the experiments show, the difference is clearly visible.

The client (Irix) requests a plain HTML document of about 13 KB, with no server side includes or anything else special to process. 13 KB is the average document size as seen on caches. Including the HTTP response headers, the document grows to 13549 bytes at the TCP level. The web server is an Apache/1.2.5 PHP/FI-2.0b13. The browser is simulated on the command line with the help of the sock utility:

client $ ( echo -ne "GET /test.html HTTP/1.0\r\n\r\n" ; sleep 1 ) | sock server 80

Note that the shell's builtin echo must support the extended options (-e for escape sequences), and that the CRLF sequences terminating the request are sent explicitly. Without the sleep, sock would immediately close both directions of the connection, possibly before receiving a reply. Although sock could use TCP's half-close feature, and Apache would do the right thing, some web servers neither understand nor handle half-closes correctly. This is especially true for web caches. Only recent recommendations argue that the two directions of an HTTP transfer should be handled independently of each other. Also, had half-closes been used, the experiment would not have simulated real browser behaviour.
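The same browser-like behaviour can be sketched in Python: send the request with explicit CRLFs, do not half-close the sending direction, and read until the server closes the connection. This is an illustrative sketch; the function name fetch and its defaults are assumptions, and unlike the sleep 1 pipeline it waits for the server's FIN instead of a fixed second.

```python
import socket


def fetch(host, path="/test.html", port=80, timeout=5.0):
    """Send an HTTP/1.0 GET and return everything until the server closes.

    The client never calls shutdown(SHUT_WR), i.e. it does not half-close
    its sending direction, just as a real browser would not.
    """
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")  # explicit CRLFs
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        while True:
            data = conn.recv(65536)
            if not data:  # the server's FIN: the reply is complete
                break
            chunks.append(data)
    return b"".join(chunks)


# Example: reply = fetch("server")
```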

  1: 0.000000 client.1469 > server.www: S ...:...(0) win 61440 <mss 1460>
  2: 0.000250 server.www > client.1469: S ...:...(0) ack ... win 33580 <mss 1460>
  3: 0.000909 client.1469 > server.www: . ack 1 win 61320

  4: 0.002138 client.1469 > server.www: P 1:28(27) ack 1 win 61320
  5: 0.004606 server.www > client.1469: P 1:1461(1460) ack 28 win 33580
  6: 0.038097 client.1469 > server.www: . ack 1461 win 61320
  7: 0.038186 server.www > client.1469: . 1461:2921(1460) ack 28 win 33580
  8: 0.038253 server.www > client.1469: . 2921:4097(1176) ack 28 win 33580
  9: 0.238699 client.1469 > server.www: . ack 4097 win 61320
 10: 0.238731 server.www > client.1469: . 4097:5557(1460) ack 28 win 33580
 11: 0.238820 server.www > client.1469: . 5557:7017(1460) ack 28 win 33580
 12: 0.238889 server.www > client.1469: . 7017:8449(1432) ack 28 win 33580
 13: 0.256021 client.1469 > server.www: . ack 8449 win 60040
 14: 0.256104 server.www > client.1469: . 8449:9909(1460) ack 28 win 33580
 15: 0.256167 server.www > client.1469: . 9909:11369(1460) ack 28 win 33580
 16: 0.256230 server.www > client.1469: . 11369:12829(1460) ack 28 win 33580
 17: 0.256285 server.www > client.1469: F 12829:13550(721) ack 28 win 33580
 18: 0.261071 client.1469 > server.www: . ack 11369 win 60448
 19: 0.261424 client.1469 > server.www: . ack 13551 win 59139
 20: 0.262634 server.www > client.1469: . ack 29 win 33580
 21: 0.262662 client.1469 > server.www: F 28:28(0) ack 13551 win 61320

Figure 2.1: Solaris server with tcp_slow_start_initial set to 1 and a BSDish client.

The first experiment sets the server to the Solaris default before the patch. The server pushes out the start of its reply in segment 5, but then has to wait until the client acknowledges it. As you can see, the next part of the reply follows the incoming ACK almost immediately. The client's second acknowledgment of the server's reply, segment 9, is a delayed ACK, as it arrives almost exactly 200 ms after the first ACK.

Please note that in figure 2.1 the transfer took over 250 ms, most of it waiting for a delayed ACK.

  1: 0.000000 client.1472 > server.www: S ...:...(0) win 61440 <mss 1460>
  2: 0.000292 server.www > client.1472: S ...:...(0) ack ... win 33580 <mss 1460>
  3: 0.000933 client.1472 > server.www: . ack 1 win 61320

  4: 0.001453 client.1472 > server.www: P 1:28(27) ack 1 win 61320
  5: 0.004541 server.www > client.1472: P 1:1461(1460) ack 28 win 33580
  6: 0.004659 server.www > client.1472: P 1461:2921(1460) ack 28 win 33580
  7: 0.008695 client.1472 > server.www: . ack 2921 win 61320
  8: 0.008744 server.www > client.1472: . 2921:4097(1176) ack 28 win 33580
  9: 0.008846 server.www > client.1472: . 4097:5557(1460) ack 28 win 33580
 10: 0.008908 server.www > client.1472: . 5557:7017(1460) ack 28 win 33580
 11: 0.013305 client.1472 > server.www: . ack 7017 win 60884
 12: 0.013329 server.www > client.1472: . 7017:8449(1432) ack 28 win 33580
 13: 0.013377 server.www > client.1472: . 8449:9909(1460) ack 28 win 33580
 14: 0.013447 server.www > client.1472: . 9909:11369(1460) ack 28 win 33580
 15: 0.013514 server.www > client.1472: . 11369:12829(1460) ack 28 win 33580
 16: 0.018901 client.1472 > server.www: . ack 9909 win 61320
 17: 0.018961 server.www > client.1472: FP 12829:13550(721) ack 28 win 33580
 18: 0.019976 client.1472 > server.www: . ack 12829 win 61320
 19: 0.020307 client.1472 > server.www: . ack 13551 win 60599
 20: 0.021076 server.www > client.1472: . ack 29 win 33580
 21: 0.021100 client.1472 > server.www: F 28:28(0) ack 13551 win 61320
Figure 2.2: Solaris server with tcp_slow_start_initial set to 2 and a BSDish client.

The second experiment sets the server to the new value 2 made available by the slow start patch. The most notable difference is that Solaris now seems to "miscount" when sending its reply: segments 5 and 6 go out back to back without waiting for an ACK. The "miscount" only concerns RFC compliance; it is not a mistake, as Solaris behaves like that on purpose. As the client now receives two segments instead of one, it has to send an acknowledgment immediately. "Immediately" still means four to five milliseconds of processing time for an Irix Indy.
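The "two segments force an immediate ACK" argument can be made concrete. The sketch below is a simplification of the 4.4BSD receiver: an ACK goes out at once when at least two full-sized segments are unacknowledged, otherwise the delayed ACK timer has to fire. Whether Irix implements exactly this rule is an assumption. The flight sizes are taken from Figures 2.1 and 2.2; the name acks_at_once is made up for illustration.

```python
MSS = 1460  # negotiated in the SYN exchange of Figures 2.1 and 2.2


def acks_at_once(unacked, mss=MSS):
    """Simplified BSD receiver: ACK immediately once at least two
    full-sized segments are unacknowledged, else wait for the timer."""
    return unacked >= 2 * mss


# Bytes received since the last ACK sent by the client.
FLIGHTS = {
    "Fig 2.1, segment 5": [1460],
    "Fig 2.1, segments 7-8": [1460, 1176],
    "Fig 2.2, segments 5-6": [1460, 1460],
}

for name, sizes in FLIGHTS.items():
    verdict = "immediate" if acks_at_once(sum(sizes)) else "delayed"
    print(f"{name}: {sum(sizes)} bytes -> {verdict} ACK")
```

Under this rule, the second flight of Figure 2.1 (1460 + 1176 = 2636 bytes) stays below 2920 bytes, which is consistent with segment 9 being a delayed ACK.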

Please note that in figure 2.2 the transfer is finished after just 21 ms. Even though the Solaris host still seemed to be waiting on the (slow) Indy, the network was not really idle, as no delayed acknowledgments were encountered.

3. Scrutinizing the patch with Solaris servers and clients

Between two Solaris machines, things do not look as good as with a buggy BSDish client. The test machines are a Solaris 2.5.1 server and a Solaris 2.6 client. Both machines have a delayed ACK interval of 200 ms.

  1:0.0000 client.8381 > server.www: S ...:...(0) win 33580 <mss 1460>
  2:0.0003 server.www > client.8381: S ...:...(0) ack ... win 33580 <mss 1460>
  3:0.0008 client.8381 > server.www: . ack 1 win 33580

  4:0.0011 client.8381 > server.www: P 1:28(27) ack 1 win 33580
  5:0.0045 server.www > client.8381: P 1:1461(1460) ack 28 win 33580
  6:0.0051 client.8381 > server.www: . ack 1461 win 33580
  7:0.0052 server.www > client.8381: . 1461:2921(1460) ack 28 win 33580
  8:0.0053 server.www > client.8381: P 2921:4097(1176) ack 28 win 33580
  9:0.0059 client.8381 > server.www: . ack 2921 win 33580
 10:0.0060 server.www > client.8381: . 4097:5557(1460) ack 28 win 33580
 11:0.0061 server.www > client.8381: . 5557:7017(1460) ack 28 win 33580
 12:0.1980 client.8381 > server.www: . ack 7017 win 33580
 13:0.1981 server.www > client.8381: . 7017:8449(1432) ack 28 win 33580
 14:0.1982 server.www > client.8381: . 8449:9909(1460) ack 28 win 33580
 15:0.1983 server.www > client.8381: . 9909:11369(1460) ack 28 win 33580
 16:0.1983 server.www > client.8381: . 11369:12829(1460) ack 28 win 33580
 17:0.3980 client.8381 > server.www: . ack 12829 win 33580
 18:0.3981 server.www > client.8381: FP 12829:13550(721) ack 28 win 33580
 19:0.3986 client.8381 > server.www: . ack 13551 win 33580
 20:0.3989 client.8381 > server.www: F 28:28(0) ack 13551 win 33580
 21:0.3990 server.www > client.8381: . ack 29 win 33580

Figure 3.1: Server tcp_slow_start_initial 1 and client tcp_slow_start_initial 1.

  1:0.0000 client.8384 > server.www: S ...:...(0) win 33580 <mss 1460>
  2:0.0003 server.www > client.8384: S ...:...(0) ack ... win 33580 <mss 1460>
  3:0.0007 client.8384 > server.www: . ack 1 win 33580

  4:0.0011 client.8384 > server.www: P 1:28(27) ack 1 win 33580
  5:0.0045 server.www > client.8384: P 1:1461(1460) ack 28 win 33580
  6:0.0052 client.8384 > server.www: . ack 1461 win 33580
  7:0.0053 server.www > client.8384: . 1461:2921(1460) ack 28 win 33580
  8:0.0054 server.www > client.8384: P 2921:4097(1176) ack 28 win 33580
  9:0.0060 client.8384 > server.www: . ack 2921 win 33580
 10:0.0061 server.www > client.8384: . 4097:5557(1460) ack 28 win 33580
 11:0.0062 server.www > client.8384: . 5557:7017(1460) ack 28 win 33580
 12:0.2002 client.8384 > server.www: . ack 7017 win 33580
 13:0.2003 server.www > client.8384: . 7017:8449(1432) ack 28 win 33580
 14:0.2004 server.www > client.8384: . 8449:9909(1460) ack 28 win 33580
 15:0.2004 server.www > client.8384: . 9909:11369(1460) ack 28 win 33580
 16:0.2005 server.www > client.8384: . 11369:12829(1460) ack 28 win 33580
 17:0.4001 client.8384 > server.www: . ack 12829 win 33580
 18:0.4002 server.www > client.8384: FP 12829:13550(721) ack 28 win 33580
 19:0.4007 client.8384 > server.www: . ack 13551 win 33580
 20:0.4010 client.8384 > server.www: F 28:28(0) ack 13551 win 33580
 21:0.4011 server.www > client.8384: . ack 29 win 33580

Figure 3.2: Server tcp_slow_start_initial 1 and client tcp_slow_start_initial 2.

  1:0.0000 client.8388 > server.www: S ...:...(0) win 33580 <mss 1460>
  2:0.0004 server.www > client.8388: S ...:...(0) ack ... win 33580 <mss 1460>
  3:0.0009 client.8388 > server.www: . ack 1 win 33580

  4:0.0011 client.8388 > server.www: P 1:28(27) ack 1 win 33580
  5:0.0047 server.www > client.8388: P 1:1461(1460) ack 28 win 33580
  6:0.0048 server.www > client.8388: P 1461:2921(1460) ack 28 win 33580
  7:0.0053 client.8388 > server.www: . ack 1461 win 33580
  8:0.0055 server.www > client.8388: . 2921:4097(1176) ack 28 win 33580
  9:0.0056 server.www > client.8388: . 4097:5557(1460) ack 28 win 33580
 10:0.0056 client.8388 > server.www: . ack 2921 win 33580
 11:0.0058 server.www > client.8388: . 5557:7017(1460) ack 28 win 33580
 12:0.0058 server.www > client.8388: . 7017:8449(1432) ack 28 win 33580
 13:0.2009 client.8388 > server.www: . ack 7017 win 33580
 14:0.2010 server.www > client.8388: . 8449:9909(1460) ack 28 win 33580
 15:0.2011 server.www > client.8388: . 9909:11369(1460) ack 28 win 33580
 16:0.2012 server.www > client.8388: . 11369:12829(1460) ack 28 win 33580
 17:0.2013 server.www > client.8388: F 12829:13550(721) ack 28 win 33580
 18:0.2022 client.8388 > server.www: . ack 12829 win 33580
 19:0.2024 client.8388 > server.www: . ack 13551 win 33580
 20:0.2027 client.8388 > server.www: F 28:28(0) ack 13551 win 33580
 21:0.2029 server.www > client.8388: . ack 29 win 33580

Figure 3.3: Server tcp_slow_start_initial 2 and client tcp_slow_start_initial 1.

  1:0.0000 client.8387 > server.www: S ...:...(0) win 33580 <mss 1460>
  2:0.0003 server.www > client.8387: S ...:...(0) ack ... win 33580 <mss 1460>
  3:0.0008 client.8387 > server.www: . ack 1 win 33580

  4:0.0010 client.8387 > server.www: P 1:28(27) ack 1 win 33580
  5:0.0043 server.www > client.8387: P 1:1461(1460) ack 28 win 33580
  6:0.0045 server.www > client.8387: P 1461:2921(1460) ack 28 win 33580
  7:0.0050 client.8387 > server.www: . ack 1461 win 33580
  8:0.0051 server.www > client.8387: . 2921:4097(1176) ack 28 win 33580
  9:0.0052 server.www > client.8387: . 4097:5557(1460) ack 28 win 33580
 10:0.0053 client.8387 > server.www: . ack 2921 win 33580
 11:0.0054 server.www > client.8387: . 5557:7017(1460) ack 28 win 33580
 12:0.0055 server.www > client.8387: . 7017:8449(1432) ack 28 win 33580
 13:0.2019 client.8387 > server.www: . ack 7017 win 33580
 14:0.2020 server.www > client.8387: . 8449:9909(1460) ack 28 win 33580
 15:0.2021 server.www > client.8387: . 9909:11369(1460) ack 28 win 33580
 16:0.2021 server.www > client.8387: . 11369:12829(1460) ack 28 win 33580
 17:0.2022 server.www > client.8387: F 12829:13550(721) ack 28 win 33580
 18:0.2032 client.8387 > server.www: . ack 12829 win 33580
 19:0.2034 client.8387 > server.www: . ack 13551 win 33580
 20:0.2037 client.8387 > server.www: F 28:28(0) ack 13551 win 33580
 21:0.2037 server.www > client.8387: . ack 29 win 33580

Figure 3.4: Server tcp_slow_start_initial 2 and client tcp_slow_start_initial 2.

Figures 3.1 and 3.2 each show two delayed acknowledgments, in segments 12 and 17. What both experiments have in common is the RFC compliant behaviour of the server. Figures 3.3 and 3.4 show just one delayed acknowledgment, in segment 13. In these experiments, the server uses the patched feature.
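Spotting delayed acknowledgments by eye works for 21 lines, but a small script helps with longer traces. The sketch assumes the abbreviated trace format used on this page and treats any client segment arriving more than 100 ms after the preceding server segment as a delayed ACK. That threshold is an illustrative heuristic which only makes sense on a LAN, where real round trips take a few milliseconds; the name delayed_acks is hypothetical.

```python
import re

# Matches the abbreviated tcpdump lines used on this page, e.g.
# " 12:0.1980 client.8381 > server.www: . ack 7017 win 33580"
LINE = re.compile(r"\s*(\d+):\s*([\d.]+)\s+(\S+) > (\S+):")


def delayed_acks(trace, threshold=0.1):
    """Return the numbers of client segments sent more than `threshold`
    seconds after the preceding server segment."""
    last_server = None
    found = []
    for line in trace.strip().splitlines():
        m = LINE.match(line)
        if not m:  # blank separator lines and captions
            continue
        num, ts, src = int(m.group(1)), float(m.group(2)), m.group(3)
        if src.startswith("server"):
            last_server = ts
        elif last_server is not None and ts - last_server > threshold:
            found.append(num)
    return found


# Usage: paste the lines of Figure 3.1 into a string and call
# delayed_acks(trace); segments 12 and 17 are reported.
```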

The experiments with Solaris hosts talking to each other show that not all delayed acknowledgments one would deem unnecessary can be avoided. But avoiding just one of them can halve the transfer time for the average web document, here from about 400 ms to about 200 ms.

On the other hand, your own experiments might turn up different results, and may even seem to indicate worse performance with the patch enabled if both hosts use it. After all, 200 ms is only barely better than the transaction time with the regular buggy client, and 400 ms is even worse. What we would like to see here, too, is fast access below 50 ms, without having to tune the delayed ACK interval.

Indeed, as the load on servers and lines changes, you might even experience an undelayed transfer between two Solaris hosts. It is possible, but perhaps not as common as we would like. I had to repeat the experiment several times to obtain the data for figure 3.5.

  1:0.0000 client.8391 > server.www: S ...:...(0) win 33580 <mss 1460>
  2:0.0003 server.www > client.8391: S ...:...(0) ack ... win 33580 <mss 1460>
  3:0.0005 client.8391 > server.www: . ack 1 win 33580

  4:0.0008 client.8391 > server.www: P 1:28(27) ack 1 win 33580
  5:0.0041 server.www > client.8391: P 1:1461(1460) ack 28 win 33580
  6:0.0042 server.www > client.8391: P 1461:2921(1460) ack 28 win 33580
  7:0.0048 client.8391 > server.www: . ack 1461 win 33580
  8:0.0048 server.www > client.8391: . 2921:4097(1176) ack 28 win 33580
  9:0.0049 server.www > client.8391: . 4097:5557(1460) ack 28 win 33580
 10:0.0049 client.8391 > server.www: . ack 2921 win 33580
 11:0.0051 server.www > client.8391: . 5557:7017(1460) ack 28 win 33580
 12:0.0051 server.www > client.8391: . 7017:8449(1432) ack 28 win 33580
 13:0.0058 client.8391 > server.www: . ack 7017 win 33580
 14:0.0059 server.www > client.8391: . 8449:9909(1460) ack 28 win 33580
 15:0.0059 server.www > client.8391: . 9909:11369(1460) ack 28 win 33580
 16:0.0060 server.www > client.8391: . 11369:12829(1460) ack 28 win 33580
 17:0.0060 server.www > client.8391: P 12829:13550(721) ack 28 win 33580
 18:0.0063 server.www > client.8391: F 13550:13550(0) ack 28 win 33580
 19:0.0067 client.8391 > server.www: . ack 12829 win 33580
 20:0.0068 client.8391 > server.www: . ack 13551 win 33580
 21:0.0074 client.8391 > server.www: F 28:28(0) ack 13551 win 33580
 22:0.0075 server.www > client.8391: . ack 29 win 33580

Figure 3.5: Example of an unimpeded transfer between two Solaris hosts.

To see how many bytes were transferred at the Ethernet level (RFC 894 encapsulation), we have to count SYNs, sole FINs and sole ACKs as 64 bytes each (40 bytes of headers, or 44 with the SYN's MSS option, padded by 6 or 2 bytes, plus 14 bytes of Ethernet header and 4 bytes of trailer). Every other segment has 58 bytes added (20 TCP + 20 IP + 18 Ethernet), so a full 1460-byte segment becomes a maximum-size 1518-byte frame. Thus the transfer in figure 3.5 shipped 14918 bytes at the Ethernet level in 7.5 ms, using a capacity of at least 15 Mbps (about 15.9 Mbps) on a 100 Mbps channel. As the inter-frame gaps were not accounted for, the actual usage is even higher.
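The byte count can be reproduced mechanically. The sketch below applies the counting rule above to the 22 segments of figure 3.5, folding both cases into one formula: every frame carries 58 bytes of headers and trailer and is padded to the 64-byte Ethernet minimum. Preamble and inter-frame gaps are ignored, as in the text; the names frame_size and PAYLOADS are illustrative.

```python
IP_HDR, TCP_HDR = 20, 20  # no IP or TCP options on the data segments
ETH_HDR, ETH_FCS = 14, 4  # RFC 894 header and CRC trailer
MIN_FRAME = 64            # short frames are padded up to this size


def frame_size(payload):
    """Ethernet frame size for a TCP segment with `payload` data bytes.
    SYNs carry a 4-byte MSS option but end up padded to 64 bytes anyway."""
    return max(MIN_FRAME, payload + IP_HDR + TCP_HDR + ETH_HDR + ETH_FCS)


# TCP payload of segments 1-22 in figure 3.5 (0 = SYN, sole FIN or sole ACK)
PAYLOADS = [0, 0, 0, 27, 1460, 1460, 0, 1176, 1460, 0, 1460, 1432, 0,
            1460, 1460, 1460, 721, 0, 0, 0, 0, 0]

total = sum(frame_size(p) for p in PAYLOADS)
elapsed = 0.0075                # last timestamp of figure 3.5, in seconds
rate = total * 8 / elapsed      # bits per second on the wire
print(f"{total} bytes, {rate / 1e6:.1f} Mbit/s")
```

This prints 14918 bytes and 15.9 Mbit/s. Adding the 8-byte preamble and the inter-frame gap per frame would raise the figure further.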




Please send your suggestions, bugfixes, comments, and ideas for new items to soltune at sean dot de
In hope of supplying useful information, Jens-S. Vöckler

Last Modified: Thursday, 22-Sep-2005 16:15:51 MEST