Sunday, November 24, 2013

Bufferbloat: less is more

You would have thought that increasing buffer sizes was always a good thing, right? Wrong.

You would have thought that reducing load on a system would always make it faster, right? Also wrong.

When stress testing our code in an Oracle lab in Edinburgh, we noticed that increasing the load on the system increased throughput. Independently, on totally different software (nothing in common other than that it's written in Java and some of it runs on Linux), I saw the same thing on my home network.

In both cases, a large network buffer size and low load was the problem. At home, I saw this:

Initiated 7855 calls. Calls per second = 846. number of errors at client side = 0. Average call time = 81ms
Initiated 9399 calls. Calls per second = 772. number of errors at client side = 0. Average call time = 89ms
Initiated 10815 calls. Calls per second = 708. number of errors at client side = 0. Average call time = 96ms
.
.

etc until I started a second machine hitting the same single-threaded process whereupon performance shot up:

Initiated 18913 calls. Calls per second = 771. number of errors at client side = 0. Average call time = 107ms
Initiated 21268 calls. Calls per second = 1177. number of errors at client side = 0. Average call time = 105ms
Initiated 24502 calls. Calls per second = 1617. number of errors at client side = 0. Average call time = 99ms
Initiated 29802 calls. Calls per second = 2650. number of errors at client side = 0. Average call time = 88ms
Initiated 34192 calls. Calls per second = 2195. number of errors at client side = 0. Average call time = 82ms
Initiated 39558 calls. Calls per second = 2683. number of errors at client side = 0. Average call time = 77ms

How odd - more load on the server means better throughput.
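
For context, numbers like those come from a client that simply fires calls at the server in a loop and periodically prints rolling statistics. The sketch below shows roughly what that bookkeeping might look like; the class name, the reporting interval and the call() stub are invented for illustration rather than lifted from the actual test harness.

    public class LoadClient {
        // Rough sketch of a load-generating loop printing per-interval stats.
        // call() is a stand-in for one request/response exchange with the server.
        public static void main(String[] args) {
            long initiated = 0, errors = 0;
            long windowCalls = 0, windowMillis = 0;
            long windowStart = System.currentTimeMillis();

            while (true) {
                long before = System.currentTimeMillis();
                try {
                    call();                          // one round trip to the server
                } catch (Exception e) {
                    errors++;
                }
                windowMillis += System.currentTimeMillis() - before;
                initiated++;
                windowCalls++;

                long now = System.currentTimeMillis();
                if (now - windowStart >= 10000) {    // arbitrary reporting interval
                    System.out.printf(
                        "Initiated %d calls. Calls per second = %d. "
                            + "number of errors at client side = %d. Average call time = %dms%n",
                        initiated,
                        windowCalls * 1000 / (now - windowStart),
                        errors,
                        windowMillis / Math.max(1, windowCalls));
                    windowStart = now;
                    windowCalls = 0;
                    windowMillis = 0;
                }
            }
        }

        private static void call() {
            // placeholder: connect, write a request, read the reply
        }
    }

The point is only that the figures are wall-clock measurements taken at the client, so anything that delays the reply, including time the request spends queued at the server, inflates the average call time.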

I was browsing the subject of bufferbloat on various websites, including Jim Gettys' excellent blog [1], where he writes extensively on the topic. He says:

"... bloat occurs in multiple places in an OS stack (and applications!). If your OS TCP implementation fills transmit queues more than needed, full queues will cause the RTT to increase, etc. , causing TCP to misbehave."

Inspired by this, I added to my code:

        serverSocketChannel.setOption(
            SO_RCVBUF,
            4096);

before binding the channel to an address, and the problem went away (the default value for this option was about 128 KB on my Linux box).
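
In context the change sits between opening the channel and binding it. Here is a minimal sketch, assuming a plain NIO ServerSocketChannel; the class name is made up, and 8888 is simply the port that appears in the tcpdump output further down.

    import java.net.InetSocketAddress;
    import java.net.StandardSocketOptions;
    import java.nio.channels.ServerSocketChannel;

    public class SmallBufferServer {
        public static void main(String[] args) throws Exception {
            ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();

            // Shrink the kernel receive buffer *before* bind(): the listening
            // socket's buffer is what accepted connections inherit, and it
            // bounds the window the server will advertise to its peers.
            serverSocketChannel.setOption(StandardSocketOptions.SO_RCVBUF, 4096);

            serverSocketChannel.bind(new InetSocketAddress(8888));
            // ... accept and serve as before ...
        }
    }

(In the snippet above the SO_RCVBUF in my code is presumably this same constant, statically imported from java.net.StandardSocketOptions.)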

Note that although this looks like a very small number, there is no fear of a buffer overrun.

"The TCP socket received buffer cannot overflow because the peer is not allowed is not allowed to send data beyond the advertised window. This is TCP's flow control" [2].

Curious to see why reducing the buffer size helps things, I tried sizes of 512, 1024, 2048 and so on up to 65536 bytes while running

sudo tcpdump -nn -i p7p1 '(tcp[13] & 0xc0 != 0)'

which according to [3] should show me when the network experiences congestion: the filter matches any segment with the CWR (0x80) or ECE (0x40) bit set in the TCP flags byte, the bits TCP uses to signal congestion back to the sender (p7p1 is the name of my network interface, by the way).
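
One way to make that sweep repeatable is to take the buffer size under test from the command line and read back what the kernel actually granted; a small sketch along those lines follows, with an invented class name (note that on Linux getOption typically reports more than was requested, since the kernel reserves extra space for bookkeeping).

    import java.net.InetSocketAddress;
    import java.net.StandardSocketOptions;
    import java.nio.channels.ServerSocketChannel;

    // Sketch: pass the SO_RCVBUF size under test as an argument so that a run
    // for each of 512, 1024, 2048, ... 65536 bytes is just a restart away.
    public class BufferSweep {
        public static void main(String[] args) throws Exception {
            int requested = Integer.parseInt(args[0]);
            ServerSocketChannel channel = ServerSocketChannel.open();
            channel.setOption(StandardSocketOptions.SO_RCVBUF, requested);
            channel.bind(new InetSocketAddress(8888));
            System.out.println("requested " + requested + " bytes, kernel granted "
                    + channel.getOption(StandardSocketOptions.SO_RCVBUF));
            // ... serve requests while tcpdump watches the interface ...
        }
    }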

The first value of SO_RCVBUF at which I encountered the poor initial performance was 8192 bytes. Interestingly, as soon as the second client started hitting the server, tcpdump started spewing output like:

17:54:28.620932 IP 192.168.1.91.59406 > 192.168.1.94.8888: Flags [.W], seq 133960115:133961563, ack 2988954847, win 33304, options [nop,nop,TS val 620089208 ecr 15423967], length 1448
17:54:28.621036 IP 192.168.1.91.59407 > 192.168.1.94.8888: Flags [.W], seq 4115302724:4115303748, ack 2823779942, win 33304, options [nop,nop,TS val 620089208 ecr 15423967], length 1024
17:54:28.623174 IP 192.168.1.65.51628 > 192.168.1.94.8888: Flags [.W], seq 1180366676:1180367700, ack 1925192901, win 8688, options [nop,nop,TS val 425774544 ecr 15423967], length 1024
17:54:28.911140 IP 192.168.1.91.56440 > 192.168.1.94.8888: Flags [.W], seq 2890777132:2890778156, ack 4156581585, win 33304, options [nop,nop,TS val 620089211 ecr 15424257], length 1024

What can we make of this? Well, it appears that the bigger the buffer, the longer a packet can stay in the receiver's queue, as Gettys informs us [1]. The longer it stays in the queue, the longer the round trip time (RTT). The longer the RTT, the worse the sender thinks the congestion is, as it doesn't differentiate between time lost on the network and time stuck in a bloated, stupid FIFO queue. (The RTT feeds into TCP's congestion control [4].)

Given a small buffer, the receiver will, at a much lower threshold, tell the sender not to transmit any more packets [2]. Thus the queue is smaller and less time is spent in it. As a result, the RTT is low and the sender believes the network to be congestion-free and is inclined to send more data.

Given a larger buffer but with greater competition for resources (from the second client), the available space in the buffer is reduced, so from the client's point of view things look much like the small-buffer case described in the previous paragraph.
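
To see why the receive queue gets a chance to grow at all, it helps to remember what the server looks like. Mine was single-threaded, and a single-threaded NIO server has roughly the shape sketched below (the details here are invented for illustration): while the one thread is busy with one connection, bytes arriving for the others simply sit in their kernel receive buffers, and with a 128 KB buffer a great deal can pile up before the advertised window closes.

    import java.net.InetSocketAddress;
    import java.net.StandardSocketOptions;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    // Sketch of a single-threaded NIO server. While the one thread handles a
    // read, data arriving for other connections waits in the kernel's receive
    // buffers; the smaller SO_RCVBUF is, the sooner the advertised window
    // closes and the sender is told to back off.
    public class SingleThreadedServer {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.setOption(StandardSocketOptions.SO_RCVBUF, 4096); // the fix
            server.bind(new InetSocketAddress(8888));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            ByteBuffer buffer = ByteBuffer.allocate(4096);
            while (true) {
                selector.select();
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) {
                            key.cancel();
                            client.close();
                        }
                        // ... process the request and write a reply ...
                    }
                }
            }
        }
    }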

It appears that the Linux community are wise to this and have taken countermeasures [5].

[1] Jim Gettys, JG's Ramblings.
[2] Stevens et al., Unix Network Programming, pp. 58 and 207.
[3] Wikipedia.
[4] RFC 5681.
[5] TCP Small Queues.
