Apparently, one of the unsolved programming problems of our time is making HTTP calls – at least judging by the fact that new HTTP client libraries keep cropping up. Mostly, the focus is on new features, async APIs and the like. But what about the actual IO part, especially regarding performance?

A rather specific case

I’m not going to do a general performance comparison – too many aspects. My use case is a bit more specific – downloading files, potentially large, from a fast network. When downloading from a remote server, most likely the network connection is going to be the bottleneck. But what if I have a server in a local network, and bandwidth is no longer an issue? What difference does IO vs NIO make? The internet is full of rumours and outdated information on this, from “NIO is heaps faster” to “IO is the new NIO because it’s faster”. Time for some benchmarking.

The contestants

Client                                     Version     Builds on                        Language
Apache httpcomponents-client               4.2.5       Java IO                          Java
Apache commons-httpclient (discontinued)   3.1         Java IO                          Java
Apache HttpAsyncClient (dev)               4.0-beta3   Java NIO, async                  Java
Bee                                        0.21.0      Java HTTP                        Scala
soke-http                                  3.0.0       Finagle, Netty                   Scala
Dispatch                                   0.10.0      Ning Async HTTP Client, Netty    Scala
cURL                                       7.24.0      x86_64-apple-darwin10.8.0        C

The test

I’m going to be a bit unscientific here – I’m running the test on my workstation rather than on dedicated hardware in an isolated network. To compensate, I’m doing a number of runs over time; averaging should still give a valid trend. As it turned out, there was very little variance in the recorded times.

I’m downloading three binary files from a Nexus server on the local network:

File      Size
small     ~ 80 KB
medium    ~ 7 MB
large     ~ 30 MB

Each test starts a JVM and downloads one file. As far as the respective APIs allow, the tests factor out creation and startup of the client instances and try to measure pure request-to-disk speed. Everything runs on JDK 1.6 on OS X, and all writes go to an SSD.
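
In spirit, each test boils down to something like the following sketch – this is not the benchmark code itself; a plain java.net.URL download stands in for the library under test, the URL and output path are made up, and the buffer size is arbitrary:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.net.URL;

    public class DownloadTiming {
        public static void main(String[] args) throws Exception {
            // Placeholder URL – the real tests point at a Nexus server on the local network.
            URL source = new URL("http://nexus.local/repository/files/large.bin");
            File target = new File("large.bin");

            // Any client setup would happen here, outside the timed section;
            // only the request-to-disk part is measured.
            long start = System.nanoTime();
            InputStream in = source.openStream();
            FileOutputStream out = new FileOutputStream(target);
            try {
                byte[] buffer = new byte[8192];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } finally {
                out.close();
                in.close();
            }
            long elapsedMillis = (System.nanoTime() - start) / 1000000;
            System.out.println("request-to-disk: " + elapsedMillis + " ms");
        }
    }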

Small file

[Chart: http_io_perf_small – download times for the small file]

cURL, being native, unsurprisingly beats the hell out of most JVM-based solutions. Commons-httpclient comes second – given the size of the file, the deciding factor here is JVM startup, establishing a connection and request generation overhead.

Medium file

[Chart: http_io_perf_med – download times for the medium file]

Bee is losing ground pretty quickly. Apache’s async client is already twice as fast, while the other solutions play in the mid-field, except for commons-httpclient, which is still the fastest.

Large file

[Chart: http_io_perf_large – download times for the large file]

Here it starts to get interesting. NIO’s low-level IO operations are starting to make a difference, as Apache’s async client beats even cURL. Bee is far off, and interestingly enough, the classic Apache beats the Netty-based libraries this time.
But the winner, in overall download time, is still the discontinued commons-httpclient, using classic IO.

A word on NIO

Before we start looking at what’s going on under the hood, a word on NIO. It’s often mentioned that NIO is short for “non-blocking IO”, but apparently it stands for “New IO API”. NIO.2, by the way, seems to be “More New IO API”. Marketing ftw.

Besides the well-known non-blocking / asynchronous aspect, NIO comes with a whole lot of very efficient, low-level IO operations that map closely to the OS’s native implementations. Where “classic” IO copies data around in memory – from the kernel’s network socket buffer to a user-space application buffer and back to the kernel for the file write – NIO can skip much of that, avoiding multiple buffer copies and context switches between kernel and user space.
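
To make that concrete, here is a minimal, illustrative sketch of the two copy styles in plain JDK terms – a classic stream loop versus a channel transfer. This is not the code any of the clients use; the buffer and chunk sizes are arbitrary, and how much copying transferFrom() can actually avoid depends on the JVM and OS:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.channels.Channels;
    import java.nio.channels.FileChannel;
    import java.nio.channels.ReadableByteChannel;

    public class CopyStyles {

        // "Classic" IO: every chunk is read into a Java byte[] in user space
        // and then written back out, crossing the kernel/user boundary twice.
        static void copyWithStreams(InputStream in, FileOutputStream out) throws IOException {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }

        // NIO: FileChannel.transferFrom() lets the JVM (and, on some platforms,
        // the kernel) move the data without an intermediate Java heap buffer.
        static void copyWithChannels(InputStream in, FileOutputStream out) throws IOException {
            ReadableByteChannel source = Channels.newChannel(in);
            FileChannel target = out.getChannel();
            long position = 0;
            long transferred;
            // For a blocking source, transferFrom() returns 0 only at end of stream.
            while ((transferred = target.transferFrom(source, position, 65536)) > 0) {
                position += transferred;
            }
        }
    }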

The “async” aspect of NIO is more of a scalability feature than a raw-throughput one – the event-driven setup usually hidden inside Netty or Grizzly is most valuable for servers that have to deal with a large number of not-too-busy connections.

Under the hood

We’ll start with the slowest on the large file test, the Bee client. Most of its time is spent in JDK classes, copying bytes around, and it also has the worst GC profile, using more than double the peak memory of Apache’s async client (20 MB vs 10 MB):

[Profile: http_io_bee – Bee client]

Next is soke-http, a slim wrapper around Twitter’s Finagle, which in turn uses Netty under the hood. Even though this uses NIO, there’s still quite a bit of buffer copying going on:

[Profile: http_io_soke – soke-http]

A close third is Dispatch, the de-facto “standard” HTTP client for Scala. It makes more efficient use of Netty than Twitter’s library, but the underlying Ning client doesn’t do the best job of writing out to the target file.

[Profile: http_io_dispatch – Dispatch]

Quite a surprise is Apache’s classic HTTP client (version 4), whose profile looks quite similar, even though it uses plain old streams. One possible reason is that both soke-http and Dispatch, the event-driven solutions, had a GC cycle kick in, while both Apache clients stayed just under the threshold.

[Profile: http_io_ahc – Apache HttpClient 4]

The most efficient IO is done by the zero-copy implementation of Apache’s async client, which uses NIO to write the file – beating a tool written in C.

IBM has a good overview of the “zero copy” idea that Apache’s async client is using.

[Profile: http_io_ahc_async – Apache HttpAsyncClient]
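
For illustration, downloading to a file with the async client’s zero-copy consumer looks roughly like this. It is a sketch against the 4.0 GA API as I understand it – class names may differ slightly in the beta benchmarked here, and the URL and file name are placeholders:

    import java.io.File;
    import java.util.concurrent.Future;

    import org.apache.http.HttpResponse;
    import org.apache.http.HttpStatus;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.entity.ContentType;
    import org.apache.http.impl.nio.client.CloseableHttpAsyncClient;
    import org.apache.http.impl.nio.client.HttpAsyncClients;
    import org.apache.http.nio.client.methods.HttpAsyncMethods;
    import org.apache.http.nio.client.methods.ZeroCopyConsumer;

    public class ZeroCopyDownload {
        public static void main(String[] args) throws Exception {
            CloseableHttpAsyncClient client = HttpAsyncClients.createDefault();
            client.start();
            try {
                File target = new File("large.bin");  // placeholder output file
                // The consumer streams the response body straight into the file via
                // NIO file channels instead of buffering it in application code.
                ZeroCopyConsumer<File> consumer = new ZeroCopyConsumer<File>(target) {
                    @Override
                    protected File process(HttpResponse response, File file,
                                           ContentType contentType) throws Exception {
                        if (response.getStatusLine().getStatusCode() != HttpStatus.SC_OK) {
                            throw new ClientProtocolException("Download failed: "
                                    + response.getStatusLine());
                        }
                        return file;
                    }
                };
                Future<File> future = client.execute(
                        HttpAsyncMethods.createGet("http://nexus.local/repository/files/large.bin"),
                        consumer, null);
                File result = future.get();  // block until the transfer is complete
                System.out.println("Downloaded to " + result);
            } finally {
                client.close();
            }
        }
    }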

But at the end of the day, commons-httpclient still wins by a slim margin. What’s going on? Well, its IO performance is on par with its successor, Apache HttpClient 4 – no surprise, given that it uses the same facilities. It loses out in IO against Apache’s async client, but more than compensates with faster execution on the application level.

[Profile: http_io_ahc3 – commons-httpclient 3.1]
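
For reference, a download with the old 3.1 API boils down to plain stream IO – roughly like this sketch (the URL and output file are placeholders, the buffer size arbitrary):

    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStream;

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.HttpStatus;
    import org.apache.commons.httpclient.methods.GetMethod;

    public class CommonsHttpClientDownload {
        public static void main(String[] args) throws Exception {
            HttpClient client = new HttpClient();
            GetMethod get = new GetMethod("http://nexus.local/repository/files/large.bin");
            try {
                int status = client.executeMethod(get);
                if (status != HttpStatus.SC_OK) {
                    throw new IllegalStateException("Unexpected status: " + status);
                }
                // Plain "classic" IO: read from the response stream, write to the file.
                InputStream in = get.getResponseBodyAsStream();
                OutputStream out = new FileOutputStream("large.bin");
                try {
                    byte[] buffer = new byte[8192];
                    int read;
                    while ((read = in.read(buffer)) != -1) {
                        out.write(buffer, 0, read);
                    }
                } finally {
                    out.close();
                }
            } finally {
                get.releaseConnection();
            }
        }
    }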

This test is obviously biased against event-driven, asynchronous libraries, since it uses a single thread to download a single file. Then again, the focus here is the IO part, and the results there clearly show that NIO’s lower-level implementation can be quite a bit faster than the classic Java IO. But the results also show that this performance gain can be easily overshadowed by less efficient client code, even when dealing with large files.

And this matters, how?

Mostly, the spread for copying 30 MB across the wire is below 500 ms, which isn’t much on its own. But consider a build tool, for example – if it has to download 50 files, a few hundred milliseconds each quickly add up to 10 seconds or more, and the difference lands right in that painful spot between “slow enough to disturb” and “not taking long enough for a mental context switch”.

Like I said, this is a pretty specific use case, and by no means an argument for or against any particular HTTP client library in general. But these things tend to get overlooked, taken for granted, or buried deep in the stack somewhere.