There have been a few customers wondering how Crowd scales (outside of its integration with JIRA/Confluence). Unfortunately, the answers we could think of ranged from “..yes” to “nfi” – so we decided to take a look at load testing Crowd.
Since Crowd offers a bunch of connection points for various applications, directories and databases, it’s hard to give an accurate single metric for scalability. One particular evaluator was asking how Crowd would scale for 1 million users using an internal directory (MySQL) with a PHP application.
It’s a massive number given that we consider 20,000 users a large user base.

Getting a million users

We ran a script to insert users into our internal directory, starting with user0 and ending with user999999 – taking 5 hours.
I came back in this morning and found that Dave, our team lead, had already verified that it was possible to authenticate and use the Crowd console without issues. This shows us that Crowd does not fall over with 1 million users in its repository. He also decided to double-check that JIRA integration was very broken with this many users :)

Load Testing

PHP, Java, .NET or whatever is unlikely to make a huge difference if your web-services stack is slick. What matters more is which calls your application makes to Crowd. You can bet that findAllPrincipalNames will take much longer than findPrincipalByName, for example.
As the evaluator didn’t have a concrete idea of the number and nature of the calls his application would be making, we decided to test out the fundamental calls made to Crowd:

  1. Authenticate
  2. Validate token
  3. Find user from token

When you go to an application, you’re likely to log in (authenticate) once, but you’re likely to perform many secure operations (each requiring a valid token and the corresponding user) once logged on. So we roughly approximate 100 token checks per authentication call for our load test. In reality you’re likely to need fewer authentications.
We had two ideas for load testing: hammer the crap out of it or approximate a reasonable load. We decided to go with the hammering option as it’s possible to extrapolate performance under a reasonable load from that data – and it’s much faster to simulate than mimic the 3-30 seconds a user would take to read a web page before clicking.

The Hammering

Hammering Crowd means seeing how many concurrent threads Crowd can service. So we take n users and launch n threads. Each thread performs:

for 1..100
{
    authenticatePrincipal()
    for 1..100
    {
        verifyToken()
        findPrincipalByToken()
    }
}

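Which is ~10,000 token checks per thread (100 logins × 100 checks), or about 20,100 individual calls if you count authenticatePrincipal, verifyToken and findPrincipalByToken separately. Note that this test is equivalent to something logging in and pressing refresh 100 times (really fast!) and repeating that 100 times.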
Clearly with a handful of threads, you’d expect Crowd to get smashed.
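To make the loop above concrete, here is a minimal Python sketch of the same harness. It is not the script we actually ran: StubCrowdClient and its three methods are hypothetical stand-ins that make no network calls, and the snake_case names are just placeholders for whatever your Crowd client library exposes. Swap in a real client to get meaningful timings.

```python
import threading
import time


class StubCrowdClient:
    """Hypothetical stand-in for a real Crowd client; it makes no network calls."""

    def authenticate_principal(self, name, password):
        return f"token-{name}"

    def verify_token(self, token):
        return token.startswith("token-")

    def find_principal_by_token(self, token):
        return token[len("token-"):]


def hammer(client, name, logins, checks_per_login, timings):
    """One worker: log in, then repeat the token check, timing every call."""

    def timed(call, *args):
        start = time.perf_counter()
        result = call(*args)
        # list.append is thread-safe in CPython, so no lock is needed here
        timings.append(time.perf_counter() - start)
        return result

    for _ in range(logins):
        token = timed(client.authenticate_principal, name, "secret")
        for _ in range(checks_per_login):
            timed(client.verify_token, token)
            timed(client.find_principal_by_token, token)


def run_load_test(n_threads, logins=100, checks_per_login=100):
    """Launch one worker per user and return (total_calls, mean_seconds_per_call)."""
    client = StubCrowdClient()
    timings = []
    workers = [
        threading.Thread(
            target=hammer,
            args=(client, f"user{i}", logins, checks_per_login, timings),
        )
        for i in range(n_threads)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return len(timings), sum(timings) / len(timings)


if __name__ == "__main__":
    calls, mean = run_load_test(n_threads=4, logins=5, checks_per_login=10)
    print(f"{calls} calls, mean {mean * 1000:.3f} ms per call")
```

Each worker makes logins × (1 + 2 × checks_per_login) calls, so the small run above issues 4 × 105 = 420 calls in total.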
For the load test, Crowd and MySQL were on the same box: a 4-core Mac Pro, networked over a 100 Mbit line to the client, which resided on a separate box. Check out the results:

crowd-hammer-table.png
crowd-hammer-graph.png

Analysis

Making sense of the data:

  • Authentications/authentication verifications are pretty fast (~10ms).
  • Crowd performs optimally when there are 4-6 threads hammering it at the same time and doesn’t appear to show signs of death for more concurrent threads.
  • The JVM heap was 128MB and it didn’t die, i.e. Crowd is not hogging memory, since there’s only a handful of entities it needs to load up for this authentication test.
  • Load could have been limited by the generation box. However, the generation box was an 8 core beast whereas the Crowd server was on a 4 core box, so that seems unlikely.
  • Crowd seems to scale for 15+ concurrent threads hammering it with authentication requests. Overnight, we ran a 50-concurrent-threads test which had an average request service time of 8.26ms. Put another way, this translates to roughly 120 requests serviced per second (1000ms / 8.26ms).
  • At 50 concurrent threads, we are still not maxing out the CPUs although their idle time is decreasing. We could push Crowd even further until it was either CPU, disk or network bound.
  • We can extrapolate these results and overestimate “reasonable usage” by allowing only 10 seconds between authentication checks per user. At roughly 120 requests per second, that means Crowd could handle about 1200 active users (120 × 10). Note that 10 seconds is a pessimistic figure, especially if client libraries cache authentication for much longer (usually around 2 minutes).
  • This is still a basic test and we should investigate broader performance testing of Crowd’s API.
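The extrapolation in the bullets above is simple enough to write down. This sketch only redoes the arithmetic, on the assumption (the same one the post makes) that 8.26ms is the aggregate time per serviced request, and that each active user generates one request per interval. The function names are made up for illustration.

```python
def requests_per_second(mean_service_ms):
    """Throughput implied by an aggregate mean service time in milliseconds."""
    return 1000.0 / mean_service_ms


def supported_users(throughput_rps, seconds_between_requests):
    """Active users the server can keep up with if each sends one request per interval."""
    return throughput_rps * seconds_between_requests


throughput = requests_per_second(8.26)           # the overnight 50-thread run
users_10s = supported_users(throughput, 10)      # pessimistic: a check every 10 seconds
users_120s = supported_users(throughput, 120)    # if clients cache for about 2 minutes

print(round(throughput), round(users_10s), round(users_120s))
# prints: 121 1211 14528
```

The post rounds these down to 120 requests per second and 1200 users. The 2-minute figure is a rough upper bound under the same simplification, not something we measured.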

Going Forward

Load testing is important. It’s even more important when you’re middleware. Although these metrics are a start, we should consider further performance testing:

  • Various directories: OpenLDAP, ActiveDirectory.
  • Various databases: Postgres, Oracle.
  • Load testing real applications integrated with Crowd (e.g. Confluence, JIRA): might be good to compare how much fat Crowd adds to an application’s standard repository.
  • Replaying logs from customers’ (or our own) applications for automated load testing of Crowd (without needing to run a specific client application).
  • Profiling: determine which methods are letting us down and optimise based on potential benefit.

There are two sides to considering Crowd and load, and the first is to ensure the Crowd Server is lean and mean. The side which we didn’t examine in this post, particularly useful for Java clients, is to ensure that our client libraries are smart and efficient – which boils down to effective caching – only making requests to the Crowd server when actually necessary.
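As a rough illustration of what client-side caching means here, the sketch below remembers a successful token validation for a fixed time and only goes back to the server once that time is up. This is not how the Crowd client libraries are implemented: CachingTokenValidator, verify_remote and the 120 second default are all assumptions made for the example.

```python
import time


class CachingTokenValidator:
    """Caches positive token validations for ttl_seconds before asking the server again."""

    def __init__(self, verify_remote, ttl_seconds=120.0, clock=time.monotonic):
        self._verify_remote = verify_remote   # hypothetical callable: token -> bool
        self._ttl = ttl_seconds
        self._clock = clock                   # injectable so tests can control time
        self._valid_until = {}                # token -> expiry timestamp

    def is_valid(self, token):
        now = self._clock()
        expiry = self._valid_until.get(token)
        if expiry is not None and now < expiry:
            return True                       # served from cache, no server call
        self._valid_until.pop(token, None)    # drop a stale entry, if any
        if self._verify_remote(token):
            self._valid_until[token] = now + self._ttl
            return True
        return False                          # failures are never cached

    def invalidate(self, token):
        """Call on logout so a revoked token is not served from the cache."""
        self._valid_until.pop(token, None)
```

The trade-off is staleness: a token revoked on the server can still be accepted by this client for up to ttl_seconds, which is why the sketch only caches successes and exposes invalidate.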
We’re working on making Crowd, and Crowd integration, even slicker in 1.4!