Saturday, May 8, 2010

The god of small things

Java programmers are spoilt. They have excellent Collection classes already written for them.

But choosing the correct one can still be a problem.

This week, I was looking once more at improving the performance of a trading application. YourKit was indicating that a lot of time was being spent in the equals() method of a class we use for to hold audit information. The method itself seemed fine. It's just that it was being called a lot of times. Fair enough. The system had lots of audit points. But why was equals() being called so often anyway?

On closer inspection, things could be optimized. Basically, persisting an order to a Sybase DB on the same subnet through the app was taking about 450ms during normal deal activity. For each DB write, the thread would pop an audit object onto an in-memory queue from which another thread would consume it and persist it to another DB.

The JavaDocs for this Queue class said "this implementation employs an efficient "wait-free" algorithm". So, what could possibly go wrong?

Well, underneath the queue is a linked list. These are slow to traverse when you are looking for your element (which the first thread would do to update the audit if it were still waiting to be dispatched - don't ask why). It was this searching that was calling our equals() method so often.

I talked to the original programmer and asked why the batching of these audits had to be in a queue. Could they be in any Collection - say a HashSet? HashSets are much more efficient during a lookup since they put objects into buckets based on the mod of the integer returned by the hashcode() method. If the mod of the hashcode() method is not the same as the object to be compared, the collection doesn't bother to call equals().

The programmer said that indeed the order of the audits was unimportant as long as they all were eventually dispatched. But what surprised me was that little change - Queue to HashSet - made a huge difference to performance. Persists now took about 200ms under normal deal activity.

So, all the time spent looking at optimizing the database to improve access times was dwarfed by one Java change.

Moral of the story #1: wait until you have empirical evidence rather than assuming where the problem lies.

Moral of the story #2: don't assume that just because Collection classes are well written and optimized by experts you can use whichever one you like.

No comments:

Post a Comment