Wednesday, December 29, 2010

More Oddities

Nothing particularly enlightening, this is more a scratchpad for me.

Taking yesterday's code, I got to thinking what happens if I compile a class external to our friend, the class that appears to violate the language spec. It has two methods:

/** This code compiles in Eclipse 3.4 but does not in Eclipse Helios */
public class ErasureTest extends TestCase {
    public String sameMethod(Class<String> clazz) {
        return STRING;
    }

    public Integer sameMethod(Class<Integer> clazz) {
        return INTEGER;
    }
}
The Java Language Spec says this should be forbidden but the JVM seems happy with it. What's more, I now have a class that looks like this:

package com.henryp.lang;

public class ErasureCaller {
    public static void main(String[] args) {
        new ErasureTest().sameMethod(ErasureCaller.class);
    }
}

And upon compilation, I get:

phillip:Phill henryp$ javac -d bin -cp .:/Users/henryp/Documents/workspace/Test/bin:/Users/henryp/.m2/repository/junit/junit/3.8.2/junit-3.8.2.jar com/henryp/lang/
com/henryp/lang/ cannot find symbol
symbol : method sameMethod(java.lang.Class<com.henryp.lang.ErasureCaller>)
location: class com.henryp.lang.ErasureTest
new ErasureTest().sameMethod(ErasureCaller.class);
1 error
because the generic information in the signature is not entirely erased as we saw yesterday.

But this is fine:

package com.henryp.lang;

public class ErasureCaller {
    public static void main(String[] args) {
        new ErasureTest().sameMethod(String.class);
    }
}
As if the erasures were not, well, erased at all. Which leaves me wondering what's the use of erasures.

"Type erasure exists so that new code may continue to interface with legacy code."
(Type Erasure, The Java Tutorials)

But this seems to be a unidirectional relationship. New code can use old binaries but type erasure does not mean old code can use new binaries. Setting the -source and -target switches on javac to 1.4 does not mean generic code will compile.

Tuesday, December 28, 2010

I am not a language lawyer...

... but this week has been a strange one for delving into the minutiae of the Java language as well as the JVM.

First, some of us upgraded our Eclipse to Helios. Despite the fact that we were using the same JVM as before, some of our code was not compiling with the new version of the IDE when it was perfectly fine with the old. The problem was similar to what was outlined in this Eclipse bug here and the JDK bug here.

Our code looked something similar to:

package com.henryp.lang;

import org.junit.Test;

import junit.framework.TestCase;

/** This code compiles in Eclipse 3.4 but does not in Eclipse Helios */
public class ErasureTest extends TestCase {
    private static final int INTEGER = 3;
    private static final String STRING = "String";

    public void testWhichOneIsCalled() {
        Object anObject = sameMethod(String.class);
        assertEquals(STRING, anObject);
    }

    public String sameMethod(Class<String> clazz) {
        return STRING;
    }

    public Integer sameMethod(Class<Integer> clazz) {
        return INTEGER;
    }
}

The question being: if generics are erased, how does the compiler know which sameMethod method to call?

[Aside: although it has no bearing on this discussion, generic declarations are not completely removed. If a library uses generics, your code can take advantage of them even if you don't have the source code. Just to prove it, this class demonstrates that you can still see the supposedly erased parameter types via introspection:

package com.henryp.lang;

import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

import junit.framework.TestCase;

public class GenericErasureTest extends TestCase {

    public void testGetGeneric() throws Exception {
        Method method = ClassWithParametizedMethod.class.getMethod("aMethodWithGenerics", Class.class);
        Type[] genericParameterTypes = method.getGenericParameterTypes();
        assertEquals(1, genericParameterTypes.length);
        Type type = genericParameterTypes[0];

        assertTrue(type instanceof ParameterizedType);
        ParameterizedType parameterizedType = (ParameterizedType) type;
        Type[] actualTypeArguments = parameterizedType.getActualTypeArguments();
        assertEquals(1, actualTypeArguments.length);
        assertEquals(String.class, actualTypeArguments[0]);
    }
}

class ClassWithParametizedMethod {
    public void aMethodWithGenerics(Class<String> clazz) {
    }
}


If we decompile ClassWithParametizedMethod, we see something like:

phillip:Test henryp$ javap -c -verbose -classpath ./bin com.henryp.lang.ClassWithParametizedMethod
public void aMethodWithGenerics(java.lang.Class);
Signature: length = 0x2
00 11

The 11 is hexadecimal and refers to an entry in the constant pool (0x11 = 17):

const #17 = Asciz (Ljava/lang/Class<Ljava/lang/String;>;)V;

and that's where our generics information is stored [1].]



The Java Language Specification says:

"It is a compile-time error to declare two methods with override-equivalent signatures (defined below) in a class.

"Two methods have the same signature if they have the same name and argument types.

"Two method or constructor declarations M and N have the same argument types if all of the following conditions hold:

  • They have the same number of formal parameters (possibly zero)
  • They have the same number of type parameters (possibly zero)
  • Let <A1,...,An> be the formal type parameters of M and let <B1,...,Bn> be the formal type parameters of N. After renaming each occurrence of a Bi in N's type to Ai, the bounds of corresponding type variables and the argument types of M and N are the same."
(JLS 8.4.2)

What does this mean? Let's define some terms.

"A class, interface, or method that declares one or more type variables. These type variables are known as type parameters."
(Java glossary)

Type Parameters
"A method is generic if it declares one or more type variables. These type variables are known as the formal type parameters of the method. "
(JLS 8.4.4)

Formal Parameters
"The formal parameters of a method or constructor, if any, are specified by a list of comma-separated parameter specifiers. Each parameter specifier consists of a type (optionally preceded by the final modifier and/or one or more annotations) and an identifier (optionally followed by brackets) that specifies the name of the parameter. "
(JLS 8.4.1)

Aside: parameters and arguments are often used synonymously but they are slightly different. Parameters define the method or constructor at compile time. Arguments are what are passed to the method or constructor at runtime [2]. Alternatively, the words actual and formal can be used to distinguish between an argument and a parameter, respectively [3] [4].

It appears that the last of these bullet points in JLS 8.4.2 (quoted above) is describing type erasure. It's saying that

public String sameMethod(Class<String> clazz)

public Integer sameMethod(Class<Integer> clazz)
are the same as far as the Java language is concerned and so this is a bug in the compiler.
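Both halves of that story can be seen at once via reflection. Here is a small sketch (a hypothetical class of my own, not from the code above) showing that an overload's erased parameter type is the raw Class while the generic information survives alongside it in the Signature attribute:

```java
import java.lang.reflect.Method;

public class ErasedSignatureDemo {
    public String lookup(Class<String> clazz) {
        return "s";
    }

    /** The raw parameter type, i.e. what the two sameMethods share after erasure. */
    public static String erasedParamName() {
        try {
            Method m = ErasedSignatureDemo.class.getMethod("lookup", Class.class);
            return m.getParameterTypes()[0].getName();
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
    }

    /** The generic type recorded in the class file's Signature attribute. */
    public static String genericParamName() {
        try {
            Method m = ErasedSignatureDemo.class.getMethod("lookup", Class.class);
            return m.getGenericParameterTypes()[0].toString();
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(erasedParamName());   // java.lang.Class
        System.out.println(genericParamName());  // java.lang.Class<java.lang.String>
    }
}
```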

But the interesting thing is that although the Java language says you can't have this, the JVM does not seem to mind (although there is some confusion about this that I have not resolved - see below). We have the odd situation of having Java byte code that cannot be rendered into Java.

Decompiling ErasureTest.testWhichOneIsCalled, we see:

public void testWhichOneIsCalled();
Stack=2, Locals=2, Args_size=1
0: aload_0
1: ldc #23; //class java/lang/String
3: invokevirtual #25; //Method sameMethod:(Ljava/lang/Class;)Ljava/lang/String;

The comment tells us that item #25 in the constant pool is indeed our method definition:

const #25 = Method #1.#26; // com/henryp/lang/ErasureTest.sameMethod:(Ljava/lang/Class;)Ljava/lang/String;

This is confirmed by Bill Venners' splendid Inside the Java 2 Virtual Machine:

"The method_info table contains several pieces of information about the method, including the method's name and description (its return type and argument types)."

Note that if the two methods had the same return type, the JVM would treat them as the same and the class would fail linking, where:

"The virtual machine checks the referenced class for a method of the specified name and descriptor [including its return type]. If the machine discovers such a method, that method is the result of the successful method lookup. [...] Otherwise the method lookup fails."
(ibid, p289)

However, I mentioned there was some confusion about this as the actual JVM spec says:

"The signature of a method consists of the name of the method and the number and type of formal parameters of the method. A class may not declare two methods with the same signature."
(Java VM Spec 2.10.2)

Yet we have exactly this. Maybe there is something wrong with my JVMs (Sun 1.6.0_20 on Windows and 1.5.0_22 on Mac).

So, that was the first odd thing this week. The second was similar. An Ant script was not deleting the binary directory before building. As a result, old class versions were hanging around while the classes they called were changing. This led to a hard-to-debug exception being thrown in Hibernate that was not its fault. I had to go hiking through Hibernate code to realise this, though.

But the problem that manifests itself has a bearing on what is discussed above. Again, it supports Bill Venners rather than the JVM Spec itself.

Try this: write two classes as below:
package com.henryp.lang;

public class VersionCaller {

    public static void main(String[] args) {
        Object object = new VersionedClass().getNumber();
    }
}

package com.henryp.lang;

public class VersionedClass {

    public Integer getNumber() {
        return 0;
    }
}

Copy them somewhere else, so:

phillip:Test henryp$ cp bin//com/henryp/lang/VersionedClass.class /tmp/Phill/com/henryp/lang
phillip:Test henryp$ cp bin//com/henryp/lang/VersionCaller.class /tmp/Phill/com/henryp/lang
phillip:Test henryp$ java -cp /tmp/Phill com.henryp.lang.VersionCaller

No surprises. So, just subtly change the return type of getNumber() to return a Number (after all java.lang.Integer extends java.lang.Number so it should be OK, right?).

package com.henryp.lang;

public class VersionedClass {

    public Number getNumber() {
        return 0;
    }
}

This time, just copy VersionedClass and then run.

phillip:Test henryp$ cp bin//com/henryp/lang/VersionedClass.class /tmp/Phill/com/henryp/lang
phillip:Test henryp$ java -cp /tmp/Phill com.henryp.lang.VersionCaller
Exception in thread "main" java.lang.NoSuchMethodError: com.henryp.lang.VersionedClass.getNumber()Ljava/lang/Integer;
at com.henryp.lang.VersionCaller.main(

This happens because VersionCaller was compiled against an old version of VersionedClass:

public static void main(java.lang.String[]);
Stack=2, Locals=2, Args_size=1
0: new #16; //class com/henryp/lang/VersionedClass
3: dup
4: invokespecial #18; //Method com/henryp/lang/VersionedClass."<init>":()V
7: invokevirtual #19; //Method com/henryp/lang/VersionedClass.getNumber:()Ljava/lang/Integer;

So it raises the question: if calling code can distinguish between two methods by virtue of their return type, why does the JVM spec say you cannot have two otherwise identical methods?


Saturday, November 13, 2010

Tuning Sybase

A debate is raging in the office over which DB query is the most expensive. Since it's the weekend, time to fire up Fedora in VMWare and tinker with my Sybase installation.

First, you have to set up the DB to take advantage of the statistics. Do this with:

[sybase@vmwareFedoraII ~]$ isql -Usa -SVMWAREFEDORAII
1> exec sp_configure "allow resource limits", 1
2> go

You need to reboot, so:

1> shutdown
2> go
Server SHUTDOWN by request.

and restart:

[root@vmwareFedoraII henryp]# /etc/rc.d/init.d/sybase start


[sybase@vmwareFedoraII ~]$ isql -Usa -SVMWAREFEDORAII
1> use henryp
2> go
1> set statistics io on
2> go
Total actual I/O cost for this command: 0.
Total writes for this command: 0
1> set showplan on
2> select * from tranche t , ioi i where i.fk_tranche =
Table: tranche (t) scan count 1, logical reads: (regular=1 apf=0 total=1),
physical reads: (regular=1 apf=0 total=1), apf IOs used=0

Table: ioi (i) scan count 2, logical reads: (regular=18 apf=0 total=18),
physical reads: (regular=8 apf=1 total=9), apf IOs used=1

Total actual I/O cost for this command: 288.

Total writes for this command: 0

Execute the same select again and you get:

Table: tranche (t) scan count 1, logical reads: (regular=1 apf=0 total=1),
physical reads: (regular=0 apf=0 total=0), apf IOs used=0

Table: ioi (i) scan count 2, logical reads: (regular=18 apf=0 total=18),
physical reads: (regular=0 apf=0 total=0), apf IOs used=0

Total actual I/O cost for this command: 38.

Total writes for this command: 0

Notice how the physical reads are reduced the second time around and indeed the actual I/O cost is an order of magnitude less.

Presumably, this is because the DB is caching this data. Rebooting clears this cache:

[root@vmwareFedoraII henryp]# /etc/rc.d/init.d/sybase restart
Stopping sybase service: [ OK ]
Starting sybase service: [ OK ]

And running the query again with io statistics on gives the same actual I/O as when we first ran the query.

Let's see the showplan again without actually executing it:

1> set noexec on
2> select * from tranche t , ioi i where i.fk_tranche =
3> go

Total estimated I/O cost for statement 2 (at line 2): 731.

The I/O costs are a unitless number (see the Administrator's Guide to Sybase ASE 15).

Why the difference?

"The estimated cost is higher than the actual cost if some pages are already in the cache or if the statistics are incorrect. The estimated cost may be lower than the actual cost if the optimizer chooses 16K I/O, and some of the pages are in 2K cache pools, which requires many 2K I/Os. Also, if a big join forces the cache to flush its pages back to disk, repeated access may require repeated physical I/Os."

So, by a circuitous route, why were some people in the team seeing different estimated I/O costs for the same query? Actual costs can change for a variety of reasons - but estimated? Well, I don't know for sure, but I was also using an IDE to run some of these queries and just happened to limit the number of rows since I didn't want to print out thousands of lines. Showplan knows this and changes the estimate accordingly. Gotcha.

Of course, if the data changes then the estimate will change too.

Sunday, November 7, 2010

ByteArrayInputStream is not terribly efficient...

A question came my way on Friday: if Java New I/O (NIO) is so efficient, why doesn't using it to read a big XML file into memory and wrapping the result in a ByteArrayInputStream make parsing faster?

The answer lies in what ByteArrayInputStream actually does. It's nothing more than a thin wrapper around a byte[] that allows calling code to treat it as an InputStream. But each call to read(byte[], int, int) copies a fragment of the underlying array into the caller's buffer (so that the caller gets its own copy and cannot modify the underlying array).

The problem is that by the time you've reached the end of the stream, you've copied the whole array. With large amounts of data, this means a lot of memory being allocated and garbage collected, and you'll soon see that the savings of NIO have been more than offset by the activity of ByteArrayInputStream.
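That copying is easy to see from the caller's side. This is a minimal sketch with hypothetical names: draining a ByteArrayInputStream through a small buffer means read() has copied every byte of the backing array by the time we reach the end:

```java
import java.io.ByteArrayInputStream;

public class BaisCopyDemo {
    /** Drains the stream in 4-byte chunks; each read() is a System.arraycopy under the hood. */
    public static byte[] drain(byte[] source) {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        byte[] out = new byte[source.length];
        byte[] chunk = new byte[4];
        int total = 0;
        int n;
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            System.arraycopy(chunk, 0, out, total, n);  // a second copy, just to collect the chunks
            total += n;
        }
        return out;  // by now every byte of 'source' has been copied at least once by the stream
    }

    public static void main(String[] args) {
        byte[] xml = "<doc>a big XML document</doc>".getBytes();
        byte[] copy = drain(xml);
        System.out.println("same bytes, different array: " + (copy != xml));
    }
}
```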

After all, one of the benefits of NIO is minimizing copying. From Ron Hitchens' excellent Java NIO:

"[in normal Java IO] The disk controller writes the data directly into a kernel memory buffer by DMA [Direct Memory Access] without further assistance from the main CPU. Once the disk controller finishes filling the buffer, the kernel copies the data from the temporary buffer in kernel space to the buffer specified by the process when it requested the read() operation... [C]opying from kernel space to the final user buffer seems like extra work."

(Java NIO, p13,14).

"By mapping a kernel space address to the same physical address as a virtual address in user space [as with NIO], the DMA hardware (which can access only physical memory addresses) can fill a buffer that is simultaneously visible to both the kernel and a user space process."

(ibid, p15).

Wednesday, November 3, 2010

Unfair's Fair

This conundrum was on our internal mailing lists:

Take a ReentrantReadWriteLock. One thread calls readLock(), gets the lock and calls lock() on it.

A second calls writeLock() and also calls lock(). The first thread still holds the read lock so the write lock blocks. No surprises there.

But then a third thread also gets a read lock and tries to call lock() on it and it too blocks - even though it is a read lock.

Surprised? Why should a read block when a fellow read holds the lock? Don't the JavaDocs for the ReadWriteLock interface say:

"The read lock may be held simultaneously by multiple reader threads"


Well, this is a special case. Yes, usually if a read lock is being held then other reading threads can lock it, even if one or more writer threads are attempting to take the write lock. This is true irrespective of whether the ReentrantReadWriteLock is fair or not.

The special case is this: if a read lock is held but the next bid for the lock comes from an attempt to attain a write lock, subsequent read locks will block and wait for the write operation to complete.

You won't find this in the JavaDocs but in a small comment in the ReentrantReadWriteLock.NonfairSync code. I found this snippet from Sun's 1.6.05 implementation while stepping through the code to diagnose this behaviour:

final boolean readerShouldBlock(Thread current) {
    /* As a heuristic to avoid indefinite writer starvation,
     * block if the thread that momentarily appears to be head
     * of queue, if one exists, is a waiting writer. This is
     * only a probabilistic effect since a new reader will not
     * block if there is a waiting writer behind other enabled
     * readers that have not yet drained from the queue.
     */
    return apparentlyFirstQueuedIsExclusive();
}

This may or may not be the case on other JVMs.
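The scenario above can be reproduced in a few lines. This is a sketch in modern Java syntax (names are mine): a reader holds the lock, a writer parks at the head of the queue, and then a fresh reader's timed tryLock (which, unlike the plain tryLock(), does not barge past the queue) should time out because of the starvation heuristic:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadBehindWriterDemo {
    /** Returns whether a third, fresh reader could take the read lock while a writer was queued. */
    public static boolean newReaderAcquired() {
        try {
            ReentrantReadWriteLock rwl = new ReentrantReadWriteLock(); // default: non-fair
            CountDownLatch readerHasLock = new CountDownLatch(1);
            CountDownLatch release = new CountDownLatch(1);

            Thread reader1 = new Thread(() -> {    // thread 1: takes and holds the read lock
                rwl.readLock().lock();
                readerHasLock.countDown();
                try { release.await(); } catch (InterruptedException ignored) { }
                rwl.readLock().unlock();
            });
            reader1.start();
            readerHasLock.await();

            Thread writer = new Thread(() -> {     // thread 2: parks, waiting for the write lock
                rwl.writeLock().lock();
                rwl.writeLock().unlock();
            });
            writer.start();
            Thread.sleep(500);                     // let the writer reach the head of the queue

            // thread 3 (here, the caller): a timed tryLock does not barge past the queue,
            // so the writer-starvation heuristic should make this attempt time out
            boolean acquired = rwl.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) rwl.readLock().unlock();

            release.countDown();                   // let thread 1 finish; the writer can proceed
            reader1.join();
            writer.join();
            return acquired;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("third reader got the lock: " + newReaderAcquired());
    }
}
```

Note that the barging, un-timed tryLock() would succeed here, which is exactly why the demo uses the timed form.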

Saturday, October 30, 2010

Debugging Production Environments

Have you ever been in the position where you have to look after a legacy app that has insufficient logging and non-deterministic problems? What's more, it's running in production and that rare non-deterministic bug has just occurred. How do you diagnose it?

This task faced me last week. Looking at the code on my machine, I just couldn't see how the problem could manifest itself. "After all, there is only one instance of this object in the entire system, right?" I found myself saying. "It must be a bizarre and rare threading issue!"

So, I ran this on the production Linux box (where PID is the Java process ID):

$JAVA_HOME/bin/jmap -histo:live PID | grep OUR_CLASS

and saw something like this (I've added the column headings for clarity):

num #instances #bytes class name


9030: 2 200 OUR_CLASS

So, there were 2 instances of our class! All my recent investigation of odd threading issues was for naught!

This is one reason why the factory pattern is so nice - you know exactly where the objects are being instantiated (tip: make the class's constructor package protected and put the factory in the same package with no other classes in that package). As an added benefit, it makes writing test code easier.

Still, this isn't going to help you if some smart alec decides to reflectively instantiate an object of this class. But such a practice should be strongly discouraged.
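A minimal sketch of that factory idea (hypothetical names; the product class is nested here for brevity, where the tip above suggests a package-private constructor with the factory alone in its package):

```java
public class WidgetFactory {
    /** The product. Its constructor is hidden, so the factory is the only way to build one. */
    public static class Widget {
        private Widget() { }  // in the real pattern: package-private, factory in the same package
    }

    private static int created = 0;

    /** The single place in the system where Widgets are instantiated. */
    public static Widget newWidget() {
        created++;            // one choke point: trivial to count, log, or stub out in tests
        return new Widget();
    }

    public static int instancesCreated() {
        return created;
    }

    public static void main(String[] args) {
        newWidget();
        newWidget();
        System.out.println("instances created: " + instancesCreated());
    }
}
```

With this in place, a jmap histogram showing more instances than instancesCreated() reports points straight at reflective instantiation.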

Testing and embracing change

My client can only be described as "Agile curious" rather than gung-ho disciples of our lord, Kent Beck. So, it was nice to see an email circulated on the company-wide technical mailing lists asking for help in writing automated regression tests. The author asked about the wisdom of serializing the output of all tests into XML and comparing them between runs.

I almost choked on my coffee. I worked on a project that did this last year and it was not pretty. So, I emailed him to describe how using serialized objects in tests had worked out for us:

1. It was fragile. Even minor refactoring of the Java objects would break the tests.

2. It didn’t scale. As the number of XML files that needed to be maintained increased, productivity declined. (At one point, we were maintaining about 20 megs of serialization XML.)

3. Java developers tend not to like maintaining large XML files as their Java code changes - so some refactoring of the XML was, shall we say, less than diligent...

4. Serializing Sets (or any other collection where the order in which the elements were serialized was non-deterministic) could break the tests. For a time, some developers were hacking the problem domain classes to make the order deterministic just so the tests would pass.

To this last point, he replied that XML Unit could handle this non-determinism. However, I think the first three points were more than enough to frighten me from doing that again.

More volatility

Wondering why one of our legacy apps seems to be having thread issue, I continued digging into the recesses of the Java Memory Model.

Volatile is a fascinating keyword since many excellent Java programmers don't appreciate its finer points. An engineer I much admire told me that he only used it for boolean primitives when telling threads to stop running. Fair enough, but there is more to it than that. Here are some of those finer points:

1. What if we're not using volatile primitives? What if we're using, say, an array? Can I be sure that setting an element of the array is volatile too and will be seen by other threads? Apparently not, according to Google's Jeremy Manson in this tutorial (about 40' into the video) where he says: "there is no way to make the elements of an array volatile... the reference to the array is volatile not the elements of the array itself".

2. Furthermore, the performance of volatile is generally pretty good. "On an x86 there is no cost in reading a volatile variable. There is a cost to writing to a volatile variable but not reading... as of 2007".

3. On some hardware, 64-bit read/writes are not atomic but "writes and reads of volatile long and double values are always atomic" (Java Language Specification 17.7).

4. The order in which the JVM executes the lines of your code can also be changed by volatile (you did know the order of execution may differ from how you wrote your code, right?). Brian Goetz explains:

"There is no guarantee that operations in one thread will be performed in the order given by the program, as long as the reordering is not detectable within that thread - even if the reordering is apparent to other threads"

(Java Concurrency in Practice, p34).

However, when we introduce volatile :

"Under the old memory model, accesses to volatile variables could not be reordered with each other, but they could be reordered with nonvolatile variable accesses. [...]

Under the new memory model, it is still true that volatile variables cannot be reordered with each other. The difference is that it is now no longer so easy to reorder normal field accesses around them... In effect, because the new memory model places stricter constraints on reordering of volatile field accesses with other field accesses, volatile or not, anything that was visible to thread A when it writes to volatile field f becomes visible to thread B when it reads f."

(JSR-133 FAQ)

Incidentally, this does mean that code which is fine on JDK 1.5+ might not be fine running on older JVMs. If you really want to write-once-run-anywhere Java, you should assume you're writing for the old JMM and make volatile every variable for which the order of read/write operations matters.
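Manson's point about arrays (point 1 above) is worth a sketch. This is a hypothetical example of my own: the volatile modifier protects the reference only, and java.util.concurrent.atomic.AtomicIntegerArray is one way to get volatile semantics per element:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

public class VolatileArrayDemo {
    // The reference 'flags' is volatile; writes to flags[i] are NOT volatile writes,
    // so another thread is not guaranteed to see them promptly (or at all).
    static volatile int[] flags = new int[8];

    // One remedy: each get/set on an AtomicIntegerArray element has volatile semantics.
    static final AtomicIntegerArray safeFlags = new AtomicIntegerArray(8);

    public static int setAndGet(int index, int value) {
        safeFlags.set(index, value);   // a volatile write of one element
        return safeFlags.get(index);   // a volatile read
    }

    public static void main(String[] args) {
        System.out.println(setAndGet(0, 42));
    }
}
```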

So, did all of this help me to fix the buggy legacy app? No. Every time I have cast a suspicious eye at the finer points of the JMM, the culprit has turned out to be a much more mundane bug. In this case, it turns out that somebody, somewhere was reflectively calling one of our methods that starts threads - a call that is very hard to find with most IDEs.


Wednesday, October 27, 2010

A little volatile

Time to refresh my memory: what is the volatile modifier for?

The answer given by most Java programmers I've interviewed recently is that it makes the value the reference points to visible to all threads. This is true but not the whole story.

When I first came to appreciate the Java Memory Model I was surprised like almost everybody else to see that the value of a single field may be different for two different threads. This is not something peculiar to Java. Any hardware that conforms to a Von Neumann architecture (which is pretty much every common processor) can run faster if it uses its CPU's registers to store references rather than main memory (RAM).

Making a reference volatile means that it will be published to main memory (accessible by all threads) rather than being stored in a CPU's register (accessible by only that thread). But it also means:

1. Its published state is safe

If an object is being instantiated, other threads potentially have access to it. However, they may see a reference to it before the instantiating thread has left the constructor. If the object is not immutable, this can lead to inconsistent data. Using volatile is one way to eliminate this.

To ensure safe publication, Java code needs to do at least one of the following:
  • initialize the object from a static initializer
  • store the reference in a final field
  • store the reference in a volatile field
  • store the reference in a java.util.concurrent.atomic.AtomicReference
  • guard the reference in an appropriate lock
(See Brian Goetz's excellent Java Concurrency in Practice, p52, for more information).

2. All other variables are flushed to main memory

"When thread A writes to a volatile variable and subsequently thread B reads that same variable, the value of all variables that were visible to A prior to writing to the volatile variable become visible to B after reading the volatile variable."
(Java Concurrency in Practice, p38)
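Two of the safe-publication idioms listed above can be sketched in a few lines (Config is a hypothetical class of my own):

```java
import java.util.concurrent.atomic.AtomicReference;

public class SafePublication {
    static class Config {
        final String url;                        // final fields are safely published...
        Config(String url) { this.url = url; }   // ...once the constructor completes
    }

    // idiom 1: initialize the object from a static initializer
    static final Config DEFAULT = new Config("jdbc:default");

    // idiom 2: store the reference in an AtomicReference (a volatile write under the hood)
    private static final AtomicReference<Config> CURRENT = new AtomicReference<>(DEFAULT);

    public static Config publish(String url) {
        Config fresh = new Config(url);
        CURRENT.set(fresh);   // any thread that subsequently reads CURRENT sees a fully built Config
        return CURRENT.get();
    }

    public static void main(String[] args) {
        System.out.println(publish("jdbc:live").url);
    }
}
```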

The reason I am re-reading Mr Goetz's excellent book is that I have been asked to diagnose a threading issue in some legacy code where I came across the Double Checked Locking idiom (see this link for what it is and why it's pathological). It's been a long time since I saw DCL code or even the Singleton pattern (singletons are hard to test and, in this day of dependency injection, largely redundant). But something I read that I had forgotten was:

"To ensure that all threads see the most up-to-date values of shared mutable variables, the reading and writing threads must synchronize on a common lock"
(ibid, p37, emphasis mine).

DCL is fixed if one uses volatile, according to Jeremy Manson, co-author of JSR-133 and the parts of the JLS that deal with threads and synchronization. Since the old code does not use volatile, I suspect this may be the cause of our problem. However, now I need to prove it - not easy when you're dealing with Heisenbugs.
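For reference, a sketch of the idiom done correctly under the new (JDK 1.5+) memory model, per Manson's fix. The volatile modifier is exactly what our legacy code was missing:

```java
public class LazySingleton {
    private static volatile LazySingleton instance;   // volatile is the crucial part

    private LazySingleton() { }

    public static LazySingleton getInstance() {
        LazySingleton result = instance;              // one volatile read in the common case
        if (result == null) {
            synchronized (LazySingleton.class) {
                result = instance;
                if (result == null) {                 // the second check, under the lock
                    instance = result = new LazySingleton();
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(getInstance() == getInstance());
    }
}
```

Without volatile, another thread could observe a non-null instance whose fields had not yet been written, which is the pathology the DCL literature describes.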

Sunday, August 15, 2010

Emergent (Mis)behavior

I've been puzzling for a while over why our bond allocation program would suddenly hang when under stress.

Fortunately for us, we have a good test harness and I could attach a profiler and watch and wait for it to happen. Eventually, it did:

Emergent behaviour as all threads spontaneously start contending for a resource

All the threads found themselves in the same logjam - but for no discernible reason. It was like a spontaneous "phantom jam" on a motorway (there is an interesting computer simulation of exactly this in a video here).

It was clear where the threads were choking (mentioned here).

But why did an innocent - if not particularly efficient - piece of code cause the system to grind to a halt?

Well, it's all down to something called emergent behaviour - or Emergent (Mis)behavior as Jeffrey C. Mogul calls it in his excellent paper here. One (of many) definitions is:

Emergent behavior is that which cannot be predicted through analysis at any level simpler than that of the system as a whole.

Mogul's paper does not address the question of whether emergent behaviour is a property of the system or a function of our ignorance. But he does give many examples of where it has occurred in software systems and gives a warning to the architects of SOA systems. Service Oriented Architecture emphasises a separation of concerns:

The fundamental benefit to solving problems this way is that a number of the solution logic units can be designed to solve immediate concerns while still remaining agnostic to the greater problem.
(SOA, Principles of Service Design, p70 - Thomas Erl)

Although it's a good design principle that helps minimize maintenance costs, it can lead the unwary into a false sense of security. To avoid this, one can only watch a system's behaviour in a test environment for a long period of time - time that most projects don't have. Most projects fix these problems once they have gone live.

There is very little literature in this area but Mogul attempts to build a taxonomy of emergent misbehaviour in the hope that one day we'll be better equipped to say which misbehaviours a particular system may be prone to.

Emergent behaviour also occurs in non-computer systems. As examples: it was not obvious from the constitution of the Weimar Republic that it could allow the rise of a tyrant like Hitler; the book Freakonomics describes many examples where unexpected behaviour arises in otherwise well-understood systems; the designers of the Millennium Bridge in London understood footbridges and understood how crowds behave on them but did not predict how the crowd's response to the bridge would feed back into how the bridge behaved under load.

Regarding this last example, Dr Mogul points out that although this phenomenon was known, no quantitative analysis existed. The same problem still afflicts the IT community.

Monday, August 9, 2010

Hash brown-outs

How do you tell if a production server has a memory leak? Simply run:

$JAVA_HOME/bin/jmap -dump:file=/tmp/dump_file PID

and then load the file into your profiler of choice and see where all the memory is being used.

I had to do this last week. As it happened, there wasn't a memory leak but a ConcurrentHashMap that was being populated faster than its elements could be consumed.

The interesting thing was that the memory profile looked all wrong. To see what I mean, we have to look closer at ConcurrentHashMap.

What is a ConcurrentHashMap?

ConcurrentHashMap is a map that more than one thread can read and update without any fear of ConcurrentModificationExceptions being thrown.

How does it do this?

ConcurrentHashMap is basically a hash table of hash tables. These inner hash tables are called "segments". Threads that insert elements only lock one of these segments. This is one way that ConcurrentHashMap improves throughput.

Threads that wish to read do not use locking. Instead, you see snippets like this in the code:

transient volatile HashEntry[] table;

HashEntry getFirst(int hash) {
    HashEntry[] tab = table;
    return (HashEntry) tab[hash & (tab.length - 1)];
}

Without the method-scoped reference (tab) taking a copy of the object-scoped reference (table) on the first line of the method, the second line would have a race condition: the table could be replaced by a resized one between the array index being calculated and the element being read, which would return the wrong element.

Of course, this means that between the first and the second line, tab may have become stale. The danger is that this method could return an element that another thread had already removed from the "real" table.

[Aside: pointing a reference (tab) to another reference (table) is, of course, atomic ("Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values." - Java Language Spec 17.7, 3rd Edition).]

The JavaDocs do warn you about this potential "staleness" when they say:

"Retrievals reflect the results of the most recently completed update operations holding upon their onset."

A bug in ConcurrentHashMap

Back to my memory dump. The problem seen in the output from jmap was that all of the objects we wanted to store in the ConcurrentHashMap were in just one segment rather than being evenly distributed. The code in ConcurrentHashMap that determines the segment into which an object is inserted looks like this:

final Segment segmentFor(int hash) {
    return segments[(hash >>> segmentShift) & segmentMask];
}

Where hash is the hashCode of our object and segmentShift is 28 for the default constructor of ConcurrentHashMap. Reducing segmentShift significantly (by asking the constructor for a much higher concurrency level, and hence many more segments) means gobbling huge amounts of memory.

Shifting an int 28 places to the right (unsigned) keeps only its top four bits. So, of all the numbers a 32-bit int can hold, only a very small range is useful: hashes of at least 2^28, or negative hashes, land anywhere other than segment 0.
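A quick sketch of that arithmetic, assuming the default of 16 segments (so segmentShift is 28 and segmentMask is 15):

```java
// Sketch of the segment-selection arithmetic described above,
// assuming the default 16 segments (segmentShift = 28, segmentMask = 15).
class SegmentDemo {
    static final int SEGMENT_SHIFT = 28;
    static final int SEGMENT_MASK = 15;

    static int segmentFor(int hash) {
        return (hash >>> SEGMENT_SHIFT) & SEGMENT_MASK;
    }

    public static void main(String[] args) {
        // Typical small hash codes all collapse into segment 0...
        System.out.println(segmentFor(42));        // 0
        System.out.println(segmentFor(1 << 27));   // 0
        // ...only hashes with something in the top four bits spread out.
        System.out.println(segmentFor(1 << 28));   // 1
        System.out.println(segmentFor(-1));        // 15
    }
}
```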

This bug is mentioned here but Sun/Oracle appear to say it is fixed in my version of Java when it definitely isn’t (as the source code shows).

This leaves us with the choice of hacking our hashCode() method to achieve these extreme numbers or grin and bear it until we upgrade our Java to a version that has fixed this problem.

It's not a major issue so it looks like the latter.

Sunday, August 1, 2010

Go test, young man

Since I've had a week of testing, I thought I'd share some thoughts on an area much neglected by most Agile methodologies: non-functional testing. Everybody knows it's important but most books only make a passing reference to it.

Stress Tests

Since I have my copy of Pressman sitting next to me, I'll use his definition:

Stress tests are designed to confront programs with abnormal situations. In essence, the tester who performs stress testing asks: "How high can we crank this up before it fails?"
(Software Engineering: A Practitioner's Approach - Roger S Pressman)

The trouble is: what defines "fails"? For a web application, it might be when the time it takes to service a request exceeds a predefined limit. But which metric and at which point in time? The average time? And which average - mean or median? Or maybe the 90% line? And what if it just spikes momentarily then settles down? Has the test failed at this point? I don't have the answers.
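For what it's worth, those metrics can disagree wildly on the same run. A rough sketch, with made-up sample times:

```java
import java.util.Arrays;

// Sketch: the three metrics discussed above, computed over a
// hypothetical sample of response times in milliseconds.
class ResponseStats {
    static double mean(long[] ms) {
        long sum = 0;
        for (long m : ms) sum += m;
        return (double) sum / ms.length;
    }

    // Nearest-rank percentile over a sorted copy of the sample.
    static long percentile(long[] ms, double p) {
        long[] sorted = ms.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    public static void main(String[] args) {
        long[] times = {120, 80, 95, 3000, 110};     // one momentary spike
        System.out.println(mean(times));             // 681.0 - skewed by the spike
        System.out.println(percentile(times, 0.5));  // median: 110
        System.out.println(percentile(times, 0.9));  // 90% line: dominated by the spike
    }
}
```
One spike drags the mean to 681ms while the median stays at 110ms, which is exactly why the choice of metric decides whether the test "failed".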

Sufficient for us was the first non-deterministic ConcurrentModificationException - thrown after the code had naively retried five times to change a data structure that other threads were changing (this poor design is fixed in the next release).

Look out for deadlocks in your code during these tests. The annoying thing about deadlocks is that they tend to happen when your system is most under stress, so now is the chance to catch them - rather than when a production release has gone live and is getting hammered. JConsole, which comes with JDK 6, can do this detection for you.
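If you'd rather not leave JConsole attached, the same check it performs is available programmatically via ThreadMXBean - handy inside a stress-test harness. A sketch:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch: the deadlock check JConsole performs, done programmatically.
class DeadlockCheck {
    static String[] findDeadlockedThreadNames() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads();   // null if no deadlock
        if (ids == null) return new String[0];
        ThreadInfo[] infos = mx.getThreadInfo(ids);
        String[] names = new String[infos.length];
        for (int i = 0; i < infos.length; i++) {
            names[i] = infos[i].getThreadName();
        }
        return names;
    }

    public static void main(String[] args) {
        // In a healthy run this prints 0.
        System.out.println(findDeadlockedThreadNames().length);
    }
}
```
A stress-test harness could poll this every few seconds and fail the run the moment the array is non-empty.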

Soak Tests

For want of a better reference, Wikipedia defines Soak Tests as "testing a system with a significant load extended over a significant period of time".

For most practical purposes, this boils down to finding memory leaks. My advice is to attach JConsole and look at the memory usage of the JVM. Then start the stress tests and go home. The following morning, look at the pretty graph JConsole has produced. It should be a nice saw-tooth shape with the average remaining pretty much constant.
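If nobody wants to babysit the GUI, the heap numbers JConsole graphs can also be sampled from inside the JVM with MemoryMXBean. A sketch that could be bolted onto a soak-test harness to leave an overnight record:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: logging the heap figure JConsole graphs, so a soak test
// leaves a record overnight even with nobody watching.
class HeapSampler {
    static long usedHeapBytes() {
        MemoryMXBean mx = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mx.getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {   // in a real soak test: all night
            System.out.println(usedHeapBytes());
            Thread.sleep(1000);
        }
    }
}
```
If the logged minima (the troughs of the saw-tooth) creep upwards over the night, you probably have a leak.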

Test Harnesses

Apache's JMeter should be good enough for most of your needs. But the question of "who watches the watchmen?" [quis custodiet ipsos custodes?] has never been more important. Two gotchas that got me were:
  1. too much monitoring in JMeter causes it to use a lot of memory and start performing poorly thus giving false data.
  2. my test code had a memory leak. I had to connect to the system I was testing using in-house code. This code tries to connect to a messaging system to send audits for each request to the system. If it cannot, it quietly stores the messages in memory for sending later. And so, it is the test harness that slowly runs out of memory not the system under test. This is generally not your first thought. I fixed this problem by getting the source code of the client JAR and commenting out this call before redeploying the test harness.
In each case, I spotted the problem by attaching JConsole to the test harness.

Performance Testing

Put together a test with typical usage metrics. I once worked for a company that had outsourced a lot of its coding. The draft legal contract stated that access to page X would take no more than 2 seconds. The vendor must have been rubbing its hands with glee, as that should have been easy to achieve with just one person using the system. The draft never specified the number of concurrent users. If the system didn't scale, it was legally no longer the vendor's problem. They had met their contractual obligation.

"Performance tests ... often require both hardware and software instrumentation" says Pressman. On Linux, it should be sufficient to run the top command to get a good idea of CPU usage (vmstat is also useful if you want to monitor IO). The stat that is the most interesting is the load average. If this number is greater than your number of CPUs, work is starting to backlog. Or, to put it another way:

“If there are four CPUs on a machine and the reported one-minute load average is 4.00, the machine has been utilizing its processors perfectly for the last 60 seconds”
Linux Journal Dec 1 2006
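The same rule of thumb can be checked from inside the JVM via OperatingSystemMXBean rather than top. A sketch (the backlogging helper is my own naming):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Sketch: the "load average vs number of CPUs" rule of thumb,
// read from inside the JVM rather than from top.
class LoadCheck {
    // Load above the CPU count means work is starting to backlog.
    static boolean backlogging(double oneMinuteLoad, int cpus) {
        return oneMinuteLoad > cpus;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        int cpus = os.getAvailableProcessors();
        double load = os.getSystemLoadAverage(); // -1.0 where unsupported (e.g. some Windows JVMs)
        System.out.println("CPUs: " + cpus + ", 1-min load: " + load);
        System.out.println(backlogging(load, cpus) ? "work is backlogging" : "keeping up");
    }
}
```
Note the Windows caveat: getSystemLoadAverage() returns -1.0 on platforms without a load average, which is yet another reason not to performance-test on a Windows desktop (see below).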


You can use your performance tests to drive your system while you profile the JVM. There are (as ever) some caveats:
  1. There's a testing version of the Heisenberg Uncertainty Principle: the act of observation affects the system. Profiling adds an overhead so do this as well as your non-profiling performance tests.
  2. Just because your profiler shows that a particular method takes a long time, it doesn't mean it did a lot of work. Message-driven systems will typically spend a lot of time waiting for something to happen. This shows up in the profiler as a method that is "hogging" CPU time. This is a false interpretation. Another common time "hog" is blocking IO code - typically this is an order of magnitude greater than your code in a server. So don't be surprised if you see a lot of time spent in$, byte[], int, int, int) and[], int, int).

Tomcat Tuning

The hunt for performance improvements in our app continues - this time in the web tier.

I tried adding several JVM args when I ran the JMeter performance tests:
  • One that uses performance improvements in the JVM that are still experimental in a given release [1].
  • One that inlines getXXX() methods for primitives [2].
  • One that controls when HotSpot compiles bytecode into native code. High numbers mean faster start-up time; low numbers mean the code is optimized sooner. For a server, a low number may be desirable [3].
  • One that allows threads to have an affinity with a lock they frequently use, making that lock faster to acquire [4].

There are many others but from reading the literature, these looked likely candidates.

Unfortunately, I saw no improvements on my machine. This last point is important. Currently, we have a shortage of QA machines so I had to try the tests on my Windows machine. I protested that this was not an accurate way of doing things. For instance, adding the -server JVM arg massively improved performance. I was quite excited until I realized that for multi-core Linux machines, such as our production boxes, this is set by default. On 32-bit Windows machines, it is not [5].

So, I gave up on JVM tuning and looked at Tomcat config. The obvious first choice was to use NIO (non blocking IO). This is easy. You just make your connector in server.xml look like this:

<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           redirectPort="8443" />

But although this massively reduced the number of threads from roughly one thread per connection to about half a dozen, it didn't seem to speed things up.

Time to fire up the profiler of choice, YourKit. Under normal load, the threads look like this:

[YourKit screenshot: thread states under normal load]

Fair enough. A spot of contention (red) but mostly waiting (yellow) for something to happen (eg, an incoming request). But some waiting was for a message from an internal library that was developed in-house by another team (profilers can't distinguish between legitimate waiting and unnecessary waiting).

So, I looked at the thread this in-house library gave us and it looks like this:

[YourKit screenshot: the library's thread, mostly green]

Green indicates the thread is doing work - light green when it executes in native code, dark green for all other code. The light green in this picture is just listening on a socket in native code; the dark green is just one thread that happens to be doing a lot of serialization (see the bottom half of the screen).

So, we have a bottleneck that's nothing to do with Tomcat. Tomorrow, I am going to rip out this in-house library, as it's caused us nothing but pain, and go back to Java RMI. Then, let's see how much we can tune Tomcat.


Saturday, July 31, 2010

There is nothing new under the sun

One of the team (AJ) was sent on a Test-Driven Development course this week and came back and gave a show-and-tell at the weekly retrospective. Some interesting points came out of the subsequent discussion.

AJ said that he had been taught that the definition of "legacy code" was code that had no tests (I assume, after a quick Google, this came from Michael Feathers' book Working Effectively with Legacy Code).

But, I pointed out, if we accept this definition then all test code was therefore legacy code. Who has tests for tests - Quis custodiet ipsos custodes?

So, what was the best way to avoid the pain often associated with "legacy" code? After all, some of our tests are now over 3 years old. So, we decided:
  • Make methods so short that unit testing becomes almost trivial.
  • If a method is truly trivial, don't bother testing it. (We don't like unit testing everything since that sounds like we're just ticking check boxes for the Time and Motion man.)
  • Do as little mocking as possible, since writing expectations can lead to hard-to-maintain code. Overriding methods in the (production) class that call other classes or make network calls with methods in a (test) subclass is preferred.
  • Continue using EasyMock on the projects that already use it, but use Mockito for new projects since test code is much more succinct with Mockito. One team member was alarmed that we would then have two mocking frameworks but was won over by the rest of the team, who had used it and said it was trivial to learn.
  • If we're just quickly trying to get a test to pass rather than thinking about why it's failing, the test serves no purpose.
AJ also mentioned Acceptance Testing. This caused some confusion as to what exactly this was. For those of us who were brought up on Pressman, the definition is:

Conducted by the end user rather than the system developer, an acceptance test can range from an informal "test drive" to a planned and systematically executed series of tests.
(Software Engineering: A Practitioner's Approach, European Adaptation, Fourth Edition, p522 - Roger S Pressman)

More than once, I've heard people use the term "acceptance tests" and assume that their audience knew they meant automated, regression acceptance tests as encouraged by frameworks such as FitNesse. Pressman's definition is more general. So, I started Googling for formal definitions of the terms the team was discussing.

Test Driven Development

The notable characteristics of TDD are:
  • automated testing [1]
  • regression testing
  • black- and white-box testing [2]
  • unit testing [1]
  • test-first programming [1] ("write a little test that doesn't work, and perhaps doesn't even compile to begin with").
[1] Test Driven Development: By Example - Kent Beck

The ubiquitous JUnit is an example of a TDD framework.

Team observations: test-first programming is nice but not to be dogmatically enforced; just because you're using JUnit doesn't mean you're unit testing.

Behaviour Driven Development

Behaviour-Driven Development, to give it its British spelling, was a term coined by fellow Brit Dan North. It is a methodology that brings together:
  • automated testing [3]
  • regression testing [3]
  • black-box testing
  • acceptance testing [3]

It distinguishes itself from other methodologies in that "acceptance criteria should be executable". To this end, it uses a domain specific language that looks like natural language. This makes the tests easy to read by the business.

JBehave is an example of a framework that facilitates this.

Team observation: these tests take a long time to run - which can hamper productivity; testers and business people are supposed to write the tests but that very, very rarely happens and instead the developer does this.

This is a very incomplete list of terms. Ideally, I want to launch Gimp and put together a matrix of which methodologies include which elements. Although they often come with some very nice frameworks, new methodologies often claim to be paradigm shifts (Dan North calls BDD "a second generation ... methodology" [3]). Really, they are emphasising ideas that have long been established.

Wednesday, July 28, 2010

ActiveMQ Patch Accepted

Yay! The Apache ActiveMQ team have accepted my patch!

This fixes a bug that prevented the consumption of relevant messages from a Topic for which we have a durable subscription if it were full of messages we were not interested in.

The bug manifested itself when your consuming thread simply hung despite perfectly good messages being added to the JMS Topic.

Fortunately, ActiveMQ comes with MBeans that are easily activated. This allowed us to watch what the little blighter was doing with the much under-utilized JConsole. Consumption stopped when we saw the number of irrelevant messages in the Topic hit 100 - the default size of the buffer. This was the clue that prompted me to go off and look in the code at where my consuming thread was blocking (JConsole also shows where all the JVM's threads are).

Incidentally, I'm surprised how many really good Java developers don't use JConsole - despite the fact you get it for free with your JDK. For monitoring an application in production, you can't beat it. Other commonly used libraries often come with MBeans ready to deploy. For instance, did you know that you can change the Log4J logging threshold at runtime? If you want to make your own MBeans and you're using Spring, it couldn't be easier. You just need to have an org.springframework.jmx.export.MBeanExporter in your context and annotate your class with @ManagedResource.
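For those not using Spring, here is roughly what MBeanExporter does for you, hand-rolled with plain JDK JMX. The ThresholdConfig bean and its Threshold attribute are invented for illustration:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch: registering an MBean with plain JDK JMX - the manual version
// of what Spring's MBeanExporter + @ManagedResource automate.
public class JmxDemo {

    // Standard MBean convention: management interface named <class>MBean.
    public interface ThresholdConfigMBean {
        int getThreshold();
        void setThreshold(int t);
    }

    public static class ThresholdConfig implements ThresholdConfigMBean {
        private volatile int threshold = 10;
        public int getThreshold() { return threshold; }
        public void setThreshold(int t) { threshold = t; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("com.example:type=ThresholdConfig");
        server.registerMBean(new ThresholdConfig(), name);
        // The bean now appears in JConsole's MBeans tab, where Threshold
        // can be read and changed at runtime.
        System.out.println(server.getAttribute(name, "Threshold"));
    }
}
```
Exactly the same trick underlies changing a Log4J threshold at runtime: the logger exposes a writable MBean attribute, and JConsole writes to it.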

Saturday, July 10, 2010

Brief comparison of locking between Oracle, Postgres and Sybase

I'm always surprised at how little most server-side developers know about databases, even though all but the most trivial applications use one. And I am not talking about understanding SQL (worthy cause as it is) but understanding what is happening in the DB regarding locks. You generally don't have any problems at this level during testing, but under a non-trivial load they become a major concern.

So, partly to remind myself, here are some notes on the databases of which I have experience.

Sybase uses (exclusive) write and (shared) read locks extensively. If you update any data, you have an exclusive lock on that data until the end of the transaction. No other thread can even read it. "That data" can be anything. It doesn't need to be a row. It could be a table or an index. So, perhaps surprisingly, these two threads block each other:

TX 1                                TX 2
----                                ----
begin tran
                                    begin tran
insert into X values ('y')
                                    insert into X values ('y')
                                    [Blocked by TX1]
commit tran
                                    [Released by TX1]
                                    commit tran

They are dealing with different rows but TX1 holds an exclusive lock on the table.

Oracle and Postgres deal with things differently. They use Multiversion Concurrency Control. At the start of a transaction, you get something similar to a snapshot of the database that you can play with until your "merge" your changes into the "trunk" (note: this is only a loose analogy and there are areas where it breaks down - see later).

So, these two (Oracle) processes don't block each other:

TX 1                                TX 2
----                                ----
set transaction isolation level read committed;
                                    set transaction isolation level read committed;
insert into X values ('y');
                                    insert into X values ('y');

They only really block each other when they really are contending for the same row:

TX 1                                TX 2
----                                ----
set transaction isolation level read committed;
                                    set transaction isolation level read committed;
update X set Y = 'y' where Z = z;
                                    update X set Y = 'y' where Z = z;
                                    [Blocked by TX1]
commit;
                                    [Released by TX1]

But (unlike Sybase) TX2 is not blocked in Oracle or Postgres in this scenario:

TX 1                                TX 2
----                                ----
set transaction isolation level read committed;
                                    set transaction isolation level read committed;
update X set Y = 'y2' where Z = z;
                                    select Y from X where Z = z;
                                    [sees the old value for Y]

Sybase would block TX2 upon the select command until TX1 committed or rolled back. This leads to an interesting unintuitive corner case in Sybase that I have seen. Since a shared read lock is still a lock, you can have deadlocks between two transactions even if one of those transactions is read-only. For instance, if I issue a select * where ... on a table, I start locking rows in the table. If another transaction updates a row that my select process has not yet reached then tries to update a row my select process already has a lock on - deadlock. Sybase chooses one transaction to be the loser. By default, this is the transaction that has done the least work.

So far, I've discussed the READ COMMITTED isolation level - the default on all three databases. The fun starts when you use SERIALIZABLE.

In Sybase:


TX 1                                TX 2
----                                ----
set transaction isolation level 3 -- SERIALIZABLE
                                    set transaction isolation level 1 -- READ_COMMITTED
begin tran
select count(z) from X
                                    begin tran
                                    insert into X values ('y')
                                    [Blocked by TX1]
commit tran
                                    [Released by TX1]

Note: if TX1 had been at isolation level 1 (READ_COMMITTED) there would have been no blocking in TX2.

Unsurprisingly, Oracle and Postgres don't block:

TX 1                                TX 2
----                                ----
set transaction isolation level serializable;
                                    set transaction isolation level serializable;
insert into X values ('y');
                                    select count(z) from X
                                    [sees value 'w', say]
insert into X values ('y');
                                    select count(z) from X
                                    [sees value 'w' too]
commit;
                                    select count(z) from X
                                    [still sees value 'w'!]

So, although Sybase blocks (which can be somewhat inefficient with anything but the shortest transactions), it is consistent. In Oracle and Postgres, you will continue to see the same value for as long as the transaction continues even though the data is stale and a perfectly good change has been successfully committed.

Oracle has to keep hold of this snapshot to maintain the SERIALIZABLE contract for potentially forever (or at least until you get "ORA-01555 snapshot too old"...).

And whereas Sybase guarantees that your view of the data is always SERIALIZABLE, Oracle and Postgres do not.

TX 1                                TX 2
----                                ----
set transaction isolation level serializable;
                                    set transaction isolation level serializable;
delete from X where Z = z;
                                    update X set Y = 'y2' where Z = z;

Oracle gives you:

Error: ORA-08177: can't serialize access for this transaction

and Postgres gives you:

ERROR: could not serialize access due to concurrent update
SQL state: 40001

Interestingly, the SERIALIZABLE contract is defined as being met if the final database state is the same as if all the transactions had been run one after the other with no overlap. Note that the implementation is not obliged actually to do this - only that the final state must be equivalent to this behaviour. The Sybase programmers chose to guarantee it every time. The Oracle and Postgres developers chose to make a best effort and throw an exception if the contract could not be honoured.

There will always be subtle differences between implementations. So, Java developers, get to know a little about your database of choice. (Oh, and keeping transactions as short as possible is a good idea in all three databases.)

Aside: it's a pity that the Exceptions in the JDBC spec are so non-specific. I get a SQLException if the database is down (from which I can't really recover). And I get a SQLException if my transaction was deadlocked (from which I can recover, namely by trying again.) A missed opportunity.
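Until the spec improves, you end up writing the distinction yourself. A sketch of a retry wrapper keyed on SQL state 40001 (the serialization-failure state Postgres reports above); the class and method names are my own:

```java
import java.sql.SQLException;
import java.util.concurrent.Callable;

// Sketch: recovering the lost specificity by hand. SQL state 40001
// (serialization failure, as in the Postgres error above) marks a
// retryable failure; anything else is rethrown immediately.
class DeadlockRetry {
    static <T> T withRetry(Callable<T> work, int maxAttempts) throws Exception {
        SQLException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return;
            } catch (SQLException e) {
                if (!"40001".equals(e.getSQLState())) throw e;  // not retryable
                last = e;   // deadlock/serialization failure: try again
            }
        }
        throw last;  // gave up after maxAttempts
    }
}
```
Vendors vary in which states they report for deadlocks (Sybase and Oracle have their own codes), so a production version would take a configurable set of retryable states rather than hard-coding one.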

Sunday, July 4, 2010

Concurrent modification of data structures

Sharing data is something that seems easy but has many subtle problems. When I architected a credit rating system in the past, I chose to make all the domain objects immutable. Although most developers were fine with this approach, it made a number of others antsy, as manipulating the data became a little more unwieldy. By definition, they couldn't change the data (the remedy: make a similar object with the new field values) and they felt the constructors took too many arguments, since everything had to be set at object-creation time (the remedy: decompose the model).

Anyway, I'm glad we went with that architecture all those years ago since now I am on a completely different project, I have to maintain code that did not adopt this pattern. As a result, things are messy.

As an aside, hire good developers. Good architects and good managers are for naught if the guys at the code face are not naturals.

First off, in the current mutable architecture, threads were taking the objects from the cache and manipulating them for their own purpose. This changed the underlying data. Oops. So, this was remedied quite early by making a copy using simple Java serialization.

But frequently copying large objects using serialization is surprisingly expensive. With 70 threads reading a moderately sized object (~100s of KB) and 20 updating it, each every 45s, my four-core Linux box is maxing out.
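The copy mechanism itself is only a few lines, which is part of its appeal - a sketch:

```java

// Sketch: the serialization-based deep copy described above. Simple,
// but every call pays the full cost of writing out and re-reading
// the entire object graph.
class DeepCopy {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T original) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);   // serialize the whole graph...
        }
        ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()));
        return (T) in.readObject();      // ...and rebuild it from scratch
    }
}
```
Hence the CPU bill: copying hundreds of KB this way, 90 times every 45 seconds, is a lot of serialization work for a four-core box.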

What's more, I'm seeing plenty of ConcurrentModificationExceptions in the logs as the threads performing updates to the data interfere with the threads performing the serialization. This forces a retry, which consumes yet more CPU cycles.

And to top it all, occasionally one read thread gets stuck in an infinite loop and consumes all the CPU because another thread changed a java.util.HashSet (read more about this pathological phenomenon here).

Sure, you could say well the JavaDoc clearly says don't change non-thread-safe collections while another thread is reading it. But most developers don't reckon on somebody serializing an object of their class while somebody else updates a collection within it.

And sure, you could say use ConcurrentHashMap and its ilk ("They do not throw ConcurrentModificationException" and "all operations are thread-safe") but we never chose the implementation of our maps and sets - Hibernate did as it rehydrated the object.

These are innocent mistakes (albeit with potentially devastating impacts). Less forgivable perhaps are developers updating java.util.Sets with code like:

aSet.remove(anElement);
aSet.add(anElement);
The equality of our anElement object is based on its Primary Key, so that's sensible. But of course, in a shared data world, this introduces a horrible race condition. For a moment, aSet does not contain anElement. Obvious when you think about it but although this code was well unit tested, the developer never thought about concurrency. It only manifests itself as a bug when the system is under a heavy load.

My patch involved using a ReentrantReadWriteLock, which allowed me to have shared ReadLocks (readers don't block each other) and exclusive WriteLocks. Firing up JMeter and running my stress tests showed a moderate degradation in performance (5-10%) but, under heavy load, no errors at all (compared to a 5% error rate without this locking, due to race conditions).
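The shape of that patch, boiled down to a toy class (names invented):

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the patch's shape: readers share a lock, writers exclude
// everyone, and remove-then-add happens atomically so no reader ever
// observes the element missing.
class GuardedSet<E> {
    private final Set<E> set = new HashSet<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    boolean contains(E e) {
        lock.readLock().lock();
        try {
            return set.contains(e);
        } finally {
            lock.readLock().unlock();
        }
    }

    void replace(E e) {
        lock.writeLock().lock();
        try {
            set.remove(e);   // remove the stale version (equality is by primary key)
            set.add(e);      // no reader can run between these two lines
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```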

Ideally, I'd like to re-write this crufty code entirely and wrap changes to the underlying data structure in Command objects that can only be executed in one class. This would give me a centralized place to implement and manage the locking code. Unfortunately, access to the underlying data is currently scattered across this component.
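The Command idea, sketched as a toy (again, the names are invented):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the rewrite idea: every mutation is a Command, and the
// single executor is the only place that touches the data - and so
// the only place that needs to know about locking.
class DataStore {
    interface Command {
        void apply(Set<String> data);
    }

    private final Set<String> data = new HashSet<>();

    // One central point of synchronization for all mutations.
    synchronized void execute(Command c) {
        c.apply(data);
    }

    synchronized boolean contains(String s) {
        return data.contains(s);
    }
}
```
Callers would submit `store.execute(d -> d.add("x"))` rather than reaching into the data themselves, so swapping the locking strategy later means changing one class.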

Alas, it's unlikely that I'll get to do this as for the client my current solution is "good enough".

As for improving the performance, I managed to squeeze out a 30% saving just by copying only what the calling code needed, rather than copying a larger object graph and then returning a subset of that data. But that's less interesting.

Sunday, May 16, 2010

Hibernate and Performance Tuning

So, continuing with squeezing more performance out of the back-end of our Java/Sybase app, I attached YourKit to the JVM and ran the soak tests for an hour or so.

One of our queries is a straight insert/update to two tables. Other than the tables also having triggers for further auditing purposes, there's nothing special here.

However, on inspection of where the time is being spent, I see half the time for this query is actually spent in handling the connection and the connection pool (we're using DBCP).

DBCP with Sybase JConnect 6 JDBC driver

Roughly 54% of the time of this innocuous call was spent doing something which was unrelated to my persistence request.

OK, so let's compare with Tomcat's connection pool. This should be faster: instead of our persistence thread wasting time checking the status of the connection etc, another thread handles that for us (see ConnectionPool$ And indeed so it proves to be:
Tomcat Connection Pool with Sybase JConnect 5.5 JDBC driver

Now, only 44% of time is spent getting and returning the connection back to the pool (and this is using an older JDBC driver - see below).

But still, there is more. Notice that a lot of time is spent in Hibernate's JDBCTransaction. This time is spent checking the connection's auto-commit status and ensuring that it is set to true when the connection is returned to the pool.

This seems like an unnecessary overhead since with Sybase's JConnect 5.5, both of these actions are network calls (with JConnect 6, only setting the auto-commit flag is a network call. Reading it is not since the driver caches this information).

And although the JDBC spec says "The default is for auto-commit mode to be enabled when the Connection object is created" (section 10.1.1), I can't find any reference that says this must also be true when storing the connections in a pool.

I see where the Hibernate team are going. They say "the intent is that the connection pool you use with Hibernate has autocommit *disabled*". But I can configure Tomcat's connection pool to always make this so. In which case, this extra checking and setting becomes an unnecessary overhead. It would be nice if JDBCTransaction could be configured such that when told explicitly that the pool it is using has auto-commit set to false, it does not make these calls...

Anyway, using Tomcat connection pool with all connections having auto-commit of false and using JConnect 6, the time to actually service my request went up to 85% of total time with only some 15% handling the pool, committing the transaction etc. Thus the median time to service this simple request dropped to about 70ms from about 200ms. Nice.