Saturday, January 11, 2014

Slow serialization between Java 6 and 7

When we upgraded all our systems to use Java 7 in the performance test environment, everything worked fine. But if the client remained on Java 6 while the Jetty server ran Java 7, the test harness slowed by an order of magnitude.

What gives? Nobody else seemed to be complaining about it when I did a Google search.

Well, with a combination of jstack and YourKit, we saw that the problem was the bespoke classloader that the test harness used. Its loadClass method was synchronized, and a lot of threads were contending for it. However, it was exactly the same classloader in both the Java 6 and Java 7 configurations. Huh?

Indirectly, the problem was due to the classes that are new in Java 7. Putting some logging in our classloader showed that when running Java 6 it was constantly trying to load ReflectiveOperationException (and failing). This class is new in JDK 7 and was retrofitted as the superclass of common JDK classes such as ClassNotFoundException.

Our bespoke classloader allows hot deployment of new classes so if a class is not found, it tries to load it from somewhere on the network. But even if it doesn't find it, serialization continues as normal - just very slowly.

This is because Java deserialization tolerates a superclass in the stream that the receiver cannot load. It skips that class's data and makes a best-effort attempt to populate the object anyway.
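The shape of such a classloader can be sketched roughly as follows. This is not our actual code: HotDeployClassLoader, fetchFromNetwork and the lookup counter are hypothetical names, and the network lookup is a stub that always fails. It only illustrates why every miss is expensive: each one holds the lock on loadClass while the slow fallback runs, and nothing remembers that the class was already missing last time.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a hot-deploying classloader, not the real one.
class HotDeployClassLoader extends ClassLoader {

    // Counts how often we fell through to the (slow) remote lookup.
    private final AtomicInteger remoteLookups = new AtomicInteger();

    HotDeployClassLoader(ClassLoader parent) {
        super(parent);
    }

    // synchronized: every thread that needs a class queues up here.
    @Override
    public synchronized Class<?> loadClass(String name) throws ClassNotFoundException {
        try {
            // Normal parent-first delegation.
            return super.loadClass(name);
        } catch (ClassNotFoundException e) {
            System.err.println("Not found locally, trying the network: " + name);
            return fetchFromNetwork(name);
        }
    }

    // Stand-in for the remote lookup; here it always fails, as it did
    // for ReflectiveOperationException on a Java 6 client.
    protected Class<?> fetchFromNetwork(String name) throws ClassNotFoundException {
        remoteLookups.incrementAndGet();
        throw new ClassNotFoundException(name);
    }

    int remoteLookups() {
        return remoteLookups.get();
    }
}
```

Asking this loader for a class that exists nowhere costs one remote lookup per request, every time, while all other threads wait.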

The serialized bytes of the object still carry metadata, something I missed when I ran this:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.ObjectOutputStream;

    public class SerializationMain {
        public static final String OBJ_SER = "String.ser";

        public static void main(String[] args) throws IOException {
            ObjectOutputStream objectOutputStream = new ObjectOutputStream(new FileOutputStream(OBJ_SER));
            try {
                objectOutputStream.writeObject("this is a test");
            } finally {
                // Closing the object stream flushes it and closes the underlying file stream.
                objectOutputStream.close();
            }
        }
    }

This produced a file of a mere 21 bytes - most of it my text "this is a test". Replacing the line that writes the object to a stream with this:

        objectOutputStream.writeObject(new ClassNotFoundException("this is a test")); 

produces a monster 803 bytes. This is because java.lang.String is different: "There are special representations for null objects, new objects, classes, arrays, strings." [1] ClassNotFoundException, on the other hand, is not special and serializes to something like this:

----sr--java-lang-ClassNotFoundException-Z-f-------L--ext--Ljava-lang-Throwable-xr--java-lang-ReflectiveOperationException-----------xr--java-lang-Exception-----------xr--java-lang-Throwable--5-9w-----L--causeq----L--detailMessaget--Ljava-lang-String---
stackTracet---Ljava-lang-StackTraceElement-L--suppressedExceptionst--Ljava-util-List-xppt--this-is-a-testur---Ljava-lang-StackTraceElement--F-----9---xp----sr--java-lang-StackTraceElementa----6-----I-
lineNumberL--declaringClassq----L--fileNameq----L-
methodNameq----xp----t--com-henryp-SerializationMaint--SerializationMain-javat--mainsr--java-util-Collections-UnmodifiableList---1-------L--listq----xr--java-util-Collections-UnmodifiableCollection-B---------L--ct--Ljava-util-Collection-xpsr--java-util-ArrayListx-----a----I--sizexp----w-----xq----xp

Here you can see that it carries metadata naming the classes that make up its hierarchy.
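Rather than squinting at the raw bytes, the same thing can be checked in code. The following is a minimal sketch: StreamInspector and RecordingInputStream are names made up for illustration. It serializes an object into memory, reports the size, and records every class name the reader is asked to resolve by overriding ObjectInputStream.resolveClass. The exact size of the exception depends on the JDK version and the depth of the stack trace, so don't expect to see 803 everywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for poking at serialized objects.
class StreamInspector {

    /** Serializes obj into memory using a fresh ObjectOutputStream. */
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        try {
            out.writeObject(obj);
        } finally {
            out.close();
        }
        return bytes.toByteArray();
    }

    /** ObjectInputStream that records every class name it is asked to resolve. */
    static class RecordingInputStream extends ObjectInputStream {
        final List<String> requested = new ArrayList<String>();

        RecordingInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            requested.add(desc.getName());
            return super.resolveClass(desc);
        }
    }

    /** Deserializes data and returns the class names the reader had to resolve. */
    static List<String> classNamesIn(byte[] data) throws IOException, ClassNotFoundException {
        RecordingInputStream in = new RecordingInputStream(new ByteArrayInputStream(data));
        try {
            in.readObject();
        } finally {
            in.close();
        }
        return in.requested;
    }

    public static void main(String[] args) throws Exception {
        byte[] text = serialize("this is a test");
        byte[] exception = serialize(new ClassNotFoundException("this is a test"));
        System.out.println("String:    " + text.length + " bytes, classes " + classNamesIn(text));
        System.out.println("Exception: " + exception.length + " bytes, classes " + classNamesIn(exception));
    }
}
```

For the String, no class needs resolving at all. For the exception on Java 7 or later, the list includes java.lang.ReflectiveOperationException, which is exactly the name a Java 6 receiver goes looking for and cannot find.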

This makes you wonder how efficient Java serialization is, since all that metadata is written the first time a class appears in a stream, and again whenever a fresh stream is used. There are alternatives: the open source Kryo is one, and the proprietary POF format used in Oracle's Coherence is another (this blog [2] compares serialization libraries). With Coherence, you have to assign an arbitrary but unique ID to each class to be serialized and provide a class to perform the serialization, so your config may look something like this:

        <user-type>
            <type-id>2050</type-id>
            <class-name>com.xx.yy.zz</class-name>
            <serializer>
                <class-name>something that implements com.tangosol.io.pof.PofSerializer</class-name>
            </serializer>
        </user-type>

This can be laborious, but it allows very efficient serialization and deserialization.



