Friday, December 7, 2012

Java Serialization - Good, Fast, and Faster

Probably anyone who has ever worked with object serialization, in Java or any other language, knows that it should be avoided whenever possible. Just as the first rule of distribution is "Do not distribute!", the first rule of serialization should be "Do not serialize!". However, in many cases, especially in distributed environments, serialization cannot be avoided and must therefore be heavily optimized to achieve reasonable throughput.

At GridGain, given the distributed nature of our product, we have always worked on optimizing our serialization routines, and starting with version 4.3.0 we have achieved our fastest results so far. In our tests, GridOptimizedMarshaller was up to 20x faster than standard Java serialization with java.io.Serializable. If you switch to java.io.Externalizable, the GridGain marshaller is still up to 10x faster. We also compared our marshaller to Kryo serialization, and it turns out that ours is up to 5x faster than Kryo. On top of that, the footprint of GridGain-serialized objects is significantly smaller than that of objects written with standard Java serialization.

The coolest thing here is that we do not require any custom interfaces or APIs - GridGain optimized serialization works directly with standard Java POJOs, regardless of whether they implement the java.io.Serializable interface. If your POJOs implement java.io.Externalizable, then our marshalling works even faster.

How do we do it? The main culprit in Java serialization is java.io.ObjectOutputStream, which is extremely expensive to initialize and performs poorly. The first thing we did was replace it with our own implementation, based on direct memory copying by invoking native C routines and Java's so-called "unsafe" routines. We also serialize fields in a predefined order, determined by extensive object introspection, which allows us to pass only values and not their type names or other metadata.
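To illustrate the second idea only (writing values in a predefined field order with no type names or metadata), here is a minimal sketch. It is not GridGain's implementation: it uses plain reflection and DataOutputStream instead of "unsafe" memory copying, handles only int, long, and double fields, and the names FixedOrderSketch and Point are hypothetical, chosen for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class FixedOrderSketch {
    /** Hypothetical sample class; this sketch supports int, long and double fields only. */
    public static class Point implements Serializable {
        public int id;
        public long timestamp;
        public double value;
    }

    /** Instance fields sorted by name, so that writer and reader agree on the order. */
    static List<Field> orderedFields(Class<?> cls) {
        List<Field> fields = new ArrayList<>();
        for (Field f : cls.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic())
                continue;
            f.setAccessible(true);
            fields.add(f);
        }
        fields.sort(Comparator.comparing(Field::getName));
        return fields;
    }

    /** Writes field values only: no class name, no field names, no type tags. */
    public static byte[] write(Object obj) throws IOException, IllegalAccessException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Field f : orderedFields(obj.getClass())) {
                Class<?> t = f.getType();
                if (t == int.class) out.writeInt(f.getInt(obj));
                else if (t == long.class) out.writeLong(f.getLong(obj));
                else if (t == double.class) out.writeDouble(f.getDouble(obj));
                else throw new IOException("Unsupported field type: " + t);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads the values back in the same order; the caller must already know the class. */
    public static <T> T read(byte[] data, Class<T> cls)
        throws IOException, ReflectiveOperationException {
        T obj = cls.getDeclaredConstructor().newInstance();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (Field f : orderedFields(cls)) {
                Class<?> t = f.getType();
                if (t == int.class) f.setInt(obj, in.readInt());
                else if (t == long.class) f.setLong(obj, in.readLong());
                else if (t == double.class) f.setDouble(obj, in.readDouble());
                else throw new IOException("Unsupported field type: " + t);
            }
        }
        return obj;
    }

    /** Size of the same object written with standard Java serialization, for comparison. */
    public static int javaSize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws Exception {
        Point p = new Point();
        p.id = 7;
        p.timestamp = 123456789L;
        p.value = 2.5;

        byte[] data = write(p);
        Point copy = read(data, Point.class);

        // 4 (int) + 8 (long) + 8 (double) = 20 bytes of pure values.
        System.out.println("Values only: " + data.length + " bytes");
        System.out.println("Java serialization: " + javaSize(p) + " bytes");
        System.out.println("Round trip OK: " + (copy.id == p.id
            && copy.timestamp == p.timestamp && copy.value == p.value));
    }
}
```

The trade-off in this sketch is that the reader has to know the class up front, because nothing in the byte stream identifies it.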

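Here are the results from the test on my MacBook Pro with a 2.7 GHz Intel i7. Each number is the average over 3 runs of the time taken for 200,000 serialize/deserialize round trips, so lower is better:

>>> Java serialization via Externalizable (average): 22,551 ms
>>> Kryo serialization (average): 17,300 ms
>>> GridGain serialization (average): 2,937 ms

In this run, GridGain was roughly 7.7x faster than Java Externalizable and roughly 5.9x faster than Kryo.

Here is the test itself. It is included with our product, so feel free to download it and try it for yourself.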
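import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.util.Arrays;

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.KryoSerializable;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;

// GridMarshaller, GridOptimizedMarshaller and the U utility class ship with GridGain;
// import them from the GridGain version you are running.

public class SerializationBenchmark {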
    /** Number of runs. */
    private static final int RUN_CNT = 3;

    /** Number of iterations. */
    private static final int ITER_CNT = 200000;

    public static void main(String[] args) throws Exception {
        // Create sample object.
        SampleObject obj = createObject();

        // Run Java serialization test.
        javaSerialization(obj);

        // Run Kryo serialization test.
        kryoSerialization(obj);

        // Run GridGain serialization test.
        gridGainSerialization(obj);
    }

    private static long javaSerialization(SampleObject obj) throws Exception {
        long avgDur = 0;

        for (int i = 0; i < RUN_CNT; i++) {
            SampleObject newObj = null;

            long start = System.currentTimeMillis();

            for (int j = 0; j < ITER_CNT; j++) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();

                ObjectOutputStream objOut = null;

                try {
                    objOut = new ObjectOutputStream(out);

                    objOut.writeObject(obj);
                }
                finally {
                    U.close(objOut, null);
                }

                ObjectInputStream objIn = null;

                try {
                    objIn = new ObjectInputStream(
                        new ByteArrayInputStream(out.toByteArray()));

                    newObj = (SampleObject)objIn.readObject();
                }
                finally {
                    U.close(objIn, null);
                }
            }

            long dur = System.currentTimeMillis() - start;

            avgDur += dur;
        }

        avgDur /= RUN_CNT;

        System.out.format("\n>>> Java serialization via Externalizable (average): %,d ms\n\n", avgDur);

        return avgDur;
    }

    private static long kryoSerialization(SampleObject obj) throws Exception {
        Kryo marsh = new Kryo();

        long avgDur = 0;

        for (int i = 0; i < RUN_CNT; i++) {
            SampleObject newObj = null;

            long start = System.currentTimeMillis();

            for (int j = 0; j < ITER_CNT; j++) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();

                Output kryoOut = null;

                try {
                    kryoOut = new Output(out);

                    marsh.writeObject(kryoOut, obj);
                }
                finally {
                    U.close(kryoOut, null);
                }

                Input kryoIn = null;

                try {
                    kryoIn = new Input(new ByteArrayInputStream(out.toByteArray()));

                    newObj = marsh.readObject(kryoIn, SampleObject.class);
                }
                finally {
                    U.close(kryoIn, null);
                }
            }

            long dur = System.currentTimeMillis() - start;

            avgDur += dur;
        }

        avgDur /= RUN_CNT;

        System.out.format("\n>>> Kryo serialization (average): %,d ms\n\n", avgDur);

        return avgDur;
    }

    private static long gridGainSerialization(SampleObject obj) throws Exception {
        GridMarshaller marsh = new GridOptimizedMarshaller(false, 
            Arrays.asList(SampleObject.class.getName()), null);

        long avgDur = 0;

        for (int i = 0; i < RUN_CNT; i++) {
            SampleObject newObj = null;

            long start = System.currentTimeMillis();

            for (int j = 0; j < ITER_CNT; j++)
                newObj = marsh.unmarshal(marsh.marshal(obj), null);

            long dur = System.currentTimeMillis() - start;

            avgDur += dur;
        }

        avgDur /= RUN_CNT;

        System.out.format("\n>>> GridGain serialization (average): %,d ms\n\n", avgDur);

        return avgDur;
    }

    private static SampleObject createObject() {
        long[] longArr = new long[3000];

        for (int i = 0; i < longArr.length; i++)
            longArr[i] = i;

        double[] dblArr = new double[3000];

        for (int i = 0; i < dblArr.length; i++)
            dblArr[i] = 0.1 * i;

        return new SampleObject(123, 123.456f, (short)321, longArr, dblArr);
    }

    private static class SampleObject 
        implements Externalizable, KryoSerializable {
        private int intVal;
        private float floatVal;
        private Short shortVal;
        private long[] longArr;
        private double[] dblArr;
        private SampleObject selfRef;

        public SampleObject() {}

        SampleObject(int intVal, float floatVal, Short shortVal, 
            long[] longArr, double[] dblArr) {
            this.intVal = intVal;
            this.floatVal = floatVal;
            this.shortVal = shortVal;
            this.longArr = longArr;
            this.dblArr = dblArr;

            selfRef = this;
        }

        // Required by Java Externalizable.
        @Override public void writeExternal(ObjectOutput out) 
            throws IOException {
            out.writeInt(intVal);
            out.writeFloat(floatVal);
            out.writeShort(shortVal);
            out.writeObject(longArr);
            out.writeObject(dblArr);
            out.writeObject(selfRef);
        }

        // Required by Java Externalizable.
        @Override public void readExternal(ObjectInput in) 
            throws IOException, ClassNotFoundException {
            intVal = in.readInt();
            floatVal = in.readFloat();
            shortVal = in.readShort();
            longArr = (long[])in.readObject();
            dblArr = (double[])in.readObject();
            selfRef = (SampleObject)in.readObject();
        }

        // Required by Kryo serialization.
        @Override public void write(Kryo kryo, Output out) {
            kryo.writeObject(out, intVal);
            kryo.writeObject(out, floatVal);
            kryo.writeObject(out, shortVal);
            kryo.writeObject(out, longArr);
            kryo.writeObject(out, dblArr);
            kryo.writeObject(out, selfRef);
        }

        // Required by Kryo serialization.
        @Override public void read(Kryo kryo, Input in) {
            intVal = kryo.readObject(in, Integer.class);
            floatVal = kryo.readObject(in, Float.class);
            shortVal = kryo.readObject(in, Short.class);
            longArr = kryo.readObject(in, long[].class);
            dblArr = kryo.readObject(in, double[].class);
            selfRef = kryo.readObject(in, SampleObject.class);
        }
    }
}

25 comments:

Jakob Skov said...

Nice post! Is it possible to use this serializer outside GridGain?

Anonymous said...

Nice.
I guess if you tried to serialize 200k different objects instead of serializing the same instance 200k times, the results would be slightly different.

Dmitriy Setrakyan said...

If you are implying that some caching happens between serialization attempts, then no - different objects would not make any difference.

Dmitriy Setrakyan said...

Jakob, the marshaller will work outside of GridGain. We are thinking internally about open sourcing it.

Stay tuned.

Anonymous said...

Can you try to test with objects that have a lot of fields of basic types or custom types, but not so many arrays? IMHO, right now you benefit too much from array serialization.

Anonymous said...

Kryo has a pending patch which provides the same kind of optimization. Moreover, it seems to be faster than your implementation. It shows sub-millisecond performance per iteration.

For more information, please have a look at:
http://code.google.com/p/kryo/issues/detail?id=75

Dmitriy Setrakyan said...

I don't see how Kryo is remotely even close to GridGain marshaling on performance. Don't you see the numbers in the blog?

You can try it yourself and see. Just download GridGain and run the benchmark; it's included with the product.

Roman said...

Why do you allocate new buffers/streams in each iteration for Java and Kryo, but not for GridGain?

Why do you have exception handling code only for Java and Kryo, but not for GridGain?

Couldn't those factors influence the performance results as well?

Dmitriy Setrakyan said...

Buffer allocation is identical for all tests - the GridGain marshaller just has a different API and does it internally.

I highly doubt exception handling would make any difference simply because no exceptions are thrown in the test.

Leo said...

Sorry if this comment gets posted multiple times. I tried a few times, and after submission it seems to appear in the list of comments and then disappears somehow.

"I don't see how Kryo is remotely even close to GridGain marshaling on performance."

I meant a version of Kryo with the mentioned patch. Vanilla Kryo really is rather slow on your use case.

"Don't you see the numbers in the blog?"

Yes, I've seen them. And this was the initial reason for looking deeper into it.

Leo said...

"You can try yourself and see. Just download GridGain and run the benchmark, it's included into the product."

I did exactly that before I posted my initial comment. I took your test code, disabled the GridGain part (as I was too lazy to download the product for initial testing), and tested first with Java and Kryo to see how it performs. The Kryo numbers were roughly like the ones you reported. It was a bit faster than your results (about 12 or 13 msec per iteration), but still much slower than the GridGain serialization results that you reported.

Leo said...

Then I applied the mentioned Kryo patch and started using the UnsafeInput and UnsafeOutput classes instead of Input and Output. I got sub-millisecond performance for your tests with this patched version of Kryo.

You don't need to believe me. Just take Kryo, apply this patch and check the results.
(disclosure: I'm the author of this Kryo patch)

If you're interested in more details or comparisons, please feel free to ask questions here or even better on the corresponding Kryo issue discussion that was mentioned in my original post.

Dmitriy Setrakyan said...

Leo, you should test everything on the same box, otherwise the comparison is not valid. So far I see that you have tested Java and Kryo, but have not tested GridGain - hardly a valid comparison.

In any case, if Kryo has gotten closer to GridGain marshalling results with the latest patch, good for them.

Leo said...

I followed your advice and tested GridGain and Kryo on the same box (Vista, Intel i5, M520, Dual-Core, HT, 2.4 GHz).

Explanations:

"Kryo unmodified serialization" - your version of the code for Kryo.

"Kryo serialization without try-catch, reusing byte buffer" - your version with try-catch removed and (re)using byte arrays instead of ByteArrayOutputStream.

"Kryo Unsafe-based serialization without try-catch, reusing byte buffer" - your version with try-catch removed and (re)using byte arrays instead of ByteArrayOutputStream, with the Unsafe-based Input and Output classes from the patched version of Kryo.

>>> Kryo unmodified serialization (average): 169,187 ms
>>> Kryo serialization without try-catch, reusing byte buffer (average): 119,886 ms
>>> Kryo Unsafe-based serialization without try-catch, reusing byte buffer (average): 11,018 ms
>>> GridGain serialization (average): 16,455 ms

Dmitriy Setrakyan said...

Leo, we actually went ahead and tried what you suggested.

1. Removing buffer creation is not really fair, because GridGain does create a buffer internally every time.

2. Removing try/catch made *absolutely* no difference and it shouldn't... I am really not sure what is happening in your environment in this regard.

3. We could not apply the Kryo patch simply because I don't think it is accessible to the general public yet. Let us know when it is available and we will try it.

Anonymous said...

Try to test a more complex object with some collection types like List or Map - Kryo will be considerably faster...

Rüdiger Möller said...

WTF, if I have to implement Externalizable, it's not really serialization. It's handcrafted, and each time you change a class you have to adjust the externalize methods ..

It would be interesting to see how http://code.google.com/p/fast-serialization/ compares against this. It's at least completely generic and faster than Kryo without having to manually implement Externalizable.

Dmitriy Setrakyan said...

You don't have to implement Externalizable with GridGain. Just for comparison with Kryo we implemented Externalizable because Kryo does something similar.

If you don't implement Externalizable, GridGain is still an order of magnitude faster than Java.

Rüdiger Möller said...

OK, I'd happily add GridGain to my benchmark .. where can I download it?

current results:
http://fast-serialization.googlecode.com/files/result-v1.1.html

Dmitriy Setrakyan said...

From the website: www.gridgain.com

Take a look at GridSerializationBenchmark in the benchmarks folder shipped with the product.

Rüdiger Möller said...

Hum .. IMO your benchmark is somewhat flawed, as you measure the creation of output and input buffers. You can reuse them to avoid reallocating the byte arrays each time. Additionally, you use toByteArray, which creates another copy. It is pretty obvious that the actual serialization code (at least for Kryo) takes < 20% of your measured benchmark time ..
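A minimal sketch of the buffer reuse described in this comment, shown with plain java.io streams rather than Kryo. The names BufferReuseSketch and ReusableBuffer are hypothetical; the sketch assumes a single thread and a buffer sized generously enough that it never has to grow.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Arrays;

public class BufferReuseSketch {
    /** ByteArrayOutputStream that exposes its internal array instead of copying it. */
    public static class ReusableBuffer extends ByteArrayOutputStream {
        public ReusableBuffer(int size) {
            super(size);
        }

        /** Returns the backing array itself; only the first size() bytes are valid. */
        public byte[] internalArray() {
            return buf;
        }
    }

    /** Serializes and deserializes obj, reusing the caller's buffer for the bytes. */
    public static Object roundTrip(ReusableBuffer buffer, Object obj)
        throws IOException, ClassNotFoundException {
        // reset() rewinds the write position to 0 but keeps the allocated array.
        buffer.reset();

        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        }

        // Read straight from the backing array, avoiding the copy made by toByteArray().
        try (ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(buffer.internalArray(), 0, buffer.size()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        long[] payload = new long[3000];

        for (int i = 0; i < payload.length; i++)
            payload[i] = i;

        // 64 KB is more than enough for 3000 longs, so the array is never reallocated.
        ReusableBuffer buffer = new ReusableBuffer(64 * 1024);
        byte[] firstArray = buffer.internalArray();

        boolean allEqual = true;

        for (int i = 0; i < 1000; i++) {
            long[] copy = (long[])roundTrip(buffer, payload);

            allEqual &= Arrays.equals(payload, copy);
        }

        System.out.println("Round trips OK: " + allEqual);
        System.out.println("Same backing array: " + (firstArray == buffer.internalArray()));
    }
}
```

Note that this only removes the byte array allocation and the copy; the ObjectOutputStream and ObjectInputStream are still created on every iteration.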

Dmitriy Setrakyan said...

I think the code provided reflects the most common user pattern, as reusing byte arrays in a real application is an absolutely non-trivial task. The benchmark is shipped with the product. You can play with the code as you like and see if you get better results.

Rüdiger Möller said...

OK, I added GridGain to the benchmarks ..

http://fast-serialization.googlecode.com/files/result-v1.1.html

It may be a hard truth for you :-). Consider switching to fast-serialization; I'd happily list you as a reference project.

fast-serialization is currently used in a big company on their flagship system; however, I have a non-disclosure agreement with them.