Sunday, November 7, 2010

ByteArrayInputStream is not terribly efficient...

A question came my way on Friday: if Java New I/O (NIO) is so efficient, why doesn't using it to read a big XML file into memory and wrapping the result in a ByteArrayInputStream make parsing faster?

The answer lies in what ByteArrayInputStream actually does. It's nothing more than a thin wrapper around a byte[] that allows calling code to think it is dealing with an InputStream. But each call to read(byte[], int, int) causes a fragment of the array to be copied into the caller's buffer (so that the caller cannot modify the underlying array).
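To make the copying concrete, here is a minimal sketch. The class and method names (ReadCopySketch, backingAfterCallerWrite) are my own for illustration and not from the original question. It reads a chunk, overwrites the first byte of that chunk, and then checks that the array behind the stream is unchanged.

```java
import java.io.ByteArrayInputStream;

// Hypothetical demo class: shows that read(byte[], int, int) hands the
// caller a copy, so writes to the caller's buffer never reach the
// array wrapped by the stream.
class ReadCopySketch {

    static byte[] backingAfterCallerWrite() {
        byte[] backing = {1, 2, 3, 4, 5, 6, 7, 8};
        ByteArrayInputStream in = new ByteArrayInputStream(backing);

        byte[] chunk = new byte[4];
        // Copies up to 4 bytes from the backing array into chunk.
        int n = in.read(chunk, 0, chunk.length);
        if (n > 0) {
            chunk[0] = 99; // changes only the caller's copy
        }
        return backing;
    }

    public static void main(String[] args) {
        // The first byte of the backing array is still the original value.
        System.out.println(backingAfterCallerWrite()[0]);
    }
}
```

The isolation only runs in one direction, though. The constructor wraps the array it is given rather than copying it, so code that keeps its own reference to that array can still change what the stream will return.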

The problem is that by the time you've reached the end, you've copied the whole array. With large amounts of data, this can cause a lot of memory to be used and garbage collected, and you'll soon start to see that the savings of NIO have been more than offset by the activity of ByteArrayInputStream.
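As a rough illustration of that point, the sketch below drains a stream the way a parser might, through a small reusable buffer, and adds up how many bytes were copied on the way. CopyCountSketch and bytesCopied are names I've made up for the example, and the buffer size is arbitrary.

```java
import java.io.ByteArrayInputStream;

// Hypothetical demo class: counts the bytes copied out of a
// ByteArrayInputStream while reading it to the end.
class CopyCountSketch {

    static long bytesCopied(byte[] data, int bufferSize) {
        if (bufferSize <= 0) {
            // A zero-length read returns 0 and would loop forever.
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        byte[] buffer = new byte[bufferSize];
        long copied = 0;
        int n;
        // Each read copies up to bufferSize bytes into buffer.
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            copied += n;
        }
        return copied;
    }

    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        // The total copied equals the length of the wrapped array.
        System.out.println(bytesCopied(data, 512) == data.length);
    }
}
```

Whatever buffer size the consumer picks, the total comes out as the full length of the data, which is the second pass over the bytes described above.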

After all, one of the benefits of NIO is minimizing copying. From Ron Hitchens's excellent Java NIO:

"[in normal Java IO] The disk controller writes the data directly into a kernel memory buffer by DMA [Direct Memory Access] without further assistance from the main CPU. Once the disk controller finishes filling the buffer, the kernel copies the data from the temporary buffer in kernel space to the buffer specified by the process when it requested the read() operation... [C]opying from kernel space to the final user buffer seems like extra work."

(Java NIO, pp. 13-14).

"By mapping a kernel space address to the same physical address as a virtual address in user space [as with NIO], the DMA hardware (which can access only physical memory addresses) can fill a buffer that is simultaneously visible to both the kernel and a user space process."

(ibid., p. 15).
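In Java, the usual way to ask for this kind of mapping is FileChannel.map, which returns a MappedByteBuffer. The sketch below is only an illustration under a few assumptions: the class and method names (MappedReadSketch, sumMapped, demo) are mine, it uses the java.nio.file API from Java 7 onwards, and how much copying is really avoided depends on the operating system and JVM. It maps a small temporary file read-only and walks the bytes directly from the mapped buffer, without first pulling them into a byte[].

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class: reads a file through a memory mapping.
class MappedReadSketch {

    // Sums the unsigned byte values of a file via a read-only mapping.
    static long sumMapped(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long sum = 0;
            while (mapped.hasRemaining()) {
                sum += mapped.get() & 0xFF; // read straight from the mapping
            }
            return sum;
        }
    }

    // Writes four known bytes to a temporary file and sums them back.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("mapped-sketch", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {1, 2, 3, (byte) 200});
            return sumMapped(tmp);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that a single mapping is limited to Integer.MAX_VALUE bytes, so a very large file would need to be mapped in several regions.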
