Big arrays in Java

Java's built-in arrays have one limitation: they are indexed by int, so you cannot create an array which holds more than Integer.MAX_VALUE elements. This is sufficient for most use cases, but if you want to do some big data computations, it might become necessary to hold more than ~2.1 billion elements.

Luckily, the JDK ships a class named sun.misc.Unsafe which can be used to allocate memory directly, as you would with C's malloc. This class is meant for internal JDK use only, so access to it is restricted, but a little reflection hack is enough to obtain the instance.
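A minimal sketch of that reflection hack. It assumes the singleton is stored in a private static field named theUnsafe, which is the case on Sun/Oracle and OpenJDK builds; the helper class name UnsafeAccess is made up for illustration.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical helper class; only the field name "theUnsafe" comes from the JDK.
final class UnsafeAccess {
    private UnsafeAccess() {}

    // Reads the private static singleton field via reflection.
    static Unsafe getUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            // The field is static, so no receiver object is needed.
            return (Unsafe) field.get(null);
        } catch (Exception e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible on this JVM", e);
        }
    }
}
```

Calling Unsafe.getUnsafe() directly is not an option for application code, because that method rejects callers that were not loaded by the JDK itself.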

We can now wrap some of the methods provided by Unsafe to simulate an array whose constructor takes a long value to specify its size.
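One way such a wrapper could look, as a sketch for byte elements only. The class name BigByteArray and its method names are placeholders, not taken from the original source; the Unsafe calls used are allocateMemory, putByte, getByte and freeMemory.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical wrapper: a byte "array" indexed by long, backed by off-heap memory.
final class BigByteArray {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long size;   // number of elements (one byte each)
    private long address;      // base address of the native block, 0 once freed

    BigByteArray(long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        this.size = size;
        // The returned memory is NOT zeroed, unlike a freshly created Java array.
        this.address = UNSAFE.allocateMemory(size);
    }

    // No bounds checks here: a bad index reads or writes arbitrary memory.
    void set(long index, byte value) {
        UNSAFE.putByte(address + index, value);
    }

    byte get(long index) {
        return UNSAFE.getByte(address + index);
    }

    long size() {
        return size;
    }

    // Must be called manually; the garbage collector never reclaims this block.
    void free() {
        if (address != 0) {
            UNSAFE.freeMemory(address);
            address = 0;
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (Exception e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }
}
```

For wider element types the offset would be index times the element width, for example address + index * 8 with putLong and getLong.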

We can now use our big array like this:
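A usage sketch, assuming a wrapper shaped like the one described above. All names are placeholders, and a condensed copy of the wrapper is nested inside the demo class so that the snippet compiles on its own. Note that main asks for roughly 4 GiB of native memory, which may fail on small machines.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical demo class; every name here is illustrative.
final class BigArrayDemo {

    // Condensed off-heap byte array, included only to keep the example standalone.
    static final class Bytes {
        private static final Unsafe UNSAFE = loadUnsafe();
        private final long address;

        Bytes(long size) { address = UNSAFE.allocateMemory(size); }
        void set(long i, byte v) { UNSAFE.putByte(address + i, v); }
        byte get(long i) { return UNSAFE.getByte(address + i); }
        void free() { UNSAFE.freeMemory(address); }

        private static Unsafe loadUnsafe() {
            try {
                Field f = Unsafe.class.getDeclaredField("theUnsafe");
                f.setAccessible(true);
                return (Unsafe) f.get(null);
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        }
    }

    // Writes 3 into the last `count` slots of an array of `size` bytes and sums them.
    // Assumes 0 <= count <= size.
    static long fillAndSumTail(long size, int count) {
        Bytes array = new Bytes(size);
        try {
            long sum = 0;
            for (long i = size - count; i < size; i++) {
                array.set(i, (byte) 3);
                sum += array.get(i);
            }
            return sum;
        } finally {
            array.free(); // release the native block even if something throws
        }
    }

    public static void main(String[] args) {
        // Twice Integer.MAX_VALUE elements: far beyond what a byte[] can hold.
        long size = (long) Integer.MAX_VALUE * 2;
        System.out.println("Array size: " + size);
        System.out.println("Sum of last 100 elements: " + fillAndSumTail(size, 100));
    }
}
```

The indices touched in main lie above Integer.MAX_VALUE, which is exactly the range a built-in array cannot address.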

Some points to mention here:

  • This works in both Java 6 and 7. I haven't tested any older versions.
  • Laurent pointed out in the comments that sun.misc.Unsafe is only available in the Sun JDK and OpenJDK. This code probably breaks in other JVMs like JRockit or J9.
  • The use of sun.misc.Unsafe is discouraged and not part of the official Java API, so this code might break in future Java versions. However, sun.misc.Unsafe is heavily used in Java's internal libraries, so I personally think it's safe to use for some reasonable amount of time.
  • Memory allocated like this isn't managed by the garbage collector, so you have to make sure to call freeMemory(...) at the end as you would in C.
  • Java arrays do bounds checking on each access and throw an exception if you violate the bounds. Our big array doesn't do anything like that, but you can simulate that as well, of course. One might think the absence of bounds checking improves access performance, but my tests didn't show any significant speedups.
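The last two points can be sketched together: a variant that checks every index and releases its memory through close(). The class name CheckedBigByteArray is hypothetical, and AutoCloseable requires Java 7 or later; on Java 6 a plain free method would serve the same purpose.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical bounds-checked variant of an off-heap byte array.
final class CheckedBigByteArray implements AutoCloseable {
    private static final Unsafe UNSAFE = loadUnsafe();

    private final long size;
    private long address; // 0 once the memory has been freed

    CheckedBigByteArray(long size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive: " + size);
        }
        this.size = size;
        this.address = UNSAFE.allocateMemory(size);
    }

    byte get(long index) {
        return UNSAFE.getByte(addressOf(index));
    }

    void set(long index, byte value) {
        UNSAFE.putByte(addressOf(index), value);
    }

    // Validates the index the way a built-in array would, then returns the raw address.
    private long addressOf(long index) {
        if (address == 0) {
            throw new IllegalStateException("array has already been freed");
        }
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return address + index;
    }

    // Frees the native block; safe to call more than once.
    @Override
    public void close() {
        if (address != 0) {
            UNSAFE.freeMemory(address);
            address = 0;
        }
    }

    private static Unsafe loadUnsafe() {
        try {
            Field field = Unsafe.class.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return (Unsafe) field.get(null);
        } catch (Exception e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }
}
```

With this shape, a try-with-resources block frees the memory automatically at the end of the scope, which removes the most common way to leak it.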

You can find the full source code here. Enjoy!

There are more cool things you can do with sun.misc.Unsafe; check out this blog post.

EDIT: There is a discussion about this post going on at reddit.
