Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove

This article is outdated! A newer version covering the latest versions of collections libraries is available here.

04 Jan 2015 update: a couple of clarifications, plus a fix for a bug in the FastUtil Object-int test, which is now much faster (thanks to Sebastiano Vigna for his suggestions).

Introduction

This article will give you an overview of hash map implementations in 5 well-known libraries, with JDK HashMap as a baseline. We will test separately:

Primitive to primitive maps
Primitive to object maps
Object to primitive maps
Object to Object maps (JDK participates only in this section)

This article covers a single test: map read access for a random set of keys (the set of keys is shared by all collections of a given capacity).

We will also pay attention to the way the data is stored inside these collections and to some pretty interesting implementation details.

Participants

JDK 8

JDK HashMap is the oldest hash map implementation in this test. It got a couple of major updates recently: a shared underlying storage for empty maps in Java 7u40, and the ability to convert the linked lists in hash buckets into balanced trees (for better worst-case performance) in Java 8.

FastUtil 6.5.15

FastUtil provides all 4 options listed above (all combinations of primitives and objects). Besides that, there are several other types of maps available for each parameter type combination: array map, AVL tree map and RB tree map. Nevertheless, we are only interested in hash maps in this article.

Goldman Sachs Collections 5.1.0

Goldman Sachs open sourced its collections library about 3 years ago. In my opinion, this library provides the widest range of collections out of the box (if you need them). You should definitely pay attention to it if you need more than a hash map, a tree map and a list for your work. For the purposes of this article, GS collections provide normal, synchronized and unmodifiable versions of each hash map. The last 2 are just facades for the normal map, so they don’t provide any performance advantages.

HPPC 0.6.1

HPPC provides array lists, array deques, hash sets and hash maps for all primitive types. HPPC provides normal hash maps for primitive keys and both normal and identity hash maps for object keys.

Koloboke 0.6

Koloboke is the youngest of all libraries in this article. It is developed as a part of an OpenHFT project by Roman Leventov. This library currently provides hash maps and hash sets for all primitive/object combinations. This library was recently renamed from HFTC, so some artifacts in my tests will still use the old library name.

Trove 3.0.3

Trove has been available for a long time and is quite stable. Unfortunately, not much development is happening in this project at the moment. Trove provides list, stack, queue, hash set and map implementations for all primitive/object combinations. I have already written about Trove.

Data storage implementations and tests

This article will look at 4 different sorts of maps:

int–int
int–Integer
Integer–int
Integer–Integer

Let’s see how the data is stored in each kind of those maps. We will refer to the test names instead of the actual implementation names, because a lot of those implementations are called very similarly and it’s not easy to distinguish them by name. After looking at the implementation details, we will check how they affect the actual test results.

We will use JMH 1.0 for testing. Here is the test description: for each map size in (10K, 100K, 1M, 10M, 100M) (outer loop), generate a set of random keys (they will be used for each test at a given map size) and then run a test for each map implementation (inner loop). Each test will be run 100M / map_size times (so that we call map.get 100M times for each test case).

In setup:

Take a set of int keys and the required fill factor
Initialize a map with the given fill factor and capacity = number of keys
Populate the map with keys and values = keys
Store a reference to the keys array or convert it into Integer[] for tests with object keys (nevertheless, use the same keys)
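
The setup steps above can be sketched with the JDK HashMap standing in for any of the tested maps. This is only an illustration under stated assumptions, not the benchmark's actual code: the class name ReadTestSketch, the randomKeys helper and the fixed seed are all made up for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of the setup and read loop; not the benchmark's real code.
class ReadTestSketch {
    private final Integer[] keys;            // object keys, boxed once in setup
    private final Map<Integer, Integer> map;

    ReadTestSketch(int[] intKeys, float fillFactor) {
        // capacity = number of keys, with the required fill factor
        map = new HashMap<>(intKeys.length, fillFactor);
        keys = new Integer[intKeys.length];
        for (int i = 0; i < intKeys.length; ++i) {
            keys[i] = intKeys[i];            // same keys, converted to Integer
            map.put(keys[i], keys[i]);       // values = keys
        }
    }

    // One pass over all keys; the XOR result is returned so the reads are actually used.
    int runRandomTest() {
        int res = 0;
        for (int i = 0; i < keys.length; ++i)
            res ^= map.get(keys[i]);
        return res;
    }

    static int[] randomKeys(int size, long seed) {
        Random r = new Random(seed);
        int[] res = new int[size];
        for (int i = 0; i < size; ++i)
            res[i] = r.nextInt();
        return res;
    }

    public static void main(String[] args) {
        int mapSize = 10_000;
        ReadTestSketch test = new ReadTestSketch(randomKeys(mapSize, 1234L), 0.5f);
        int passes = 100_000_000 / mapSize;  // 100M map.get calls in total
        int res = 0;
        for (int i = 0; i < passes; ++i)
            res ^= test.runRandomTest();
        System.out.println(res);
    }
}
```

A real measurement would leave the timing and warm-up to JMH; the main method here only shows how the 100M / map_size pass count falls out of the description.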

All tests are nearly identical: get the stored values for an array of keys and use these values, so that the JVM will not optimize your code away:

public int runRandomTest() {
    int res = 0;
    for ( int i = 0; i < m_keys.length; ++i )
        res = res ^ m_map.get( m_keys[ i ] );
    return res;
}

int-int

tests.maptests.primitive.FastUtilMapTest	int[] keys, int[] values, boolean[] used
tests.maptests.primitive.GsMutableMapTest	int[] keys, int[] values
tests.maptests.primitive.HftcMutableMapTest	long[] (key-low bits, value-high bits)
tests.maptests.primitive.HppcMapTest	int[] keys, int[] values, boolean[] allocated
tests.maptests.primitive.TroveMapTest	int[] _set, int[] _values, byte[] _states

As you can see, FastUtil, HPPC and Trove use nearly identical storage (Trove keeps the cell states in a byte[] rather than a boolean[]), so you may expect similar performance from them.

Handling of empty and removed cells in GS collections and Koloboke

GS collections use just keys and values arrays. If you have ever looked at hash map implementations, you know that a map must at least distinguish empty cells from occupied ones (some maps also use a "removed cell" marker). How could you achieve such functionality without extra storage? GS IntIntHashMap uses a companion sentinel object containing the values for key=0 (empty cell) and key=1 (removed key). All operations on keys 0 and 1 are done on the sentinel object. Such an object allows GS IntIntHashMap to use O(1) storage for flags instead of O(capacity). It also means a lookup touches only 2 memory locations instead of 3, which makes this implementation faster.
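
To make the sentinel idea concrete, here is a minimal sketch of a map that reserves 0 and 1 as cell markers and keeps the values of the real keys 0 and 1 outside the arrays. It is not the GS collections source: the class name SentinelIntIntMap, the trivial hash, the linear probing and the fixed capacity are all assumptions made to keep the example short.

```java
// Hypothetical, simplified int-int map illustrating the sentinel idea.
// Fixed power-of-two capacity, no resizing; the caller must keep it from filling up.
class SentinelIntIntMap {
    private static final int EMPTY = 0;    // marks a never-used cell in keys[]
    private static final int REMOVED = 1;  // marks a deleted cell in keys[]

    private final int[] keys;
    private final int[] values;
    private final int mask;

    // "Sentinel" storage: the real keys 0 and 1 never enter the arrays.
    private final boolean[] hasSpecial = new boolean[2];
    private final int[] specialValue = new int[2];

    SentinelIntIntMap(int capacityPowerOfTwo) {
        keys = new int[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    private static boolean isSpecial(int key) {
        return key == EMPTY || key == REMOVED;
    }

    void put(int key, int value) {
        if (isSpecial(key)) {
            hasSpecial[key] = true;
            specialValue[key] = value;
            return;
        }
        int firstRemoved = -1;
        int i = key & mask;                 // trivial hash, good enough for a sketch
        while (keys[i] != EMPTY) {
            if (keys[i] == key) { values[i] = value; return; }
            if (keys[i] == REMOVED && firstRemoved < 0) firstRemoved = i;
            i = (i + 1) & mask;             // linear probing
        }
        if (firstRemoved >= 0) i = firstRemoved;  // reuse a deleted cell if we saw one
        keys[i] = key;
        values[i] = value;
    }

    int get(int key, int defaultValue) {
        if (isSpecial(key))
            return hasSpecial[key] ? specialValue[key] : defaultValue;
        for (int i = key & mask; keys[i] != EMPTY; i = (i + 1) & mask)
            if (keys[i] == key) return values[i];
        return defaultValue;
    }

    void remove(int key) {
        if (isSpecial(key)) { hasSpecial[key] = false; return; }
        for (int i = key & mask; keys[i] != EMPTY; i = (i + 1) & mask)
            if (keys[i] == key) { keys[i] = REMOVED; return; }
    }
}
```

Note that a lookup for an ordinary key reads only keys[] and values[]; no state array is involved.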

Koloboke int-int map (the actual name is hidden behind the factories and may change) goes even further. First of all, in some cases it uses an array of a longer datatype as storage, which is capable of keeping both a key and a value in one element. The int-int map is an example of such an approach: a key is stored in the low 32 bits of a long cell and a value is stored in the high 32 bits. Such a layout means only one cache line miss in case of cold data access instead of 2 (GS collections) or 3 (all others).
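
The bit layout described above can be sketched with three small helpers. The names PackedCell, pack, key and value are hypothetical and are not taken from Koloboke; only the layout (key in the low 32 bits, value in the high 32 bits) comes from the description.

```java
// Hypothetical helpers showing one way to pack an int key and an int value into a long.
class PackedCell {
    // key goes into the low 32 bits, value into the high 32 bits
    static long pack(int key, int value) {
        return ((long) value << 32) | (key & 0xFFFFFFFFL);
    }

    static int key(long cell) {
        return (int) cell;            // keeps the low 32 bits
    }

    static int value(long cell) {
        return (int) (cell >>> 32);   // keeps the high 32 bits
    }

    public static void main(String[] args) {
        long cell = pack(-5, 42);
        System.out.println(key(cell) + " -> " + value(cell));   // prints -5 -> 42
    }
}
```

The mask in pack matters for negative keys: without it, sign extension of the key would overwrite the value bits.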

Koloboke uses a different technique for marking unused entries. When a map is initialized, it picks a random int and uses it as a free cell marker. If you try to insert a key equal to the free cell marker, it picks another random value which is not present in the map, and so on. It means that Koloboke uses just 4 bytes of overhead for handling empty cells and does it in an extremely efficient way.
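
A minimal sketch of the free cell marker technique, shown on an int set rather than a map to keep it short. This is not Koloboke's code: the class name FreeValueIntSet, the linear probing, the fixed capacity and the way empty cells are rewritten when the marker changes are assumptions of the example.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical, simplified int set that marks empty cells with a movable "free value".
// Fixed power-of-two capacity, no removal or resizing; keep it well below full.
class FreeValueIntSet {
    private final int[] table;
    private final int mask;
    private final Random random = new Random();
    private int freeValue;   // the only per-set overhead for tracking empty cells
    private int size;

    FreeValueIntSet(int capacityPowerOfTwo) {
        table = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        freeValue = random.nextInt();
        Arrays.fill(table, freeValue);
    }

    boolean contains(int key) {
        if (key == freeValue) return false;   // the marker is never a stored key
        for (int i = key & mask; table[i] != freeValue; i = (i + 1) & mask)
            if (table[i] == key) return true;
        return false;
    }

    boolean add(int key) {
        if (key == freeValue) changeFreeValue();  // the key collides with the marker
        int i = key & mask;
        while (table[i] != freeValue) {
            if (table[i] == key) return false;    // already present
            i = (i + 1) & mask;
        }
        table[i] = key;
        ++size;
        return true;
    }

    // Pick a new marker that is not a stored key and rewrite all empty cells.
    private void changeFreeValue() {
        int candidate;
        do {
            candidate = random.nextInt();
        } while (candidate == freeValue || contains(candidate));
        for (int i = 0; i < table.length; ++i)
            if (table[i] == freeValue) table[i] = candidate;
        freeValue = candidate;
    }

    int freeValue() { return freeValue; }
    int size() { return size; }
}
```

Changing the marker costs a full pass over the table in this sketch, but it only happens when a key happens to equal the current marker, which is rare for int keys.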

In general, such an approach does not impose any performance penalty unless your map size is getting close to the number of values in a given datatype. What happens in case of smaller key data types? You will get a HashOverflowException (defined in the koloboke-api library) if you attempt to add all values of the datatype into a map. You can use the following test to reproduce it:

HashByteIntMap m = HashByteIntMaps.newMutableMap( 256 );
for ( int i = Byte.MIN_VALUE; i < Byte.MAX_VALUE; ++i )
{
    final byte key = (byte) i;
    m.put( key, i );
}
m.put( Byte.MAX_VALUE, 127 );   //exception will be thrown here
System.out.println( m.size() );

Nevertheless, this should not be an issue in real life. If you want to map every (or nearly every) byte/char/short to some value, you are better off with an array of the value type indexed by keys.
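
For byte keys the array alternative can be as small as the following sketch. The class name ByteIntTable is hypothetical; the only trick is masking the signed byte into the 0..255 index range.

```java
// Hypothetical dense byte-to-int mapping backed by a plain array.
// There is no notion of an absent key: slots that were never set read as 0.
class ByteIntTable {
    private final int[] values = new int[256];   // one slot per possible byte

    void put(byte key, int value) {
        values[key & 0xFF] = value;              // maps -128..127 onto 0..255
    }

    int get(byte key) {
        return values[key & 0xFF];
    }

    public static void main(String[] args) {
        ByteIntTable t = new ByteIntTable();
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; ++i)
            t.put((byte) i, i);                  // all 256 keys fit
        System.out.println(t.get((byte) -128) + " " + t.get((byte) 127));  // prints -128 127
    }
}
```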

int-int Test results

Each test section starts with a result table followed by a chart. The first line in each table lists the map sizes. All test results are in milliseconds.

	10000	100000	1000000	10000000	100000000
tests.maptests.primitive.HftcMutableMapTest	955	1324	1871	4198	3805
tests.maptests.primitive.HftcImmutableMapTest	941	1335	1807	4194	3793
tests.maptests.primitive.HftcUpdateableMapTest	949	1314	1836	4183	3799
tests.maptests.primitive.GsMutableMapTest	977	1883	3322	6256	7754
tests.maptests.primitive.GsImmutableMapTest	997	1895	3279	6201	7786
tests.maptests.primitive.FastUtilMapTest	1045	1590	3776	7655	10095
tests.maptests.primitive.HppcMapTest	1021	1580	3693	7612	10086
tests.maptests.primitive.TroveMapTest	1775	2642	5137	10799	13834

As you can see, the libraries split into 4 distinct groups (fastest to slowest):

Koloboke shows the best results: a single long[] for storage plus the clever trick of a random free cell value pays off. All 3 versions of Koloboke collections show practically the same result in this test (it does not mean they will be equally fast in other tests as well).
GS collections implementation is the second fastest: using 2 arrays instead of 3, as well as good code quality, pays off here.
FastUtil and HPPC show nearly the same performance (within about 2.5% of each other).
Trove is the slowest implementation in this test, being about 2 to 3 times slower than Koloboke on most map sizes and about 3.6 times slower on the largest map (100M).

Note that Koloboke works faster on the 100M map than on the 10M map. According to an email from Roman Leventov, this happens because a bigger fill factor was chosen for the map of size 10M than for the map of size 100M. You will see a similar difference in the Object-Object test results.

int-Object

tests.maptests.prim_object.FastUtilIntObjectMapTest	int[] key, Object[] value, boolean[] used
tests.maptests.prim_object.GsIntObjectMapTest	int[] keys, Object[] values
tests.maptests.prim_object.HftcIntObjectMapTest	int[] keys, Object[] values
tests.maptests.prim_object.HppcIntObjectMapTest	int[] keys, Object[] values, boolean[] allocated
tests.maptests.prim_object.TroveIntObjectMapTest	int[] _set, Object[] _values, byte[] _states

No surprises here: FastUtil, HPPC and Trove are using 3 arrays (including an array of cell states). GS collections and Koloboke are using 2 arrays and tricks similar to those described above for the special cases.

int-Object test results

	10000	100000	1000000	10000000	100000000
tests.maptests.prim_object.HftcIntObjectMapTest	1223	1358	3034	6187	7064
tests.maptests.prim_object.FastUtilIntObjectMapTest	1213	1746	4112	7902	10595
tests.maptests.prim_object.GsIntObjectMapTest	1764	2658	4310	7775	9715
tests.maptests.prim_object.HppcIntObjectMapTest	1666	1725	4083	8447	12202
tests.maptests.prim_object.TroveIntObjectMapTest	1987	2835	5812	11269	14265

There are 3 groups in this test (fastest to slowest):

Koloboke is the fastest one (FastUtil only matches it at 10K) thanks to using only 2 arrays and simpler code for the empty cells case.
It is followed by GS collections (which did not manage to exploit the advantage of 2 storage arrays instead of 3), FastUtil and HPPC. Their results vary slightly between map sizes, but they are relatively close to each other.
Trove is the slowest again, being 1.6 to 2.1 times slower than Koloboke.

Object-int

tests.maptests.object_prim.FastUtilObjectIntMapTest	Object[] key, int[] value, boolean[] used
tests.maptests.object_prim.GsObjectIntMapTest	Object[] keys, int[] values
tests.maptests.object_prim.HftcObjectIntMapTest	Object[] keys, int[] values
tests.maptests.object_prim.HppcObjectIntMapTest	Object[] keys, int[] values, boolean[] allocated
tests.maptests.object_prim.TroveObjectIntMapTest	Object[] _set, int[] _values

FastUtil and HPPC use a third array even in case of Object keys. This seems to be a bad idea, because with Object keys you can always use a private sentinel object as a flag. We will see the actual performance a bit below.

GS collections, Koloboke and Trove are using 2 arrays, so we should expect them to be a little faster.
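
Here is a minimal sketch of the private sentinel approach for Object keys, using null for empty cells and a private marker object for removed ones. It does not reproduce any of the tested libraries: the class name SentinelObjectIntMap, the use of hashCode() without extra mixing, the linear probing and the fixed capacity are assumptions of the example.

```java
// Hypothetical, simplified Object-to-int map with two arrays and no state array.
// Fixed power-of-two capacity, no resizing; null keys are not supported.
class SentinelObjectIntMap {
    // Private marker for deleted cells; no caller can ever pass it in as a key.
    private static final Object REMOVED = new Object();

    private final Object[] keys;   // null = empty, REMOVED = deleted, else a real key
    private final int[] values;
    private final int mask;

    SentinelObjectIntMap(int capacityPowerOfTwo) {
        keys = new Object[capacityPowerOfTwo];
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    private int slot(Object key) {
        return key.hashCode() & mask;
    }

    void put(Object key, int value) {
        int firstRemoved = -1;
        int i = slot(key);
        while (keys[i] != null) {
            if (keys[i] == REMOVED) {
                if (firstRemoved < 0) firstRemoved = i;
            } else if (keys[i].equals(key)) {
                values[i] = value;          // overwrite an existing mapping
                return;
            }
            i = (i + 1) & mask;             // linear probing
        }
        if (firstRemoved >= 0) i = firstRemoved;  // reuse a deleted cell if we saw one
        keys[i] = key;
        values[i] = value;
    }

    int get(Object key, int defaultValue) {
        for (int i = slot(key); keys[i] != null; i = (i + 1) & mask)
            if (keys[i] != REMOVED && keys[i].equals(key)) return values[i];
        return defaultValue;
    }

    void remove(Object key) {
        for (int i = slot(key); keys[i] != null; i = (i + 1) & mask)
            if (keys[i] != REMOVED && keys[i].equals(key)) { keys[i] = REMOVED; return; }
    }
}
```

Because the marker is a private object, every possible user key remains usable, which is what makes a separate state array unnecessary here.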

Object-int test results

	10000	100000	1000000	10000000	100000000
tests.maptests.object_prim.HftcObjectIntMapTest	1775	1781	4320	8567	8962
tests.maptests.object_prim.GsObjectIntMapTest	1598	2876	6214	8467	11700
tests.maptests.object_prim.FastUtilObjectIntMapTest	1599	2614	6151	9273	15146
tests.maptests.object_prim.HppcObjectIntMapTest	2297	2687	6077	10788	17425
tests.maptests.object_prim.TroveObjectIntMapTest	2550	3286	5837	11804	14324

There are 2 groups in this test, though they are not as distinct as before (fastest to slowest):

Koloboke is faster than the other implementations, with two exceptions: the 10K map, where it is slower than both GS collections and FastUtil, and the 10M map, where it is slightly slower than GS collections (the same too-big fill factor problem mentioned above).
The other collections behave similarly to each other up to map size = 1M. After that GS collections pull ahead of the rest; FastUtil is next at 10M, but falls behind Trove at 100M.

Object-Object

tests.maptests.object.FastUtilObjMapTest	Object[] keys, Object[] values, boolean[] used
tests.maptests.object.GsObjMapTest	Object[] table - interleaved keys and values
tests.maptests.object.HftcMutableObjTest	Object[] tab - interleaved keys and values
tests.maptests.object.HppcObjMapTest	Object[] keys, Object[] values, boolean[] allocated
tests.maptests.object.JdkMapTest	Node<K,V>[] table - each Node could be a part of a linked list or a TreeMap (Java 8)
tests.maptests.object.TroveObjMapTest	Object[] _set, Object[] _values

In case of Object-to-Object mappings we have a more complex picture:

FastUtil and HPPC are using 3 arrays per map. Nothing fancy.
JDK HashMap is the only map which stores entries in Node objects, each of which combines a key and a value. It means you have at least 24 bytes of overhead per entry: with compressed references a Node takes 32 bytes (a 12-byte object header, the cached hash code, the key, the value and a next pointer for the bucket's singly linked list, padded to a multiple of 8), of which only 8 bytes hold the key and value references.
Trove is using 2 arrays (and a special sentinel object for empty cells).
Finally, GS collections and Koloboke are using a single array with interleaved keys and values, which makes them the most CPU cache friendly collections of these 6.
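
The interleaved layout from the last item can be sketched as follows. This is an illustration only, not the GS collections or Koloboke code: the class name InterleavedMap, the linear probing and the fixed slot count are assumptions of the example.

```java
// Hypothetical, simplified Object-to-Object map storing keys and values in one array:
// table[2 * slot] holds the key of a slot, table[2 * slot + 1] holds its value.
// Fixed power-of-two slot count, no removal or resizing; null keys are not supported.
class InterleavedMap {
    private final Object[] table;
    private final int mask;

    InterleavedMap(int slotsPowerOfTwo) {
        table = new Object[2 * slotsPowerOfTwo];
        mask = slotsPowerOfTwo - 1;
    }

    void put(Object key, Object value) {
        int slot = key.hashCode() & mask;
        while (table[2 * slot] != null && !table[2 * slot].equals(key))
            slot = (slot + 1) & mask;      // linear probing over slots
        table[2 * slot] = key;
        table[2 * slot + 1] = value;       // the value sits right next to its key
    }

    Object get(Object key) {
        for (int slot = key.hashCode() & mask; table[2 * slot] != null; slot = (slot + 1) & mask)
            if (table[2 * slot].equals(key))
                return table[2 * slot + 1];
        return null;                       // not found
    }
}
```

Once a lookup has found the key reference, the value reference is in the adjacent array element, so it is very likely already in the same cache line.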

Now, armed with the implementation knowledge, let's test the maps performance.

Object-Object test results

	10000	100000	1000000	10000000	100000000
tests.maptests.object.HftcMutableObjTest	1146	1378	2928	6215	5945
tests.maptests.object.JdkMapTest	1151	1776	3759	5341	11523
tests.maptests.object.GsObjMapTest	1566	2242	4582	6012	8110
tests.maptests.object.FastUtilObjMapTest	1720	3002	6015	9360	13292
tests.maptests.object.HppcObjMapTest	1726	3085	5692	9125	13139
tests.maptests.object.TroveObjMapTest	2065	2979	5713	10266	12631

These test results are even less clear.

Koloboke is generally faster than JDK HashMap, though the gap is negligible at 10K and JDK is actually ahead at 10M; on the largest map (100M) Koloboke is nearly twice as fast.
GS collections is close to Koloboke and JDK on large and huge maps, but noticeably slower on smaller maps.
Finally, there are FastUtil, HPPC and Trove with approximately the same performance for all map sizes.

One billion entries test

I decided to see what would happen to these collections if I tried to create a map with a requested size of one billion entries and fill factor = 0.5, which means that all these maps have to allocate an array very close to the maximal allowed array length (just under 2³¹).
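
The arithmetic behind that statement can be checked with a few lines. The class and method names are hypothetical, and the power-of-two rounding shown at the end is only an illustration of how an implementation could overshoot the limit; it is not a claim about how any of the tested libraries sizes its tables.

```java
// Back-of-the-envelope check of the array length needed for the one-billion test.
class BillionEntriesMath {
    static long cellsNeeded(long entries, double fillFactor) {
        return (long) Math.ceil(entries / fillFactor);
    }

    public static void main(String[] args) {
        long cells = cellsNeeded(1_000_000_000L, 0.5);
        System.out.println("cells needed:      " + cells);
        System.out.println("Integer.MAX_VALUE: " + Integer.MAX_VALUE);
        // Assumption for illustration: a table rounded up to a power of two
        // would need 2^31 cells, one more than the largest int.
        long nextPowerOfTwo = Long.highestOneBit(cells - 1) << 1;
        System.out.println("next power of two: " + nextPowerOfTwo
                + (nextPowerOfTwo > Integer.MAX_VALUE ? " (does not fit in an int)" : ""));
    }
}
```

Two billion cells fit under the limit with about 147 million to spare, so any extra rounding or a layout that needs more than one array element per entry leaves very little room.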

FastUtil, HPPC and GS collections failed with various exceptions (not OOM: I allocated 110G RAM for this test).

Koloboke, Trove and JDK managed to pass these tests. Unfortunately, I did not manage to run some of these tests successfully in JMH, so they were run by separate code.

Here are the test results (if you want to compare them to the previous results, multiply the previous results by 10, because all previous tests called map.get 100M times in total):

tests.maptests.primitive.HftcMutableMapTest : time = 95.05 sec
tests.maptests.primitive.TroveMapTest : time = 235.062 sec

tests.maptests.prim_object.HftcIntObjectMapTest : time = 216.361 sec
tests.maptests.prim_object.TroveIntObjectMapTest : time = 304.019 sec

tests.maptests.object_prim.HftcObjectIntMapTest : time = 335.139 sec
tests.maptests.object_prim.TroveObjectIntMapTest : time = 217.412 sec

tests.maptests.object.HftcMutableObjTest : time = 272.792 sec
tests.maptests.object.JdkMapTest : time = 163.335 sec
tests.maptests.object.TroveObjMapTest : time = 239.133 sec

As you can see, Koloboke wins by a large margin in the primitive-to-primitive test. It is also significantly faster in primitive-to-object test.

In case of object-to-primitive test Koloboke took significantly longer than Trove to complete.

Finally, for the object-to-object test, I had to change the Koloboke map initialization code, because by default it started to degrade extremely quickly once I had added half a billion elements into it:

HashObjObjMaps.getDefaultFactory().withHashConfig(HashConfig.fromLoads(0.5, 0.6, 0.8)).newMutableMap(keys.length)

Koloboke 2.0?

Roman Leventov has just announced that he is considering implementing a newer and even faster version of the Koloboke library, but he needs your feedback. Would you mind dropping him a line?

Summary

Koloboke has turned out to be the fastest and the most memory efficient library implementing hash maps. This library is still young and not widely used yet, but why not give it a try?
If you are looking for a more stable and mature library (and are willing to sacrifice some performance), you should probably look at the GS collections library. Unlike Koloboke, it gives you a wide range of collections out of the box.

Source code

The article source code is now hosted at GitHub: https://github.com/mikvor/hashmapTest. Expect the test set there to be slightly ahead of this article.

Please note you should run this project via tests.MapTestRunner class:

mvn clean install
java -cp target/benchmarks.jar tests.MapTestRunner

The post Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove appeared first on Java Performance Tuning Guide.