This article is outdated! A newer version covering the latest versions of collections libraries is available here.
04 Jan 2015 update: a couple of clarifications, fixed a bug in FastUtil Object-int test – now it got much faster (thanks to Sebastiano Vigna for his suggestions).
Introduction
This article will give you an overview of hash map implementations in 5 well-known libraries, with JDK HashMap as a baseline. We will test separately:
- Primitive to primitive maps
- Primitive to object maps
- Object to primitive maps
- Object to Object maps (JDK participates only in this section)
This article will overview a single test – map read access for a random set of keys (a set of keys is shared for all collections of a given capacity).
We will also pay attention to the way the data is stored inside these collections and to some pretty interesting implementation details.
Participants
JDK 8
JDK HashMap is the oldest hash map implementation in this test. It got a couple of major updates recently: a shared underlying storage for empty maps in Java 7u40 and the ability to convert underlying hash bucket linked lists into tree maps (for better worst-case performance) in Java 8.
FastUtil 6.5.15
FastUtil provides all 4 options listed above (all combinations of primitives and objects). Besides that, there are several other types of maps available for each parameter type combination: array map, AVL tree map and RB tree map. Nevertheless, we are only interested in hash maps in this article.
Goldman Sachs Collections 5.1.0
Goldman Sachs open sourced its collections library about 3 years ago. In my opinion, this library provides the widest range of collections out of the box (if you need them). You should definitely pay attention to it if you need more than a hash map, tree map and a list for your work. For the purposes of this article, GS collections provide normal, synchronized and unmodifiable versions of each hash map. The last 2 are just facades for the normal map, so they don't provide any performance advantages.
HPPC 0.6.1
HPPC provides array lists, array deques, hash sets and hash maps for all primitive types. HPPC provides normal hash maps for primitive keys and both normal and identity hash maps for object keys.
Koloboke 0.6
Koloboke is the youngest of all libraries in this article. It is developed as a part of an OpenHFT project by Roman Leventov. This library currently provides hash maps and hash sets for all primitive/object combinations. This library was recently renamed from HFTC, so some artifacts in my tests will still use the old library name.
Trove 3.0.3
Trove has been available for a long time and is quite stable. Unfortunately, not much development is happening in this project at the moment. Trove provides list, stack, queue, hash set and map implementations for all primitive/object combinations. I have already written about Trove.
Data storage implementations and tests
This article will look at 4 different sorts of maps:
- int-int
- int-Integer
- Integer-int
- Integer-Integer
Let’s see how the data is stored in each kind of those maps. We will refer to the test names instead of the actual implementation names, because a lot of those implementations are called very similarly and it’s not easy to distinguish them by name. After looking at the implementation details, we will check how they affect the actual test results.
We will use JMH 1.0 for testing. Here is the test description: for each map size in (10K, 100K, 1M, 10M, 100M) (outer loop) generate a set of random keys (they will be used for each test at a given map size) and then run a test for each map implementation (inner loop). Each test will be run 100M / map_size times (so that we will call map.get 100M times for each test case).
In setup:
- Take a set of int keys and the required fill factor
- Initialize a map with the given fill factor and capacity = number of keys
- Populate the map with keys and values = keys
- Store a reference to the keys array or convert it into Integer[] for tests with object keys (nevertheless, use the same keys)
All tests are nearly identical – get stored values for an array of keys and use these values, so that JVM will not optimize out your code:
```java
public int runRandomTest() {
    int res = 0;
    for ( int i = 0; i < m_keys.length; ++i )
        res = res ^ m_map.get( m_keys[ i ] );
    return res;
}
```
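The setup steps and the read loop above can be sketched with nothing but the JDK `HashMap`. This is a simplified illustration, not the JMH harness used for the article: the class name `RandomReadSketch`, the fixed seed and the way the random keys are generated are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

class RandomReadSketch {
    private final Integer[] keys;
    private final Map<Integer, Integer> map;

    RandomReadSketch(int size, float fillFactor, long seed) {
        // Random keys shared by every map of this size (generation scheme is assumed)
        Random rnd = new Random(seed);
        keys = new Integer[size];
        for (int i = 0; i < size; ++i) keys[i] = rnd.nextInt();
        // Capacity = number of keys, as in the setup description
        map = new HashMap<>(size, fillFactor);
        // Values = keys
        for (Integer key : keys) map.put(key, key);
    }

    // Same shape as runRandomTest above: XOR the values so the reads cannot be optimized away
    int runRandomTest() {
        int res = 0;
        for (int i = 0; i < keys.length; ++i) res ^= map.get(keys[i]);
        return res;
    }

    public static void main(String[] args) {
        RandomReadSketch test = new RandomReadSketch(10_000, 0.5f, 42L);
        System.out.println(test.runRandomTest());
    }
}
```

Because every value equals its key, the returned number is simply the XOR of all generated keys, which makes the loop easy to sanity-check.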
int-int
tests.maptests.primitive.FastUtilMapTest | int[] keys, int[] values, boolean[] used |
tests.maptests.primitive.GsMutableMapTest | int[] keys, int[] values |
tests.maptests.primitive.HftcMutableMapTest | long[] (key-low bits, value-high bits) |
tests.maptests.primitive.HppcMapTest | int[] keys, int[] values, boolean[] allocated |
tests.maptests.primitive.TroveMapTest | int[] _set, int[] _values, byte[] _states |
As you can see, FastUtil, HPPC and Trove use essentially identical storage (two data arrays plus a per-cell state array), so you may expect similar performance from them.
Handling of empty and removed cells in GS collections and Koloboke
GS collections use just keys and values arrays. If you have ever looked at the hash map implementations, you should know that a map should at least distinguish empty cells from the occupied ones (some maps also use a "removed cell" marker). How could you achieve such functionality without extra storage? GS IntIntHashMap uses a companion sentinel object containing values for key=0 (empty cell) and key=1 (removed key). All operations on keys=0 or 1 are done on the sentinel object. Such an object allows GS IntIntHashMap to use O(1) storage for flags instead of O(capacity). This also allows you to access only 2 cells of memory instead of 3, which makes this implementation faster.
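A minimal sketch of this idea follows. It is not the GS collections code: the class `SentinelIntIntMap`, its fixed power-of-two capacity, the linear probing and the hash mixing are assumptions chosen to keep the example short. It only shows how reserving the key values 0 and 1 as markers removes the need for a separate state array.

```java
final class SentinelIntIntMap {
    private static final int EMPTY = 0;    // keys[] cell that was never used
    private static final int REMOVED = 1;  // keys[] cell whose entry was deleted

    private final int[] keys;
    private final int[] values;
    private final int mask;

    // "Sentinel" storage for the two real keys that collide with the markers
    private boolean hasZero, hasOne;
    private int zeroValue, oneValue;

    SentinelIntIntMap(int capacityPowerOfTwo) {
        keys = new int[capacityPowerOfTwo];   // all cells start as EMPTY (0)
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    private int start(int key) {
        int h = key * 0x9E3779B9;             // cheap bit mixing, chosen for the sketch
        return (h ^ (h >>> 16)) & mask;
    }

    void put(int key, int value) {
        if (key == EMPTY) { hasZero = true; zeroValue = value; return; }
        if (key == REMOVED) { hasOne = true; oneValue = value; return; }
        int idx = start(key), free = -1;
        for (int n = 0; n < keys.length; ++n, idx = (idx + 1) & mask) {
            int k = keys[idx];
            if (k == key) { values[idx] = value; return; }
            if (k == REMOVED && free < 0) free = idx;   // remember the first reusable cell
            if (k == EMPTY) { if (free < 0) free = idx; break; }
        }
        if (free < 0) throw new IllegalStateException("full: the sketch never resizes");
        keys[free] = key;
        values[free] = value;
    }

    int get(int key, int missing) {
        if (key == EMPTY) return hasZero ? zeroValue : missing;
        if (key == REMOVED) return hasOne ? oneValue : missing;
        int idx = start(key);
        for (int n = 0; n < keys.length; ++n, idx = (idx + 1) & mask) {
            int k = keys[idx];
            if (k == key) return values[idx];
            if (k == EMPTY) return missing;   // end of the probe chain
        }
        return missing;
    }

    void remove(int key) {
        if (key == EMPTY) { hasZero = false; return; }
        if (key == REMOVED) { hasOne = false; return; }
        int idx = start(key);
        for (int n = 0; n < keys.length; ++n, idx = (idx + 1) & mask) {
            int k = keys[idx];
            if (k == key) { keys[idx] = REMOVED; return; }
            if (k == EMPTY) return;
        }
    }
}
```

A lookup for an ordinary key touches only `keys` and `values`; the extra fields are consulted only for the two reserved key values.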
Koloboke int-int map (the actual name is hidden behind the factories and may change) goes even further. First of all, in some cases it uses an array of a longer datatype as storage, which is capable of keeping both key and value in one element. The int-int map is an example of such an approach: a key is stored in the low 32 bits of a long cell and a value is stored in the high 32 bits. Such a layout means only one cache line miss in case of cold data access instead of 2 (GS collections) or 3 (all others).
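The bit layout described above can be illustrated with a few lines of plain Java. The class and method names here (`PackedEntry`, `pack`, `key`, `value`) are invented for the example and are not Koloboke identifiers.

```java
final class PackedEntry {
    // Key goes into the low 32 bits, value into the high 32 bits of one long cell
    static long pack(int key, int value) {
        return ((long) value << 32) | (key & 0xFFFFFFFFL);
    }

    // Narrowing to int keeps only the low 32 bits
    static int key(long cell) {
        return (int) cell;
    }

    // Unsigned shift moves the high 32 bits down before narrowing
    static int value(long cell) {
        return (int) (cell >>> 32);
    }

    public static void main(String[] args) {
        long cell = pack(-5, 7);
        System.out.println(key(cell) + " -> " + value(cell));
    }
}
```

The mask in `pack` matters: without it a negative key would be sign-extended and overwrite the value bits.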
Koloboke uses a different technique for marking unused entries. When a map is initialized, it picks a random int and uses it as a free cell marker. If you try to insert a key = free cell marker, it picks another random value, which is not present in the map, and so on. It means that Koloboke uses just 4 bytes of overhead for handling empty nodes and does it in an extremely efficient way.
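The following sketch shows one way such a moving free-cell marker could work, using a set of int keys to keep it short. It is an illustration under assumptions, not Koloboke's implementation: the class `RandomFreeMarkerSet`, the fixed capacity, the linear probing and the full-array rewrite when the marker changes are all choices made for the example.

```java
import java.util.Arrays;
import java.util.Random;

final class RandomFreeMarkerSet {
    private final int[] cells;
    private final int mask;
    private final Random rnd;
    private int free;   // current free-cell marker; never equal to a stored key
    private int size;

    RandomFreeMarkerSet(int capacityPowerOfTwo, long seed) {
        cells = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
        rnd = new Random(seed);
        free = rnd.nextInt();
        Arrays.fill(cells, free);            // every cell starts out free
    }

    int freeMarker() { return free; }

    private int start(int key) {
        int h = key * 0x9E3779B9;            // cheap bit mixing, chosen for the sketch
        return (h ^ (h >>> 16)) & mask;
    }

    boolean contains(int key) {
        if (key == free) return false;       // the marker value is never a stored key
        int idx = start(key);
        while (cells[idx] != free) {         // terminates: one cell is always kept free
            if (cells[idx] == key) return true;
            idx = (idx + 1) & mask;
        }
        return false;
    }

    void add(int key) {
        if (key == free) changeFreeMarker(key);
        int idx = start(key);
        while (cells[idx] != free) {
            if (cells[idx] == key) return;   // already present
            idx = (idx + 1) & mask;
        }
        if (size == cells.length - 1)
            throw new IllegalStateException("full: the sketch never resizes");
        cells[idx] = key;
        ++size;
    }

    // Pick a new marker that is neither stored nor about to be stored, then relabel free cells
    private void changeFreeMarker(int keyBeingInserted) {
        int newFree;
        do {
            newFree = rnd.nextInt();
        } while (newFree == keyBeingInserted || contains(newFree));
        for (int i = 0; i < cells.length; ++i)
            if (cells[i] == free) cells[i] = newFree;
        free = newFree;
    }
}
```

The marker costs a single `int` field regardless of capacity, and the relabeling pass runs only in the rare case where a caller inserts the current marker value.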
In general, such an approach does not impose any performance penalties unless your map size is getting close to the number of values in a given datatype. But what happens with smaller key data types? You will get a HashOverflowException (defined in the koloboke-api library) if you attempt to add all values of the datatype into a map. You can use the following test to reproduce it:
```java
HashByteIntMap m = HashByteIntMaps.newMutableMap( 256 );
for ( int i = Byte.MIN_VALUE; i < Byte.MAX_VALUE; ++i )
{
    final byte key = (byte) i;
    m.put( key, i );
}
m.put( Byte.MAX_VALUE, 127 ); //exception will be thrown here
System.out.println( m.size() );
```
Nevertheless, this should not be an issue in real life. If you want to map every (or nearly every) byte/char/short value to some value, you'd better use an array of the value type indexed by keys.
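For byte keys, the array alternative mentioned above could look like the sketch below. The class name `ByteIndexedValues` is made up for the example, and the sketch assumes that every key has a value (unset keys simply read as 0).

```java
final class ByteIndexedValues {
    // One slot per possible byte key: 256 ints, no hashing and no free-cell marker needed
    private final int[] values = new int[256];

    void put(byte key, int value) {
        values[key & 0xFF] = value;   // map -128..127 to the index range 0..255
    }

    int get(byte key) {
        return values[key & 0xFF];
    }

    public static void main(String[] args) {
        ByteIndexedValues m = new ByteIndexedValues();
        for (int i = Byte.MIN_VALUE; i <= Byte.MAX_VALUE; ++i) m.put((byte) i, i);
        System.out.println(m.get(Byte.MAX_VALUE));
    }
}
```

Unlike the map in the test above, this structure can hold all 256 keys at once, because it never needs a spare value to mark a free cell.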
int-int Test results
Each of the test sections will start with a result table followed by a chart. The first line in a table is a map size. All test results are in milliseconds.
Test | 10000 | 100000 | 1000000 | 10000000 | 100000000 |
tests.maptests.primitive.HftcMutableMapTest | 955 | 1324 | 1871 | 4198 | 3805 |
tests.maptests.primitive.HftcImmutableMapTest | 941 | 1335 | 1807 | 4194 | 3793 |
tests.maptests.primitive.HftcUpdateableMapTest | 949 | 1314 | 1836 | 4183 | 3799 |
tests.maptests.primitive.GsMutableMapTest | 977 | 1883 | 3322 | 6256 | 7754 |
tests.maptests.primitive.GsImmutableMapTest | 997 | 1895 | 3279 | 6201 | 7786 |
tests.maptests.primitive.FastUtilMapTest | 1045 | 1590 | 3776 | 7655 | 10095 |
tests.maptests.primitive.HppcMapTest | 1021 | 1580 | 3693 | 7612 | 10086 |
tests.maptests.primitive.TroveMapTest | 1775 | 2642 | 5137 | 10799 | 13834 |
As you can see, the libraries split into 4 distinct groups (fastest to slowest):
- Koloboke shows the best results: using a single long[] for storage and the clever trick of a random free cell value pays off. All 3 versions of Koloboke collections show practically the same result in this test (it does not mean they will be equally fast in other tests as well).
- GS collections implementation is the second fastest: using 2 arrays instead of 3 as well as good code quality pays off here.
- FastUtil and HPPC show practically the same performance (within about 2-3% of each other).
- Trove is the slowest implementation in this test, being about 2 times slower than Koloboke on most map sizes and becoming even slower on huge map sizes (10M+).
Note that Koloboke works faster on a 100M map than on a 10M map. According to an email from Roman Leventov, this happens due to a bigger fill factor chosen for a map(size=10M) than for a map(size=100M). You will see a similar difference in the Object-Object test results.
int-Object
tests.maptests.prim_object.FastUtilIntObjectMapTest | int[] key, Object[] value, boolean[] used |
tests.maptests.prim_object.GsIntObjectMapTest | int[] keys, Object[] values |
tests.maptests.prim_object.HftcIntObjectMapTest | int[] keys, Object[] values |
tests.maptests.prim_object.HppcIntObjectMapTest | int[] keys, Object[] values, boolean[] allocated |
tests.maptests.prim_object.TroveIntObjectMapTest | int[] _set, Object[] _values, byte[] _states |
No surprises here: FastUtil, HPPC and Trove are using 3 arrays (including an array of cell states). GS collections and Koloboke are using 2 arrays and tricks similar to those described above for the special cases.
int-Object test results
Test | 10000 | 100000 | 1000000 | 10000000 | 100000000 |
tests.maptests.prim_object.HftcIntObjectMapTest | 1223 | 1358 | 3034 | 6187 | 7064 |
tests.maptests.prim_object.FastUtilIntObjectMapTest | 1213 | 1746 | 4112 | 7902 | 10595 |
tests.maptests.prim_object.GsIntObjectMapTest | 1764 | 2658 | 4310 | 7775 | 9715 |
tests.maptests.prim_object.HppcIntObjectMapTest | 1666 | 1725 | 4083 | 8447 | 12202 |
tests.maptests.prim_object.TroveIntObjectMapTest | 1987 | 2835 | 5812 | 11269 | 14265 |
There are 3 groups in this test (fastest to slowest):
- Koloboke is the fastest one due to using only 2 arrays and simpler code for the empty cells case.
- It is followed by GS collections (which did not manage to use the advantage of 2 storage arrays instead of 3), FastUtil and HPPC. Their results slightly vary in different tests, but they are relatively close to each other.
- Trove is the slowest again, losing 1.5 to 2 times to Koloboke.
Object-int
tests.maptests.object_prim.FastUtilObjectIntMapTest | Object[] key, int[] value, boolean[] used |
tests.maptests.object_prim.GsObjectIntMapTest | Object[] keys, int[] values |
tests.maptests.object_prim.HftcObjectIntMapTest | Object[] keys, int[] values |
tests.maptests.object_prim.HppcObjectIntMapTest | Object[] keys, int[] values, boolean[] allocated |
tests.maptests.object_prim.TroveObjectIntMapTest | Object[] _set, int[] _values |
FastUtil and HPPC are using the third array in case of Object keys. This seems to be a bad idea, because you can always use a private sentinel object as a flag in case of Object keys. We will see the actual performance a bit below.
GS collections, Koloboke and Trove are using 2 arrays, so we should expect them to be a little faster.
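The private sentinel idea for object keys can be sketched as follows. This is not taken from any of the tested libraries: the class `SentinelObjectIntMap`, the fixed capacity and the lack of removal and resizing are simplifications made for the example.

```java
import java.util.Arrays;
import java.util.Objects;

final class SentinelObjectIntMap<K> {
    // Private marker object: callers cannot obtain it, so it can never clash with a real key
    private static final Object FREE = new Object();

    private final Object[] keys;
    private final int[] values;
    private final int mask;

    SentinelObjectIntMap(int capacityPowerOfTwo) {
        keys = new Object[capacityPowerOfTwo];
        Arrays.fill(keys, FREE);
        values = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    void put(K key, int value) {
        Objects.requireNonNull(key);
        int idx = key.hashCode() & mask;
        for (int n = 0; n < keys.length; ++n, idx = (idx + 1) & mask) {
            Object k = keys[idx];
            if (k == FREE) { keys[idx] = key; values[idx] = value; return; }
            if (k.equals(key)) { values[idx] = value; return; }
        }
        throw new IllegalStateException("full: the sketch never resizes");
    }

    int get(Object key, int missing) {
        int idx = key.hashCode() & mask;
        for (int n = 0; n < keys.length; ++n, idx = (idx + 1) & mask) {
            Object k = keys[idx];
            if (k == FREE) return missing;            // identity check, no state array needed
            if (k.equals(key)) return values[idx];
        }
        return missing;
    }
}
```

The occupancy test is a reference comparison against `FREE`, so the keys array itself carries the information that a third array would otherwise hold.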
Object-int test results
Test | 10000 | 100000 | 1000000 | 10000000 | 100000000 |
tests.maptests.object_prim.HftcObjectIntMapTest | 1775 | 1781 | 4320 | 8567 | 8962 |
tests.maptests.object_prim.GsObjectIntMapTest | 1598 | 2876 | 6214 | 8467 | 11700 |
tests.maptests.object_prim.FastUtilObjectIntMapTest | 1599 | 2614 | 6151 | 9273 | 15146 |
tests.maptests.object_prim.HppcObjectIntMapTest | 2297 | 2687 | 6077 | 10788 | 17425 |
tests.maptests.object_prim.TroveObjectIntMapTest | 2550 | 3286 | 5837 | 11804 | 14324 |
There are 2 groups in this test, though they are not as distinct as before (fastest to slowest):
- Koloboke is faster than the other implementations, with the exceptions of the 10K map, where it is slower than both GS collections and FastUtil, and the 10M map, where it is slower than GS collections (yeah, the same problem with a too big fill factor which was mentioned above).
- The other collections behave similarly to each other until map size = 1M. After that, GS collections get faster than the others, followed by FastUtil.
Object-Object
tests.maptests.object.FastUtilObjMapTest | Object[] keys, Object[] values, boolean[] used |
tests.maptests.object.GsObjMapTest | Object[] table - interleaved keys and values |
tests.maptests.object.HftcMutableObjTest | Object[] tab - interleaved keys and values |
tests.maptests.object.HppcObjMapTest | Object[] keys, Object[] values, boolean[] allocated |
tests.maptests.object.JdkMapTest | Node<K,V>[] table - each Node could be a part of a linked list or a TreeMap (Java 8) |
tests.maptests.object.TroveObjMapTest | Object[] _set, Object[] _values |
In case of Object-to-Object mappings we have a more complex picture:
- FastUtil and HPPC are using 3 arrays per map. Nothing fancy.
- JDK HashMap is the only map which stores entries in Node objects, which combine a key and a value. It means you have at least 24 bytes of overhead per entry. The actual overhead is 32 bytes, because each Node also stores the cached hash code and a reference to the next node in its bucket.
- Trove is using 2 arrays (and a special sentinel object for empty cells).
- Finally, GS collections and Koloboke are using a single array with interleaved keys and values, which makes them the most CPU cache friendly collections of these 6.
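The interleaved layout from the last bullet can be sketched like this. The class `InterleavedObjMap` is hypothetical; it uses `null` as the empty-cell marker, linear probing and a fixed capacity, none of which is claimed to match GS collections or Koloboke.

```java
import java.util.Objects;

final class InterleavedObjMap<K, V> {
    // table[2*i] holds a key (null = empty cell), table[2*i + 1] holds its value
    private final Object[] table;
    private final int mask;   // mask over entry indexes, not array indexes

    InterleavedObjMap(int entriesPowerOfTwo) {
        table = new Object[entriesPowerOfTwo * 2];
        mask = entriesPowerOfTwo - 1;
    }

    void put(K key, V value) {
        Objects.requireNonNull(key);
        int entry = key.hashCode() & mask;
        for (int n = 0; n <= mask; ++n, entry = (entry + 1) & mask) {
            Object k = table[2 * entry];
            if (k == null || k.equals(key)) {
                table[2 * entry] = key;
                table[2 * entry + 1] = value;   // the value sits right next to its key
                return;
            }
        }
        throw new IllegalStateException("full: the sketch never resizes");
    }

    @SuppressWarnings("unchecked")
    V get(Object key) {
        int entry = key.hashCode() & mask;
        for (int n = 0; n <= mask; ++n, entry = (entry + 1) & mask) {
            Object k = table[2 * entry];
            if (k == null) return null;
            if (k.equals(key)) return (V) table[2 * entry + 1];
        }
        return null;
    }
}
```

Once the key reference has been read, the value reference is in the adjacent array slot, so it is very likely on the same cache line. The key and value objects themselves still live elsewhere on the heap.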
Now, armed with the implementation knowledge, let's test the maps performance.
Object-Object test results
Test | 10000 | 100000 | 1000000 | 10000000 | 100000000 |
tests.maptests.object.HftcMutableObjTest | 1146 | 1378 | 2928 | 6215 | 5945 |
tests.maptests.object.JdkMapTest | 1151 | 1776 | 3759 | 5341 | 11523 |
tests.maptests.object.GsObjMapTest | 1566 | 2242 | 4582 | 6012 | 8110 |
tests.maptests.object.FastUtilObjMapTest | 1720 | 3002 | 6015 | 9360 | 13292 |
tests.maptests.object.HppcObjMapTest | 1726 | 3085 | 5692 | 9125 | 13139 |
tests.maptests.object.TroveObjMapTest | 2065 | 2979 | 5713 | 10266 | 12631 |
These test results are even less clear.
- Koloboke is generally faster than JDK HashMap, but the difference is not that big except for the case of huge maps, where Koloboke wins.
- GS collections is close to Koloboke and JDK on the large and huge maps, but noticeably slower on smaller maps.
- Finally, there are FastUtil, HPPC and Trove with approximately the same performance for all map sizes.
One billion entries test
I decided to see what would happen to these collections if I tried to create a map with a requested size of one billion entries and fill factor = 0.5, which means that all these maps would have to allocate an array very close to the maximal allowed array length (2^31 - 1).
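The arithmetic behind that statement can be checked with a short sketch. It assumes a map that rounds its table size up to a power of two, which is a common design but is not confirmed here for any particular library in this test. The class and method names are invented for the example.

```java
final class BillionEntriesCapacity {
    // Smallest power of two able to hold `size` entries at the given fill factor (assumed sizing rule)
    static long requiredPowerOfTwoCapacity(long size, double fillFactor) {
        long minCells = (long) Math.ceil(size / fillFactor);
        long capacity = 1;
        while (capacity < minCells) capacity <<= 1;
        return capacity;
    }

    public static void main(String[] args) {
        long size = 1_000_000_000L;
        double fillFactor = 0.5;
        long minCells = (long) Math.ceil(size / fillFactor);
        long pow2 = requiredPowerOfTwoCapacity(size, fillFactor);
        System.out.println("cells needed:       " + minCells);
        System.out.println("next power of two:  " + pow2);
        System.out.println("Integer.MAX_VALUE:  " + Integer.MAX_VALUE);
        System.out.println("fits in one array:  " + (pow2 <= Integer.MAX_VALUE));
    }
}
```

One billion entries at fill factor 0.5 need 2,000,000,000 cells, which still fits below `Integer.MAX_VALUE`, but rounding up to the next power of two gives 2^31 = 2,147,483,648, one more than a Java array can hold. Whether this is what made some libraries fail in this test is not established here.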
FastUtil, HPPC and GS collections have failed with various exceptions (not OOM - I have allocated 110G RAM for this test).
Koloboke, Trove and JDK managed to pass these tests. Unfortunately, I did not manage to run some of these tests successfully in JMH, so they were run by separate code.
Here are the test results (if you want to compare them to the previous results, multiply the previous results by 10, because all previous tests called map.get 100M times in total):
tests.maptests.primitive.HftcMutableMapTest : time = 95.05 sec
tests.maptests.primitive.TroveMapTest : time = 235.062 sec
tests.maptests.prim_object.HftcIntObjectMapTest : time = 216.361 sec
tests.maptests.prim_object.TroveIntObjectMapTest : time = 304.019 sec
tests.maptests.object_prim.HftcObjectIntMapTest : time = 335.139 sec
tests.maptests.object_prim.TroveObjectIntMapTest : time = 217.412 sec
tests.maptests.object.HftcMutableObjTest : time = 272.792 sec
tests.maptests.object.JdkMapTest : time = 163.335 sec
tests.maptests.object.TroveObjMapTest : time = 239.133 sec
As you can see, Koloboke wins by a large margin in the primitive-to-primitive test. It is also significantly faster in primitive-to-object test.
In case of object-to-primitive test Koloboke took significantly longer than Trove to complete.
Finally, for the object-to-object test, I had to change the Koloboke map initialization code, because by default it started to degrade extremely quickly once I had added half a billion elements into it:
```java
HashObjObjMaps.getDefaultFactory().withHashConfig(HashConfig.fromLoads(0.5, 0.6, 0.8)).newMutableMap(keys.length)
```
Koloboke 2.0?
Roman Leventov has just announced that he is considering implementing a newer and even faster version of the Koloboke library, but he needs your feedback. Would you mind dropping him a line?
Summary
- Koloboke has turned out to be the fastest and the most memory efficient library implementing hash maps. This library is still young and not widely used yet, but why not give it a try?
- If you are looking for a more stable and mature library (and are willing to sacrifice some performance), you should probably look at the GS collections library. Unlike Koloboke, it gives you a wide range of collections out of the box.
Source code
The article source code is now hosted at GitHub: https://github.com/mikvor/hashmapTest. You may expect that the test set would be slightly ahead of this article.
Please note you should run this project via tests.MapTestRunner class:
```shell
mvn clean install
java -cp target/benchmarks.jar tests.MapTestRunner
```
The post Large HashMap overview: JDK, FastUtil, Goldman Sachs, HPPC, Koloboke, Trove appeared first on Java Performance Tuning Guide.