Monday, February 27, 2006

Why a ConcurrentHashMap is so fast

We all know a ConcurrentHashMap is nearly as fast as a HashMap plus provided concurrency like a Hashtable. Some very ingenious coding has made this possible. To understand the hows and whys the key is the structure of the ConcurrentHashMap itself.

A ConcurrentHashMap contains a final array of Segment objects. Each segment extends ReentrantLock, and contains a transient volatile array of HashEntry objects. Each HashEntry object is made of final key, hash and next variables and a volatile content variable.

Every get/put/remove/add operation involves creating an index from the hashkey that maps to one of the available segment objects. The call is then delegated to the obtained segment instance.

Let take put call. The segment first locks itself (since extends from ReentrantLock). Then proceeds to check if the hashkey already exists, if it does, it updates the value of the volatile value field in the HashEntry object. Since it is a volatile variable, the new value is guaranteed to be seen by other threads in the jvm without any explicit need for synchronization. Voila ! Now if the key is not present then a new HashEntry object is created and added to the head of the existing list. Since there are an array of Segments, the writes can be spread and not all threads might lock on the same segment leading to more concurrent writes.

Consider a get call. Get the first HashEntry and iterator till the end and if found return the value. No locking at all. Same as a HashMap but unlike a Hashtable. So all reads work at nearly same speed as HashMap.

There are 2 things that make this possible

1) The new JMM guarantees that Volatile reads are not re-ordered with volatile writes and all reads after a write will get the updated contents without synchronization. Variable that holds value reference in HashEntry is volatile. Plus the whole HashEntry array in each segment is volatile. So any changes to value or every newly added HashEntry object(or key-value pair) is visible to all threads after they are assigned without any syncronization.

2) Final fields initialization safety, all threads will see the values for its final fields that were set in its constructor.Further, any variables that can be reached through a final field of a properly constructed object, such as fields of an object referenced by a final field, are also guaranteed to be visible to other threads as well. So if a new key-value pair is added and is instantly accessed by another thread the key, hashkey and next pointer will have proper values and never null.

These ensure that any add in any thread instantly reflects in other threads, without any flushing of memory. Does this mean no locking at all ? None in the java code but the jvm implementation will have to do some locking to ensure that volatile variable reads return latest written values. Since its a very lower level it should be more faster.

Never seen an API that uses the features of the JMM to this extent. Hats off to Doug Lea who made this all possible.

Subscribe to comments for this post

Clicky Web Analytics