
Martin is a high-performance and low-latency specialist, with experience gained over two decades working with large-scale transactional and big-data domains, including automotive, gaming, financial, mobile, and content management. He believes Mechanical Sympathy - applying an understanding of the hardware to the creation of software - is fundamental to delivering elegant, high-performance solutions. Martin was the co-founder and CTO of LMAX, until he left to specialise in helping other people achieve great performance with their software. The Disruptor concurrent programming framework is just one example of what his mechanical sympathy has created. He blogs at mechanical-sympathy.blogspot.com. Martin is a DZone MVB and is not an employee of DZone and has posted 29 posts at DZone.

Algorithm of the Week: Lock-based vs. Lock-free Concurrent Algorithms

08.27.2013

Last week I attended a review session of the new JSR166 StampedLock run by Heinz Kabutz at the excellent JCrete unconference. StampedLock is an attempt to address the contention issues that arise in a system when multiple readers concurrently access shared state. StampedLock is designed to perform better than ReentrantReadWriteLock by taking an optimistic read approach.

While attending the session a couple of things occurred to me. First, I thought it was about time I reviewed the current status of Java lock implementations. Second, although StampedLock looks like a good addition to the JDK, it seems to miss the fact that lock-free algorithms are often a better solution to the multiple-reader case.

Test Case

To compare implementations I needed an API test case that would not favor a particular approach. For example, the API should be garbage-free and allow the methods to be atomic. A simple test case is to design a spaceship that can be moved around a two-dimensional space with the coordinates of its position available to be read atomically. At least two fields need to be read or written per transaction to make the concurrency interesting.

public interface Spaceship
{
    int readPosition(final int[] coordinates);
 
    int move(final int xDelta, final int yDelta);
}

The above API would be cleaner with an immutable position object factored out, but I want to keep it garbage-free and create the need to update multiple internal fields with minimal indirection. This API could easily be extended to a three-dimensional space while still requiring the implementations to be atomic.
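
As a point of reference, here is a minimal sketch of the simplest way to satisfy this interface: guard both fields with the object's intrinsic lock. The class name SynchronizedSpaceship is hypothetical, and this is not necessarily the code used in the tests. The article does not say what the int return values mean, so the sketch assumes they count the attempts an operation needed.

```java
// Interface repeated from the article so that this sketch compiles on its own.
interface Spaceship {
    int readPosition(final int[] coordinates);

    int move(final int xDelta, final int yDelta);
}

// Hypothetical baseline: both fields are guarded by the intrinsic lock.
class SynchronizedSpaceship implements Spaceship {
    private int x;
    private int y;

    @Override
    public synchronized int readPosition(final int[] coordinates) {
        // Both fields are copied under the lock, so the pair is consistent.
        coordinates[0] = x;
        coordinates[1] = y;
        return 1; // assumption: the return value counts attempts
    }

    @Override
    public synchronized int move(final int xDelta, final int yDelta) {
        // Both fields are updated under the lock, so readers never see half a move.
        x += xDelta;
        y += yDelta;
        return 1; // assumption: the return value counts attempts
    }
}

class SynchronizedSpaceshipDemo {
    public static void main(final String[] args) {
        final Spaceship ship = new SynchronizedSpaceship();
        final int[] coordinates = new int[2]; // caller-owned buffer, reused across reads
        ship.move(3, -2);
        ship.readPosition(coordinates);
        System.out.println(coordinates[0] + "," + coordinates[1]);
    }
}
```

The caller passes in the int[] buffer and reuses it, which is what keeps the read path free of allocation.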

Multiple implementations of the spaceship are built and exercised by a test harness. All the code and results for this blog can be found here.

The test harness will run each of the implementations in turn by using a megamorphic dispatch pattern to try to prevent inlining, lock-coarsening, and loop unrolling when accessing the concurrent methods.
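
To illustrate the idea, here is a rough sketch of what a megamorphic dispatch pattern can look like. It is not the actual harness; all class and method names below are hypothetical. Every implementation is driven through the same interface call sites, so that those call sites observe three receiver types. HotSpot typically treats such call sites as megamorphic and is then much less likely to inline through them, though nothing guarantees this.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Interface repeated from the article so that this sketch compiles on its own.
interface Spaceship {
    int readPosition(final int[] coordinates);
    int move(final int xDelta, final int yDelta);
}

// Three hypothetical implementations, so the shared call sites see three types.
class SynchronizedShip implements Spaceship {
    private int x, y;
    public synchronized int readPosition(final int[] c) { c[0] = x; c[1] = y; return 1; }
    public synchronized int move(final int dx, final int dy) { x += dx; y += dy; return 1; }
}

class ReentrantLockShip implements Spaceship {
    private final ReentrantLock lock = new ReentrantLock();
    private int x, y;
    public int readPosition(final int[] c) {
        lock.lock();
        try { c[0] = x; c[1] = y; } finally { lock.unlock(); }
        return 1;
    }
    public int move(final int dx, final int dy) {
        lock.lock();
        try { x += dx; y += dy; } finally { lock.unlock(); }
        return 1;
    }
}

class ReadWriteLockShip implements Spaceship {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int x, y;
    public int readPosition(final int[] c) {
        lock.readLock().lock();
        try { c[0] = x; c[1] = y; } finally { lock.readLock().unlock(); }
        return 1;
    }
    public int move(final int dx, final int dy) {
        lock.writeLock().lock();
        try { x += dx; y += dy; } finally { lock.writeLock().unlock(); }
        return 1;
    }
}

class MegamorphicHarness {
    // The only call sites for move() and readPosition(): every implementation
    // passes through here, so the type profile contains all three classes.
    static long exercise(final Spaceship ship, final int iterations) {
        final int[] coordinates = new int[2];
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            ship.move(1, 2);
            ship.readPosition(coordinates);
            checksum += coordinates[0] + coordinates[1]; // keeps the result live
        }
        return checksum;
    }

    // Runs each implementation in turn through the shared call sites.
    static long runAll(final Spaceship[] ships, final int iterations) {
        long total = 0;
        for (final Spaceship ship : ships) {
            total += exercise(ship, iterations);
        }
        return total;
    }

    public static void main(final String[] args) {
        final Spaceship[] ships = {
            new SynchronizedShip(), new ReentrantLockShip(), new ReadWriteLockShip()
        };
        System.out.println(runAll(ships, 1_000_000));
    }
}
```

Accumulating a checksum from the values read is a cheap guard against the JIT discarding the reads as dead code.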

Each implementation is subjected to four distinct threading scenarios that result in different contention profiles:
  • one reader - one writer
  • two readers - one writer
  • three readers - one writer
  • two readers - two writers
All tests are run with 64-bit Java 1.7.0_25, Linux 3.6.30, and a quad-core 2.2GHz Ivy Bridge i7-3632QM. Throughput is measured over five-second periods for each implementation, with the tests repeated five times to ensure sufficient warm-up. The results below are per-second throughputs averaged over five runs. To approximate a typical Java deployment, no thread affinity or core isolation has been employed; either would have reduced variance significantly.

Note: Other CPUs and operating systems can produce very different results.
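
For readers who want to experiment, the one reader - one writer scenario can be sketched roughly as follows. This is an illustration rather than the harness behind the results: it checks atomicity instead of measuring throughput, and the names OneReaderOneWriter and SynchronizedSpaceship are hypothetical. The writer always moves by the same amount in both dimensions, so any read where x and y differ would be a torn read.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Interface repeated from the article so that this sketch compiles on its own.
interface Spaceship {
    int readPosition(final int[] coordinates);
    int move(final int xDelta, final int yDelta);
}

// Hypothetical implementation under test, guarded by the intrinsic lock.
class SynchronizedSpaceship implements Spaceship {
    private int x, y;
    public synchronized int readPosition(final int[] c) { c[0] = x; c[1] = y; return 1; }
    public synchronized int move(final int dx, final int dy) { x += dx; y += dy; return 1; }
}

class OneReaderOneWriter {
    // Returns {reads performed, torn reads observed}.
    static long[] run(final Spaceship ship, final int moves) {
        final AtomicBoolean writerDone = new AtomicBoolean(false);
        final long[] result = new long[2];

        final Thread writer = new Thread(() -> {
            for (int i = 0; i < moves; i++) {
                ship.move(1, 1); // x and y always advance together
            }
            writerDone.set(true);
        });

        final Thread reader = new Thread(() -> {
            final int[] coordinates = new int[2];
            long reads = 0;
            long torn = 0;
            do {
                ship.readPosition(coordinates);
                reads++;
                if (coordinates[0] != coordinates[1]) {
                    torn++; // the read saw only half of a move
                }
            } while (!writerDone.get());
            result[0] = reads;
            result[1] = torn;
        });

        reader.start();
        writer.start();
        try {
            writer.join();
            reader.join(); // join makes the reader's writes to result visible here
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for test threads", e);
        }
        return result;
    }

    public static void main(final String[] args) {
        final long[] result = run(new SynchronizedSpaceship(), 1_000_000);
        System.out.println("reads=" + result[0] + " torn=" + result[1]);
    }
}
```

The other scenarios follow the same shape with more reader or writer threads; a throughput measurement would count operations over a fixed time window instead of running a fixed number of moves.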

Results

Figure 1.
Figure 2.
Figure 3.
Figure 4.

The raw data for the above charts can be found here.

Analysis

The real surprise for me from the results is the performance of ReentrantReadWriteLock. I cannot see a use for this implementation beyond a case where reads hugely outnumber writes. My main takeaways are:
  1. StampedLock is a major improvement over existing lock implementations, especially with increasing numbers of reader threads.
  2. StampedLock has a complex API. It is very easy to mistakenly call the wrong method for locking actions.
  3. Synchronized is a good general-purpose lock implementation when contention is from only two threads.
  4. ReentrantLock is a good general-purpose lock implementation when thread counts grow, as previously discovered.
  5. Choosing to use ReentrantReadWriteLock should be based on careful and appropriate measurement. As with all major decisions, measure and make decisions based on data.
  6. Lock-free implementations can offer significant throughput advantages over lock-based algorithms.
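
To make the first two points concrete, here is a minimal sketch of a StampedLock-based spaceship using an optimistic read. It is an illustration under stated assumptions, not the tested implementation: the class name is hypothetical, the int return values are assumed to count attempts, and StampedLock requires Java 8 or later.

```java
import java.util.concurrent.locks.StampedLock;

// Interface repeated from the article so that this sketch compiles on its own.
interface Spaceship {
    int readPosition(final int[] coordinates);
    int move(final int xDelta, final int yDelta);
}

// Hypothetical implementation built on java.util.concurrent.locks.StampedLock.
class StampedLockSpaceship implements Spaceship {
    private final StampedLock lock = new StampedLock();
    private int x;
    private int y;

    @Override
    public int readPosition(final int[] coordinates) {
        int attempts = 1;

        // Optimistic read: no lock is held, so the values may be inconsistent
        // until validate() confirms that no write happened in between.
        long stamp = lock.tryOptimisticRead();
        int currentX = x;
        int currentY = y;

        if (!lock.validate(stamp)) {
            // A writer interfered: fall back to a real read lock.
            attempts++;
            stamp = lock.readLock();
            try {
                currentX = x;
                currentY = y;
            } finally {
                lock.unlockRead(stamp); // must pair with readLock()
            }
        }

        // Publish to the caller's buffer only once the values are known to be consistent.
        coordinates[0] = currentX;
        coordinates[1] = currentY;
        return attempts;
    }

    @Override
    public int move(final int xDelta, final int yDelta) {
        final long stamp = lock.writeLock();
        try {
            x += xDelta;
            y += yDelta;
        } finally {
            lock.unlockWrite(stamp); // must pair with writeLock()
        }
        return 1;
    }
}

class StampedLockSpaceshipDemo {
    public static void main(final String[] args) {
        final Spaceship ship = new StampedLockSpaceship();
        final int[] coordinates = new int[2];
        ship.move(5, 7);
        ship.readPosition(coordinates);
        System.out.println(coordinates[0] + "," + coordinates[1]);
    }
}
```

Even this small example shows where the API invites mistakes: every stamp has to be released with the unlock method that matches how it was acquired, and the optimistically read values must not be used before validate() succeeds.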

Conclusion

It is nice seeing the influence of lock-free techniques appearing in lock-based algorithms. The optimistic strategy employed on read is effectively a lock-free algorithm technique.

In my experience of teaching and developing lock-free algorithms, not only do they provide significant throughput advantages, as evidenced here, they also provide much lower latency with less variance.
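
As an illustration of what a lock-free spaceship can look like, here is one possible sketch. It packs both coordinates into a single AtomicLong and updates them with a compare-and-set retry loop. The class name and the packing layout are assumptions made for this example, and the return values are assumed to count attempts.

```java
import java.util.concurrent.atomic.AtomicLong;

// Interface repeated from the article so that this sketch compiles on its own.
interface Spaceship {
    int readPosition(final int[] coordinates);
    int move(final int xDelta, final int yDelta);
}

// Hypothetical lock-free implementation: x lives in the high 32 bits of a
// single long and y in the low 32 bits, so one atomic read or one
// compare-and-set covers both fields.
class LockFreeSpaceship implements Spaceship {
    private final AtomicLong position = new AtomicLong(0L);

    @Override
    public int readPosition(final int[] coordinates) {
        final long current = position.get(); // a single atomic read of both fields
        coordinates[0] = (int) (current >>> 32);
        coordinates[1] = (int) current;
        return 1;
    }

    @Override
    public int move(final int xDelta, final int yDelta) {
        int attempts = 0;
        long current;
        long updated;
        do {
            attempts++;
            current = position.get();
            final int newX = (int) (current >>> 32) + xDelta;
            final int newY = (int) current + yDelta;
            // Mask y so that a negative value does not overwrite the x bits.
            updated = ((long) newX << 32) | (newY & 0xFFFFFFFFL);
        } while (!position.compareAndSet(current, updated)); // retry if another writer won
        return attempts;
    }
}

class LockFreeSpaceshipDemo {
    public static void main(final String[] args) {
        final Spaceship ship = new LockFreeSpaceship();
        final int[] coordinates = new int[2];
        ship.move(3, -2);
        ship.readPosition(coordinates);
        System.out.println(coordinates[0] + "," + coordinates[1]);
    }
}
```

Readers never block and never retry in this sketch; only writers loop under contention. The packing trick works only while the whole state fits in 64 bits, so a three-dimensional position would need a different technique.
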
Published at DZone with permission of Martin Thompson, author and DZone MVB. (source)


Comments

Christian Posta replied on Tue, 2013/08/27 - 6:37pm

Thanks Martin for this post.

For those interested, I've also converted the test code to maven:

https://github.com/christian-posta/rw-concurrency

Definitely useful for evaluating different locking strategies!
