Lukas has posted 5 posts at DZone. You can read more from them at their website. View Full User Profile

Think twice before using Java 8 parallel streams

02.24.2014
| 18219 views |
  • submit to reddit

If you listen to people from Oracle talking about design choices behind Java 8, you will often hear that parallelism was the main motivation. Parallelization was the main driving force behind lambdas, stream API and others. Let's take a look at an example of stream API.

private long countPrimes(int max) {
    return range(1, max).parallel().filter(this::isPrime).count();
}

private boolean isPrime(long n) {
    return n > 1 && rangeClosed(2, (long) sqrt(n)).noneMatch(divisor -> n % divisor == 0);
}

Here we have method countPrimes that counts number of prime numbers between 1 and max. Stream of numbers is created by a range method. The stream is then switched to parallel mode, numbers that are not primes are filtered out and the remaining numbers are counted.

You can see that stream API allow us to describe the problem in a neat and compact way. Moreover, parallelization is just a matter of calling parallel() method. When we do that, the stream is split into multiple chunks, with each chunk processed independently and with the result summarized at the end. Since our implementation of isPrime method is extremely ineffective and CPU intensive, we can take advantage of parallelization and utilize all available CPU cores.

Lets take a look at another example.

private List<StockInfo> getStockInfo(Stream<String> symbols) {
     return symbols.parallel()
            .map(this::getStockInfo) //slow network operation
            .collect(toList());
}

We have a list of stock symbols on the input and we have to call a slow networking operation to get some details about the stock. Here we do not deal with a CPU intensive operation, but we can take advantage of parallelization too. It's a good idea to execute multiple network request in parallel. Again, a nice task for parallel streams, do you agree?

If you do, please look at the previous example again. There is a big error. Do you see it? The problem is that all parallel streams use common fork-join thread pool and if you submit a long-running task, you effectively block all threads in the pool. Consequently you block all other tasks that are using parallel streams. Imagine a servlet environment, when one request calls getStockInfo() and another one countPrimes(). One will block the other one even though each of them requires different resources. What's worse, you can not specify thread pool for parallel streams, the whole class loader has to use the same one.

Let's illustrate it on the following example:
private void run() throws InterruptedException {
  ExecutorService es = Executors.newCachedThreadPool();

  // Simulating multiple threads in the system
  // if one of them is executing a long-running task.
  // Some of the other threads/tasks are waiting
  // for it to finish

  es.execute(() -> countPrimes(MAX, 1000)); //incorrect task
  es.execute(() -> countPrimes(MAX, 0));
  es.execute(() -> countPrimes(MAX, 0));
  es.execute(() -> countPrimes(MAX, 0));
  es.execute(() -> countPrimes(MAX, 0));
  es.execute(() -> countPrimes(MAX, 0));


  es.shutdown();
  es.awaitTermination(60, TimeUnit.SECONDS);
}

private void countPrimes(int max, int delay) {
  System.out.println(
  range(1, max).parallel()
  .filter(this::isPrime).peek(i -> sleep(delay)).count()
  );
}
Here we simulate six threads in the system. All of them are performing CPU intensive task, the first one is “broken” and sleeps for a second just after it founds a prime number. This is just an artificial example, you can imagine a thread that is stuck or performs a blocking operation instead.

The question is, what will happen when we execute this code? We have six tasks, one of them will take whole day to complete, the rest should finish much sooner. Not surprisingly, every time you execute the code, you get different result. Sometimes all healthy tasks finish, sometimes few of them are stuck behind the slow one. Do you want to have such behavior on the production system? One broken task taking down the rest of the application? I guess not.

There are only two options how to make sure that such thing will never happen. The first is to ensure that all tasks submitted to the common fork-join pool will not get stuck and will finish in a reasonable time. But it's easier said than done, especially in complex applications. The other option is to not use parallel streams and wait until Oracle allows us to specify the thread pool to be used for parallel streams.


Resources:
Published at DZone with permission of its author, Lukas Krecan.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Comments

Gerrit Jansen V... replied on Mon, 2014/02/24 - 9:29am

I've had a similar problem with threading but in a different context (i.e. not java 8).

A more automated solution is to measure task times and schedule tasks that take long on different thread pools (also created dynamically) as per their executions times.


Have a look at:

https://github.com/gerritjvv/thread-exec


Edward Harned replied on Mon, 2014/02/24 - 4:38pm

 Having multiple F/J frameworks just adds more threads to the equation, it doesn’t address the problem. (Although it would help some.)

The real problem is the lack of a thread manager in Java. Operating systems time threads in their states (running, waiting ...) and can preempt/expunge misbehaving threads. A good application thread manager can time threads and flag those that misbehave. The F/J framework does nothing with thread management since it was designed for academic use.

Oracle will never have a professional parallel option until it gets rid of the framework and creates a parallel engine embedded within Java (e.g. .Net )

Oleksandr Alesinskyy replied on Wed, 2014/02/26 - 5:58am in response to: Gerrit Jansen Van Vuuren

 "common fork-join thread poll" - I like the word "poll" in this context.

Lukas Krecan replied on Wed, 2014/02/26 - 6:55am in response to: Oleksandr Alesinskyy

 Fixed, thanks

Lukas Krecan replied on Sun, 2014/05/18 - 2:51am

There is a trick that will allow you to execute parallel operation in a specific thread-pool. It's described here .

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.