You are here

Wrestling With Concurrency - Scala Exchange 2012

I was very fortunate to join the London Scala Exchange conference over the last two days. A good range of quality talks were offered - some thought provoking, some highly entertaining. And most wrestled to some extent with concurrency.

It seems that Scala is not yet at ease with itself in its concurrency model. The root of the issue is the JVM: threads are very expensive and limited to small numbers (say, 30). This causes an inversion of priorities - instead of the CPU cores being seen as the most scarce resource, the threads themselves are seen this way. This is understandable but has perverse effects that we see when we stand back and look at the very hard work we then put in to optimise the use of our limited thread pools.

It's easy to write blocking code, much harder to write asynchronous non-blocking code (there's an example below). But we want higher load factors, so we go the extra step and write the non-blocking code. The buzz with Scala is this: use Futures from day one. Several APIs now exist that only provide asynchronous I/O via futures (e.g. Dispatch HTTP). All this is strongly encouraging the first-use of futures.

But this is a performance optimisation. As with all optimisation, a mature approach is demanded - premature optimisation can cause all sorts of evil (D.Knuth). Mature optimisation is a Good Thing and requires empirical evidence - measure and be happy. First, let's do some more thinking. How will I know what to expect?

Shall I Benefit from Non-blocking Requests over Blocking Requests?

  • If excess parallelism is already present, blocking I/O is not harmful and is simpler. Suppose you've got a many-threaded webserver (e.g. Tomcat) and 50 threads running on a four-core server. That's plenty of excess parallelism; the probability of all CPU cores being kept busy all the time is high - so there is no need for futures for downstream requests to the service or database layer.

  • Will I suffer thread-starvation problems? You may not know the answer to this until you try, but adding futures carelessly can seriously slow down an application in some cases (example case history). This happens because futures need a thread pool; if there are multiple thread pools, they are independent of each other in the sense that work is only given to threads within a given pool, not to all threads available. Whilst some thread pools might have excess parallelism, others might be starved of resources; this kind of imbalance might cause a lot of wastage. If you're not sure, make some measurements. Fewer, bigger thread pools are a good idea, but be wary of dynamically-resizing thread pools because they have other non-trivial costs. It's a pain to have to deal with this stuff; maybe try other optimisation approaches first. Ideally, there would only be one global thread pool and it would be automatic - just like the heap handles objects automatically - more on this later.

  • Am I affected by Amdahl's law? Just because something is expressed in a parallel way does not mean it will run faster. Often, the rate-limiting step is elsewhere, so don't optimise the wrong things. Adding code to achieve more parallelism means adding code - the extra complexity may slow things down more than the speed-up it hopes for.

Roll out the Futures...

Having considered all the above, then the time might be right to start using futures (and related techniques) for asynchronously executing blocks of code that depend on slow I/O. Fortunately, Scala futures are compositional, meaning I can chain them using the 'Futures Staircase' approach. Unfortunately, this is hard to read and hard to modify - brittle, unwieldy and far from ideal.

To illustrate the Futures Staircase, here's an example borrowed from SIP-14, simplified by removal of the error handling code.

val rateQuote = future {
  connection.getCurrentValue(USD)
}
rateQuote onSuccess { case quote =>
  val purchase = future {
    if (isProfitable(quote)) connection.buy(amount, quote)
    else throw new Exception("not profitable")
  }
  purchase onSuccess {
    case _ => println("Purchased " + amount + " USD")
  }
}

This chains just two blocks that wrap I/O calls; the 'future' operator appears twice, each time taking a block that is the closure to be executed in a separate thread. Then, for each future, we use the onSuccess method with another closure to specify what will happen, asynchronously, when the future succeeds.

We have one future nested inside another future's onSuccess. You can repeat this with lots of futures to form a grand staircase, but a limit is imposed by our brains - can we understand the complexity of a staircase of many futures? It's even worse when error handling is included. And then, how do you test it?

Taking away the asynchronous coding, what we are trying to achieve is

val rateQuote = connection.getCurrentValue(USD)
if (isProfitable(rateQuote)) {
  connection.buy(amount, rateQuote)
  println("Purchased " + amount + " USD")
}
else throw new Exception("not profitable")

which is undeniably simpler to read, to modify and to test. It would also be simple to add many many more sequential steps that use the connection I/O object, each needing to block the thread.

Green Threads - The Answer?

It was mentioned earlier that the culprit for all this stuff is the JVM. Since Java 1.2, Green Threads (also here) were abandoned in favour of operating system ("native") threads. This was largely a pragmatic move because multi-core support was essentially absent and needed a quick leg-up for obvious reasons. The poor performance of the green threads didn't help - they needed a lot of remedial work but they got ditched instead.

This was a shame. It left us with the poor abstraction we have today - threads are expensive, concurrency is hard. It used to be this way with memory too, until garbage collection technology improved. We'd never go back to manual allocation now - this illustrates how important good abstractions are.

Green threads provide a far better abstraction, allowing concurrency to be a useful tool not a stumbling block. But green threads are no use if several hurdles can't be overcome:

  • they must use all the cpu cores available to them
  • context switching must be cheap
  • I/O may block any thread engaged in it, but not any other thread and certainly not all other threads, which was the case in some early implementations.

I've often wondered why multi-threading is so hard in Java but it has now become clear - it was ultimately to do with the switch to native threads, which are:

  • good at using all the cpu cores
  • good at being truly concurrent, providing independent I/O etc
  • slow at context switching (compared with the best green thread implementations)
  • horribly greedy with memory, hence limiting the maximum usable number of them
  • a poor abstraction for any basis for expressing the real world, which is highly concurrent of course.

For comparison, besides Erlang the new Go language does a good job of huge concurrency, whilst enthusiastically advocating this as an importantly positive design decision. The grand-daddy of them all remains Occam, still an ongoing research project.

Is there anything we can do about it?

So, if a lot of programmer time is expended (wasted even?) coding up non-blocking I/O, futures etc, it's a big shame we don't have a better level of abstraction. Is all hope lost for Scala concurrency?

I don't think so. There are several ways to deal with the complexities of futures staircases and a continuation passing style of interleaving. For example,

  • Using a Scala compiler plugin (SIP-18) so that a sequential block of code that contains I/O operations could be automatically mapped to the equivalent Futures Staircase. A single SIP-14 ExecutionContext thread-pool would eliminate the thread-pool starvation problems mentioned above.

  • Ditto but mapped instead to continuation-passing style - very hard to write manually but should be straightforward to automate. This might not be possible - would blocking I/O be crippling? I suspect so.

There is plenty of scope here for further research, the results of which could be potentially highly beneficial.

Comments

Some very interesting points raised. Lots to think about. A couple of corrections / qualifications needed:

The idea of the ExecutionContext from SIP-14 is to create a single global thread pool used by all concurrency operations - so this is already in place for Scala 2.10. This has been present in the Akka codebase for some time. Typically in Akka you tend to configure this pool to match the number of processor cores, thus minimising the slow thread context switching of the JVM.

The example you show of a 'future staircase' is based around the low level underlying api of SIP-14. In practice you'd typically not be coding at this level. More likely you'd be using map/flatMap, andThen plus for comprehensions. This raises the programming abstraction to a higher level and makes the code much simpler. You can also use applicative style to create a very functional programming style.

Thanks for the comments, and especially for the update on ExecutionContext.

The future staircase issue still holds, even though it can be dressed up in various ways. It's still a hurdle for newbies, so I decided to pick the most explicit and leave the alternatives outside the scope of the article.