
Alternative 2 – Parallel and synchronous task execution

So, you hire 12 bartenders, and you calculate that you can serve about 360 customers an hour. The line is barely going out the door now, and revenue is looking great.

One month goes by and again, you’re almost out of business. How can that be?

It turns out that having 12 bartenders is pretty expensive. Even though revenue is high, the costs are even higher. Throwing more resources at the problem doesn’t really make the bar more efficient.
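
As a concrete sketch of this alternative (my illustration, not part of the original story), here is how it could be modeled in Rust, with one thread per bartender and one virtual second scaled down to one millisecond so the program runs quickly:

use std::thread;
use std::time::{Duration, Instant};

// One virtual second is scaled down to one millisecond so the demo runs fast.
const SECOND: Duration = Duration::from_millis(1);

// A fully synchronous order: the bartender does nothing else until it's done.
fn serve_one_guinness() {
    thread::sleep(15 * SECOND); // pour until the glass is three-quarters full
    thread::sleep(100 * SECOND); // stand idle while the surge settles
    thread::sleep(5 * SECOND); // fill the glass to the top and serve
}

fn main() {
    // Twelve bartenders (threads), each still working fully synchronously.
    let bartenders: Vec<_> = (0..12)
        .map(|_| {
            thread::spawn(|| {
                let shift = Instant::now();
                let mut served = 0u32;
                while shift.elapsed() < 3600 * SECOND {
                    serve_one_guinness();
                    served += 1;
                }
                served
            })
        })
        .collect();

    let total: u32 = bartenders.into_iter().map(|b| b.join().unwrap()).sum();
    // Each bartender manages about 30 beers, so this prints roughly 360.
    println!("12 bartenders served {total} beers in one virtual hour");
}

Throughput scales with the number of threads, but so does the cost, and every bartender still spends five-sixths of each order doing nothing.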

Alternative 3 – Asynchronous task execution with one bartender

So, we’re back to square one. Let’s think this through and find a smarter way of working instead of throwing more resources at the problem.

You ask your bartender whether they can start taking new orders while the beer settles so that they’re never just standing and waiting while there are customers to serve. The opening night comes and…

Wow! On a busy night where the bartender works non-stop for a few hours, you calculate that they now spend just over 20 seconds on an order. You’ve basically eliminated all the waiting: the only time that counts is the 15 seconds of pouring and the 5 seconds of topping up, so your theoretical throughput is now 180 beers per hour (3,600 seconds divided by 20). With just one more bartender, you could match the throughput you previously needed 12 bartenders to achieve.

However, you realize that you don’t actually achieve 180 beers an hour, since orders arrive somewhat erratically and are not evenly spaced over time. Sometimes, the bartender is busy with a new order, so a beer that has finished settling sits waiting to be topped up and served. In real life, the throughput is closer to 150 beers an hour.

Still, two bartenders could serve 300 beers an hour this way, not far from the 360 you served while employing 12 bartenders.
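
Here is a rough sketch of the same idea in Rust (again my illustration, with one virtual second scaled to a millisecond): a single thread that keeps a list of settling glasses and never stands idle while there is work to do. No async runtime is involved; the point is only the scheduling pattern:

use std::thread;
use std::time::{Duration, Instant};

const SECOND: Duration = Duration::from_millis(1); // scaled virtual time

fn main() {
    let shift = Instant::now();
    // The settle step is pure waiting, so we only record *when* each glass
    // will be ready instead of standing in front of it.
    let mut settling: Vec<Instant> = Vec::new();
    let mut served = 0u32;

    while shift.elapsed() < 3600 * SECOND {
        // Prefer finishing a glass whose surge has settled...
        let ready = settling.iter().position(|&t| t <= Instant::now());
        if let Some(pos) = ready {
            settling.remove(pos);
            thread::sleep(5 * SECOND); // fill the glass to the top and serve
            served += 1;
        } else {
            // ...otherwise, start on the next order instead of waiting.
            thread::sleep(15 * SECOND); // pour until three-quarters full
            settling.push(Instant::now() + 100 * SECOND);
        }
    }
    // Each beer needs only 20 seconds of attention, so this prints close to
    // the theoretical maximum of 180 (the demand here never lets up).
    println!("one asynchronous bartender served {served} beers");
}

This is, in miniature, what an asynchronous runtime does for you: it keeps track of tasks that are waiting, so a single thread can always make progress on something else in the meantime.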

This is good, but you ask yourself whether you can do even better.

Alternative 4 – Parallel and asynchronous task execution with two bartenders

What if you hire two bartenders and ask them to do just what we described in Alternative 3, but with one change: you allow them to steal each other’s tasks? Bartender 1 can start pouring and set the beer down to settle, and bartender 2 can top it up and serve it if bartender 1 is busy pouring a new order at that moment. This way, it is rare for a settled beer to be left waiting because both bartenders happen to be busy at the same time. Almost all orders are finished and served in the shortest time possible, letting customers leave the bar with their beer faster and making room for customers who want to place a new order.

Now, this way, you can increase throughput even further. You still won’t reach the theoretical maximum, but you’ll get very close. On the opening night, you realize that the bartenders now process about 175 orders an hour each, giving a total throughput of 350 beers an hour, just shy of the theoretical maximum of 360 and essentially matching your 12 synchronous bartenders with a staff of two.
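
Continuing the illustrative sketch, the only change this alternative needs is to share the list of settling glasses between two worker threads, so that either bartender can finish a glass the other one started:

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

const SECOND: Duration = Duration::from_millis(1); // scaled virtual time

fn main() {
    let shift = Instant::now();
    // The settling glasses are shared: either bartender may finish any glass.
    let settling = Arc::new(Mutex::new(Vec::<Instant>::new()));

    let bartenders: Vec<_> = (0..2)
        .map(|_| {
            let settling = Arc::clone(&settling);
            thread::spawn(move || {
                let mut served = 0u32;
                while shift.elapsed() < 3600 * SECOND {
                    // Take a settled glass out of the shared list, if any.
                    let ready = {
                        let mut glasses = settling.lock().unwrap();
                        let pos = glasses.iter().position(|&t| t <= Instant::now());
                        pos.map(|p| glasses.remove(p))
                    }; // the lock is released here, before doing any "work"
                    if ready.is_some() {
                        thread::sleep(5 * SECOND); // top up and serve
                        served += 1;
                    } else {
                        thread::sleep(15 * SECOND); // pour a new glass
                        settling.lock().unwrap().push(Instant::now() + 100 * SECOND);
                    }
                }
                served
            })
        })
        .collect();

    let total: u32 = bartenders.into_iter().map(|b| b.join().unwrap()).sum();
    // Prints close to the theoretical maximum of 360 beers per virtual hour.
    println!("two cooperating bartenders served {total} beers");
}

This mirrors how multithreaded, work-stealing runtimes schedule tasks: worker threads pull whatever task is ready from a shared queue instead of each thread exclusively owning the tasks it started.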

Revenue looks good, customers are happy, costs are kept at a minimum, and you’re one happy manager of the weirdest bar on earth (an extremely efficient bar, though).

The key takeaway

Concurrency is about working smarter. Parallelism is a way of throwing more resources at the problem.


I firmly believe the main reason we find parallel and concurrent programming hard to differentiate stems from how we model events in our everyday life. We tend to define these terms loosely, so our intuition is often wrong.

Note

It doesn’t help that concurrent is defined in the dictionary as ‘operating or occurring at the same time’, which tells us little about how it differs from parallel.

For me, this first clicked when I started to understand why we want to make a distinction between parallel and concurrent in the first place!

The why has everything to do with resource utilization and efficiency.

Efficiency is the (often measurable) ability to avoid wasting materials, energy, effort, money, and time in doing something or in producing a desired result.

Parallelism is increasing the resources we use to solve a task. It has nothing to do with efficiency.

Concurrency has everything to do with efficiency and resource utilization. Concurrency can never make one single task go faster. It can only help us utilize our resources better and thereby finish a set of tasks faster.

Let’s draw some parallels to process economics

In businesses that manufacture goods, we often talk about LEAN processes. This is pretty easy to compare with why programmers care so much about what we can achieve if we handle tasks concurrently.

Let’s pretend we’re running a bar. We only serve Guinness beer and nothing else, but we serve our Guinness to perfection. Yes, I know, it’s a little niche, but bear with me.

You are the manager of this bar, and your goal is to run it as efficiently as possible. Now, you can think of each bartender as a CPU core and each order as a task. To manage this bar, you need to know the steps to serve a perfect Guinness:

• Pour the Guinness draught into a glass tilted at 45 degrees until it’s three-quarters full (15 seconds).
• Allow the surge to settle for 100 seconds.
• Fill the glass completely to the top (5 seconds).
• Serve.

Since there is only one thing to order in this bar, customers only need to signal with their fingers how many they want, so we assume taking new orders is instantaneous. To keep things simple, the same goes for payment. In choosing how to run this bar, you have a few alternatives.

Alternative 1 – Fully synchronous task execution with one bartender

You start out with only one bartender (CPU). The bartender takes one order, finishes it, and progresses to the next. The line is out the door and going two blocks down the street – great! One month later, you’re almost out of business and you wonder why.

Well, even though your bartender is very fast at taking new orders, they can only serve 30 customers an hour. Remember, each order takes 120 seconds from start to finish: the bartender spends just 20 seconds actually pouring and topping up the glass, and stands around doing nothing for the 100 seconds it takes the beer to settle. Only after one order is completely finished can they progress to the next customer and take their order.

The result is bad revenue, angry customers, and high costs. That’s not going to work.
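
A minimal sketch of this fully synchronous flow (my illustration, with one virtual second scaled down to one millisecond so it can actually be run):

use std::thread;
use std::time::{Duration, Instant};

// One virtual second is scaled down to one millisecond so the demo runs fast.
const SECOND: Duration = Duration::from_millis(1);

fn main() {
    let shift = Instant::now();
    let mut served = 0u32;
    // One bartender, fully synchronous: each order blocks until it's done.
    while shift.elapsed() < 3600 * SECOND {
        thread::sleep(15 * SECOND); // pour until three-quarters full
        thread::sleep(100 * SECOND); // wait for the surge to settle
        thread::sleep(5 * SECOND); // fill the glass to the top and serve
        served += 1;
    }
    // 120 virtual seconds per order means this prints roughly 30.
    println!("one synchronous bartender served {served} beers");
}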



Concurrency versus parallelism

Right off the bat, we’ll dive into this subject by defining what concurrency is. Since it is quite easy to confuse concurrent with parallel, we will try to make a clear distinction between the two from the get-go.

Important

Concurrency is about dealing with a lot of things at the same time.

Parallelism is about doing a lot of things at the same time.

We call the concept of progressing multiple tasks at the same time multitasking. There are two ways to multitask. One is by progressing tasks concurrently, but not at the same time. Another is to progress tasks at the exact same time in parallel. Figure 1.1 depicts the difference between the two scenarios:

Figure 1.1 – Multitasking two tasks

First, we need to agree on some definitions:

• Resource: This is something we need to be able to progress a task. Our resources are limited. This could be CPU time or memory.
• Task: This is a set of operations that requires some kind of resource to progress. A task must consist of several sub-operations.
• Parallel: This is something happening independently at the exact same time.
• Concurrent: These are tasks that are in progress at the same time, but not necessarily progressing simultaneously.

This is an important distinction. If two tasks are running concurrently, but are not running in parallel, they must be able to stop and resume their progress. We say that a task is interruptible if it allows for this kind of concurrency.
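
To sketch what interruptible means in code, here is a hand-rolled example (mine, not the book’s): a task that stores its own progress, so a single worker can stop it, switch to another task, and resume it later. Loosely speaking, this is the kind of state machine that async/await in Rust generates for you:

#[derive(Clone, Copy)]
enum PourState {
    Pouring,
    Settling,
    ToppingUp,
    Done,
}

struct PourTask {
    state: PourState,
}

impl PourTask {
    // Advance the task one step, then hand control back to the caller.
    // Because progress lives in `state`, the task can be stopped between
    // steps and resumed later - exactly what interruptibility requires.
    fn resume(&mut self) {
        self.state = match self.state {
            PourState::Pouring => PourState::Settling,
            PourState::Settling => PourState::ToppingUp,
            PourState::ToppingUp | PourState::Done => PourState::Done,
        };
    }

    fn is_done(&self) -> bool {
        matches!(self.state, PourState::Done)
    }
}

fn main() {
    // Two tasks progressed by one worker: they are concurrent (both are in
    // progress at the same time) but never parallel.
    let mut tasks = vec![
        PourTask { state: PourState::Pouring },
        PourTask { state: PourState::Pouring },
    ];
    while tasks.iter().any(|t| !t.is_done()) {
        for task in tasks.iter_mut().filter(|t| !t.is_done()) {
            task.resume(); // a little progress, then switch to the other task
        }
    }
    println!("both tasks finished by interleaving, not by parallelism");
}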

Hyper-threading

As CPUs evolved and gained more functionality, such as several arithmetic logic units (ALUs) and additional logic units, CPU manufacturers realized that the entire CPU was rarely fully utilized. For example, when an operation only required some parts of the CPU, another instruction could be run on an idle ALU at the same time. This became the start of hyper-threading.

Your computer today, for example, may have 6 physical cores and 12 logical cores. This is exactly where hyper-threading comes in. It “simulates” two cores on the same physical core by using unused parts of the CPU to drive progress on thread 2 while simultaneously running the code on thread 1. It does this by using a number of smart tricks (such as the one with the ALU).

Now, using hyper-threading, we could offload some work to one thread while keeping the UI interactive by responding to events in a second thread, even though we only had one physical CPU core, thereby utilizing our hardware better.

You might wonder about the performance of hyper-threading

It turns out that hyper-threading has been continuously improved since the 90s. Since you’re not actually running two CPUs, some operations will still need to wait for each other to finish. The performance gain of hyper-threading, compared to multitasking on a single core without it, seems to be somewhere close to 30%, though it depends largely on the workload.

Multicore processors

As most of us know, the clock frequency of processors has been flat for a long time. Processors get faster by improving caches, branch prediction, and speculative execution, and by refining their processing pipelines, but the gains seem to be diminishing.

On the other hand, new processor cores are so small that many of them fit on the same chip. As a result, most CPUs today have multiple cores, and most often, each core will also have the ability to perform hyper-threading.
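
You can check this on your own machine using Rust’s standard library: std::thread::available_parallelism reports the number of logical CPUs, so on a 6-core CPU with hyper-threading, it will typically report 12:

use std::thread;

fn main() {
    // Returns the number of logical CPUs available to this process. On a CPU
    // with hyper-threading, this is typically twice the physical core count,
    // though the OS may also restrict it (for example, via CPU quotas).
    match thread::available_parallelism() {
        Ok(n) => println!("logical CPUs available: {n}"),
        Err(e) => println!("could not determine available parallelism: {e}"),
    }
}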

Do you really write synchronous code?

Like many things, this depends on your perspective. From the perspective of your process and the code you write, everything will normally happen in the order you write it.

From the operating system’s perspective, it might or might not interrupt your code, pause it, and run some other code in the meantime before resuming your process.

From the perspective of the CPU, it will mostly execute instructions one at a time. It doesn’t care who wrote the code, though, so when a hardware interrupt happens, it will immediately stop and give control to an interrupt handler. This is how the CPU handles concurrency.

Note

However, modern CPUs can also do a lot of things in parallel. Most CPUs are pipelined, meaning that the next instruction is loaded while the current one is executing, and many have a branch predictor that tries to figure out which instructions to load next.

The processor can also reorder instructions through out-of-order execution if it believes this will make things faster, all without ‘asking’ or ‘telling’ the programmer or the OS, so you might not have any guarantee that A happens before B.

The CPU also offloads some work to separate ‘coprocessors’, such as the FPU for floating-point calculations, leaving the main CPU free to do other tasks, and so on.

As a high-level overview, it’s OK to model the CPU as operating in a synchronous manner, but for now, let’s just make a mental note that this is a model with some caveats that become especially important when talking about parallelism, synchronization primitives (such as mutexes and atomics), and the security of computers and operating systems.

An evolutionary journey of multitasking

In the beginning, computers had one CPU that executed a set of instructions written by a programmer, one by one. No operating system (OS), no scheduling, no threads, no multitasking. This was how computers worked for a long time. We’re talking about the days when a program was assembled as a deck of punched cards, and you got in big trouble if you were so unfortunate as to drop the deck on the floor.

Operating systems were being researched very early on, and when personal computing started to grow in the 80s, operating systems such as DOS became the standard on most consumer PCs.

These operating systems usually yielded control of the entire CPU to the program currently executing, and it was up to the programmer to make things work and implement any kind of multitasking for their program. This worked fine, but as interactive UIs using a mouse and windowed operating systems became the norm, this model simply couldn’t work anymore.

Non-preemptive multitasking

Non-preemptive multitasking was the first method used to keep a UI interactive (and to run background processes).

This kind of multitasking put the responsibility of letting the OS run other tasks, such as responding to input from the mouse or running a background task, in the hands of the programmer.

Typically, the programmer yielded control to the OS.

Besides offloading a huge responsibility to every programmer writing a program for your platform, this method was naturally error-prone. A small mistake in a program’s code could halt or crash the entire system.

Note

Another popular term for what we call non-preemptive multitasking is cooperative multitasking. Windows 3.1 used cooperative multitasking and required programmers to yield control to the OS by using specific system calls. One badly behaved application could thereby halt the entire system.
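
As a toy model of cooperative multitasking (a sketch I have added for illustration), here is a scheduler that only works for as long as every task voluntarily hands control back:

// A task does a small slice of work, then *voluntarily* returns control.
// It reports whether it has finished.
type Task = Box<dyn FnMut() -> bool>;

fn main() {
    let mut counter_a = 0;
    let mut counter_b = 0;

    let mut tasks: Vec<Task> = vec![
        Box::new(move || {
            counter_a += 1;
            println!("task A did step {counter_a}");
            counter_a == 3 // done after three slices
        }),
        Box::new(move || {
            counter_b += 1;
            println!("task B did step {counter_b}");
            counter_b == 3
        }),
    ];

    // The "operating system": a round-robin scheduler that depends entirely
    // on every task handing control back.
    while !tasks.is_empty() {
        tasks.retain_mut(|task| !task());
    }
    println!("all tasks finished cooperatively");
}

A task that entered an infinite loop instead of returning would freeze this whole program, which is exactly how one badly behaved application could freeze Windows 3.1.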

Preemptive multitasking

While non-preemptive multitasking sounded like a good idea, it turned out to create serious problems as well. Letting every program and programmer be responsible for keeping the UI responsive ultimately leads to a bad user experience, since a single bug can halt the entire system.

The solution was to place the responsibility of scheduling CPU resources between the programs that request them (including the OS itself) in the hands of the OS. The OS can stop the execution of a process, do something else, and then switch back.

On such a system, if you write and run a program with a graphical user interface on a single-core machine, the OS will stop your program to update the mouse position before it switches back to your program to continue. This happens so frequently that we don’t usually observe any difference whether the CPU has a lot of work or is idle.

The OS is responsible for scheduling tasks and does this by switching contexts on the CPU. This process can happen many times each second, not only to keep the UI responsive but also to give some time to other background tasks and IO events.

This is now the prevailing way to design an operating system.
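
The contrast with the cooperative sketch shown earlier is that threads never have to hand control back themselves; the OS interrupts and resumes them on its own. A minimal illustration:

use std::thread;
use std::time::Duration;

fn main() {
    // Neither thread ever yields explicitly; the OS scheduler preempts them
    // and interleaves their execution (even a busy loop would be preempted).
    let ui = thread::spawn(|| {
        for step in 1..=3 {
            println!("UI task, step {step}");
            thread::sleep(Duration::from_millis(10));
        }
    });
    let background = thread::spawn(|| {
        for step in 1..=3 {
            println!("background task, step {step}");
            thread::sleep(Duration::from_millis(10));
        }
    });
    ui.join().unwrap();
    background.join().unwrap();
}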

Note

Later in this book, we’ll write our own green threads and cover a lot of basic knowledge about context switching, threads, stacks, and scheduling that will give you more insight into this topic, so stay tuned.