Communicating with the operating system

Communication with an operating system happens through what we call a system call (syscall). We need to know how to make system calls and understand why it’s so important for us when we want to cooperate and communicate with the operating system. We also need to understand how the basic abstractions we use every day use system calls behind the scenes. We’ll have a detailed walkthrough in Chapter 3, so we’ll keep this brief for now.

A system call uses a public API that the operating system provides so that programs we write in ‘userland’ can communicate with the OS.

Most of the time, these calls are abstracted away for us as programmers by the language or the runtime we use.

Now, syscalls are specific to the kernel you’re communicating with, but the UNIX family of kernels has many similarities. UNIX systems expose these calls through libc.
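
To make this a bit more concrete before the detailed walkthrough in Chapter 3, here is a minimal sketch of what a raw system call can look like on Linux x86-64, where write has syscall number 1. The raw_write function is just a name for this example, and the register conventions shown here only apply to that platform; normally you’d go through libc’s write instead:

use std::arch::asm;

// A sketch of a raw `write` system call on Linux x86-64, with no libc involved.
// The kernel expects the syscall number in rax (1 means write) and the arguments
// in rdi, rsi, and rdx. The syscall instruction clobbers rcx and r11.
fn raw_write(fd: usize, buf: &[u8]) -> isize {
    let ret: isize;
    unsafe {
        asm!(
            "syscall",
            inout("rax") 1isize => ret,  // syscall number in, return value out
            in("rdi") fd,                // file descriptor (1 is stdout)
            in("rsi") buf.as_ptr(),      // pointer to the bytes we want written
            in("rdx") buf.len(),         // number of bytes to write
            out("rcx") _,                // clobbered by the syscall instruction
            out("r11") _,                // clobbered by the syscall instruction
            options(nostack)
        );
    }
    ret
}

fn main() {
    raw_write(1, b"Hello from a raw syscall!\n");
}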

Windows, on the other hand, uses its own API, often referred to as WinAPI, and it can operate radically differently from how the UNIX-based systems operate.

Most often, though, there is a way to achieve the same things. In terms of functionality, you might not notice a big difference, but as we’ll see later, especially when we dig into how epoll, kqueue, and IOCP work, they can differ a lot in how that functionality is implemented.

However, a syscall is not the only way we interact with our operating system, as we’ll see in the following section.

The CPU and the operating system

Does the CPU cooperate with the operating system?

If you had asked me this question when I first thought I understood how programs work, I would most likely have answered no. We run programs on the CPU, and we can do whatever we want if we know how to do it. Now, I wouldn’t really have thought this through, and unless you learn how CPUs and operating systems work together, it’s not easy to know for sure.

What started to make me think I was very wrong was a segment of code that looked like what you’re about to see. If you think inline assembly in Rust looks foreign and confusing, don’t worry just yet. We’ll go through a proper introduction to inline assembly a little later in this book. I’ll make sure to go through each of the following lines until you get more comfortable with the syntax:

Repository reference: ch01/ac-assembly-dereference/src/main.rs
use std::arch::asm;

fn main() {
    let t = 100;
    let t_ptr: *const usize = &t;
    let x = dereference(t_ptr);
    println!("{}", x);
}

fn dereference(ptr: *const usize) -> usize {
    let mut res: usize;
    unsafe {
        asm!("mov {0}, [{1}]", out(reg) res, in(reg) ptr)
    };
    res
}

What you’ve just looked at is a dereference function where the actual dereference is written in inline assembly.

The mov {0}, [{1}] line needs some explanation. {0} and {1} are templates that tell the compiler that we’re referring to the registers that out(reg) and in(reg) represent. The number is just an index, so if we had more inputs or outputs they would be numbered {2}, {3}, and so on. Since we only specify reg and not a specific register, we let the compiler choose what registers it wants to use.

The mov instruction tells the CPU to take the first 8 bytes (if we’re on a 64-bit machine) at the memory location that {1} points to and place them in the register represented by {0}. The [] brackets instruct the CPU to treat the data in that register as a memory address; instead of simply copying the memory address itself to {0}, it fetches what’s at that memory location and moves it over.
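
For comparison, and to drive home that there is nothing magical about the assembly version, the same operation can be written as a plain raw-pointer dereference. The function name below is just for illustration; the compiler will typically lower it to the very same kind of mov:

// The same dereference without inline assembly: read the usize that ptr points to.
fn dereference_plain(ptr: *const usize) -> usize {
    unsafe { *ptr }
}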

Anyway, we’re just writing instructions to the CPU here. No standard library, no syscall; just raw instructions. There is no way the OS is involved in that dereference function, right?

If you run this program, you get what you’d expect:
100

Now, if you keep the dereference function but replace the main function with one that creates a pointer to the address 99999999999999, which we know is invalid, the code looks like this:
fn main() {
    let t_ptr = 99999999999999 as *const usize;
    let x = dereference(t_ptr);
    println!("{}", x);
}

Now, if we run that, we get the following results.

This is the result on Linux:
Segmentation fault (core dumped)

This is the result on Windows:
error: process didn’t exit successfully: `target\debug\ac-assembly-dereference.exe` (exit code: 0xc0000005, STATUS_ACCESS_VIOLATION)

We get a segmentation fault. Not surprising, really, but as you also might notice, the error we get is different on different platforms. Surely, the OS is involved somehow. Let’s take a look at what’s really happening here.


The role of the operating system

The operating system (OS) stands at the center of everything we do as programmers (well, unless you’re writing an operating system or working in the embedded realm), so there is no way for us to discuss programming fundamentals without talking about operating systems in a bit of detail.

Concurrency from the operating system’s perspective

This ties into what I talked about earlier when I said that concurrency needs to be talked about within a reference frame, and I explained that the OS might stop and start your process at any time.

What we call synchronous code is, in most cases, code that appears synchronous to us as programmers. Neither the OS nor the CPU lives in a fully synchronous world.

Operating systems use preemptive multitasking, and as long as the operating system you’re running preemptively schedules processes, you have no guarantee that your code runs instruction by instruction without interruption.

The operating system will make sure that all important processes get some time from the CPU to make progress.

Note

This is not as simple when we’re talking about modern machines with 4, 6, 8, or 12 physical cores, since you might actually execute code on one of the cores uninterrupted if the system is under very little load. The important part here is that you can’t know for sure, and there is no guarantee that your code will be left to run uninterrupted.

Teaming up with the operating system

When you make a web request, you’re not asking the CPU or the network card to do something for you – you’re asking the operating system to talk to the network card for you.
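
If you want to see this for yourself, a plain blocking request using Rust’s standard library is enough. Every step below is really a request to the operating system, which on Linux shows up as system calls such as socket, connect, write, and read (running the program under a tool such as strace will list them). The address is just an example:

use std::io::{Read, Write};
use std::net::TcpStream;

fn main() -> std::io::Result<()> {
    // Each of these "simple" calls asks the OS to drive the network card for us.
    let mut stream = TcpStream::connect("example.com:80")?;
    stream.write_all(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    println!("Received {} bytes", response.len());
    Ok(())
}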

There is no way for you as a programmer to make your system optimally efficient without playing to the strengths of the operating system. You basically don’t have access to the hardware directly. You must remember that the operating system is an abstraction over the hardware.

However, this also means that to understand everything from the ground up, you’ll need to know how your operating system handles these tasks. To be able to work with the operating system, you’ll need to know how you can communicate with it, and that’s exactly what we’re going to go through next.

Choosing the right reference frame

When you write code that is perfectly synchronous from your perspective, stop for a second and consider how that looks from the operating system’s perspective.

The operating system might not run your code from start to end at all. It might stop and resume your process many times. The CPU might get interrupted and handle some inputs while you think it’s only focused on your task.

So, synchronous execution is only an illusion. But from the perspective of you as a programmer, it’s not, and that is the important takeaway:

When we talk about concurrency without providing any other context, we are using you as a programmer and your code (your process) as the reference frame. If you start pondering concurrency without keeping this in the back of your head, it will get confusing very fast.

The reason I’m spending so much time on this is that once you realize the importance of having the same definitions and the same reference frame, you’ll start to see that some of the things you hear and learn that might seem contradictory really are not. You’ll just have to consider the reference frame first.

Asynchronous versus concurrent

So, you might wonder why we’re spending all this time talking about multitasking, concurrency, and parallelism, when the book is about asynchronous programming.

The main reason for this is that all these concepts are closely related to each other, and can even have the same (or overlapping) meanings, depending on the context they’re used in.

In an effort to make the definitions as distinct as possible, we’ll define these terms more narrowly than you’d normally see. However, just be aware that we can’t please everyone, and we do this for the sake of making the subject easier to understand. On the other hand, if you fancy heated internet debates, this is a good place to start. Just claim that someone else’s definition of concurrent is 100% wrong or that yours is 100% correct, and off you go.

For the sake of this book, we’ll stick to this definition: asynchronous programming is the way a programming language or library abstracts over concurrent operations, and how we as users of a language or library use that abstraction to execute tasks concurrently.

The operating system already has an existing abstraction that covers this, called threads. Using OS threads to handle asynchrony is often referred to as multithreaded programming. To avoid confusion, we’ll not refer to using OS threads directly as asynchronous programming, even though it solves the same problem.
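
As a tiny sketch of that distinction, this is roughly what handling several tasks with OS threads looks like. The concurrency here is provided by the operating system’s scheduler, not by a language-level abstraction:

use std::thread;

fn main() {
    // Spawn a handful of OS threads; the OS decides when and where each one runs.
    let handles: Vec<_> = (0..4)
        .map(|i| thread::spawn(move || println!("task {i} is running on its own OS thread")))
        .collect();

    // Wait for all of them to finish before exiting.
    for handle in handles {
        handle.join().unwrap();
    }
}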

Given that asynchronous programming is now scoped to be about abstractions over concurrent or parallel operations in a language or library, it’s also easier to understand that it’s just as relevant on embedded systems without an operating system as it is for programs that target a complex system with an advanced operating system. The definition itself does not imply any specific implementation even though we’ll look at a few popular ones throughout this book.

If this still sounds complicated, I understand. Just sitting and reflecting on concurrency is difficult, but if we try to keep these thoughts in the back of our heads when we work with async code I promise it will get less and less confusing.

Concurrency and its relation to I/O

As you might understand from what I’ve written so far, writing async code mostly makes sense when you need to be smart to make optimal use of your resources.

Now, if you write a program that is working hard to solve a problem, concurrency often won’t help. This is where parallelism comes into play, since it gives you a way to throw more resources at the problem if you can split it into parts that you can work on in parallel.

Consider the following two different use cases for concurrency:

  • When performing I/O and you need to wait for some external event to occur
  • When you need to divide your attention and prevent one task from waiting too long

The first is the classic I/O example: you have to wait for a network call, a database query, or something else to happen before you can progress a task. However, you have many tasks to do, so instead of waiting, you continue to work elsewhere and either check in regularly to see whether the task is ready to progress or make sure you are notified when it is.

The second is a typical example of what you face when you have a UI. Let’s pretend you only have one core. How do you prevent the whole UI from becoming unresponsive while performing other CPU-intensive tasks?

Well, you can stop whatever task you’re doing every 16 ms, run the update UI task, and then resume whatever you were doing afterward. This way, you will have to stop/resume your task 60 times a second, but you will also have a fully responsive UI that has a roughly 60 Hz refresh rate.
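
A sketch of that idea in code could look something like the following. The heavy_work_step and update_ui functions are made up for this example; the point is simply that the long-running task is chopped into small steps so that we can hand control to the UI roughly every 16 ms:

use std::time::{Duration, Instant};

const FRAME_BUDGET: Duration = Duration::from_millis(16);

// Hypothetical: do one small slice of the CPU-intensive work.
// Returns false when the whole job is done.
fn heavy_work_step(progress: &mut u64) -> bool {
    *progress += 1;
    *progress < 50_000_000
}

// Hypothetical: redraw the user interface.
fn update_ui(progress: u64) {
    println!("progress: {progress}");
}

fn main() {
    let mut progress = 0;
    let mut last_frame = Instant::now();

    while heavy_work_step(&mut progress) {
        // Every ~16 ms, pause the heavy task and let the UI update,
        // which keeps the interface responsive at roughly 60 Hz.
        if last_frame.elapsed() >= FRAME_BUDGET {
            update_ui(progress);
            last_frame = Instant::now();
        }
    }
}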

What about threads provided by the operating system?

We’ll cover threads a bit more when we talk about strategies for handling I/O later in this book, but I’ll mention them here as well. One challenge with using OS threads as a model for concurrency is that they appear to be mapped to cores. That’s not necessarily a correct mental model to use, even though most operating systems will try to map one thread to one core up to the point where the number of threads equals the number of cores.

Once we create more threads than there are cores, the OS will switch between our threads and progress each of them concurrently using its scheduler to give each thread some time to run. You also must consider the fact that your program is not the only one running on the system. Other programs might spawn several threads as well, which means there will be many more threads than there are cores on the CPU.
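
To make that concrete, the following sketch spawns far more threads than there are cores. The OS scheduler still runs all of them to completion by switching between them:

use std::thread;

fn main() {
    // How many logical cores the OS reports (fall back to 1 if it can't tell us).
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    println!("Logical cores: {cores}");

    // Spawn four times as many threads as we have cores.
    let handles: Vec<_> = (0..cores * 4)
        .map(|i| {
            thread::spawn(move || {
                // A bit of busy work so the threads actually overlap in time.
                (0..5_000_000u64).fold(0u64, |acc, x| acc.wrapping_add(x ^ i as u64))
            })
        })
        .collect();

    // The scheduler interleaves all of them; we just wait for the results.
    let total: u64 = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold(0, u64::wrapping_add);
    println!("Done: {total}");
}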

Therefore, threads can be a means to perform tasks in parallel, but they can also be a means to achieve concurrency.

This brings me to the last part about concurrency. It needs to be defined in some sort of reference frame.

Let’s draw some parallels to process economics

Alternative 2 – Parallel and synchronous task execution

So, you hire 12 bartenders, and you calculate that you can serve about 360 customers an hour. The line is barely going out the door now, and revenue is looking great.

One month goes by and again, you’re almost out of business. How can that be?

It turns out that having 12 bartenders is pretty expensive. Even though revenue is high, the costs are even higher. Throwing more resources at the problem doesn’t really make the bar more efficient.

Alternative 3 – Asynchronous task execution with one bartender

So, we’re back to square one. Let’s think this through and find a smarter way of working instead of throwing more resources at the problem.

You ask your bartender whether they can start taking new orders while the beer settles so that they’re never just standing and waiting while there are customers to serve. The opening night comes and…

Wow! On a busy night where the bartender works non-stop for a few hours, you calculate that they now only use just over 20 seconds on an order. You’ve basically eliminated all the waiting. Your theoretical throughput is now 240 beers per hour. If you add one more bartender, you’ll have higher throughput than you did while having 12 bartenders.

However, you realize that you didn’t actually accomplish 240 beers an hour, since orders come somewhat erratically and not evenly spaced over time. Sometimes, the bartender is busy with a new order, preventing them from topping up and serving beers that are finished almost immediately. In real life, the throughput is only 180 beers an hour.

Still, two bartenders could serve 360 beers an hour this way, the same amount that you served while employing 12 bartenders.

This is good, but you ask yourself whether you can do even better.

Alternative 4 – Parallel and asynchronous task execution with two bartenders

What if you hire two bartenders, and ask them to do just what we described in Alternative 3, but with one change: you allow them to steal each other’s tasks, so bartender 1 can start pouring and set the beer down to settle, and bartender 2 can top it up and serve it if bartender 1 is busy pouring a new order at that time? This way, it’s rare for both bartenders to be busy at the exact moment one of the beers-in-progress becomes ready to be topped up and served. Almost all orders are finished and served in the shortest amount of time possible, letting customers leave the bar with their beer faster and making space for customers who want to place a new order.

Now, this way, you can increase throughput even further. You still won’t reach the theoretical maximum, but you’ll get very close. On the opening night, you realize that the bartenders now process 230 orders an hour each, giving a total throughput of 460 beers an hour.

Revenue looks good, customers are happy, costs are kept at a minimum, and you’re one happy manager of the weirdest bar on earth (an extremely efficient bar, though).

The key takeaway

Concurrency is about working smarter. Parallelism is a way of throwing more resources at the problem.


I firmly believe the main reason we find parallel and concurrent programming hard to differentiate stems from how we model events in our everyday life. We tend to define these terms loosely, so our intuition is often wrong.
Note

It doesn’t help that concurrent is defined in the dictionary as operating or occurring at the same time, which doesn’t get us very far when we try to describe how it differs from parallel.

For me, this first clicked when I started to understand why we want to make a distinction between parallel and concurrent in the first place!

The why has everything to do with resource utilization and efficiency.

Efficiency is the (often measurable) ability to avoid wasting materials, energy, effort, money, and time in doing something or in producing a desired result.

Parallelism is increasing the resources we use to solve a task. It has nothing to do with efficiency.

Concurrency has everything to do with efficiency and resource utilization. Concurrency can never make one single task go faster. It can only help us utilize our resources better and thereby finish a set of tasks faster.

Let’s draw some parallels to process economics

In businesses that manufacture goods, we often talk about LEAN processes. This is pretty easy to compare with why programmers care so much about what we can achieve if we handle tasks concurrently.

Let’s pretend we’re running a bar. We only serve Guinness beer and nothing else, but we serve our Guinness to perfection. Yes, I know, it’s a little niche, but bear with me.

You are the manager of this bar, and your goal is to run it as efficiently as possible. Now, you can think of each bartender as a CPU core, and each order as a task. To manage this bar, you need to know the steps to serve a perfect Guinness:

  • Pour the Guinness draught into a glass tilted at 45 degrees until it’s three-quarters full (15 seconds).
  • Allow the surge to settle for 100 seconds.
  • Fill the glass completely to the top (5 seconds).
  • Serve.

Since there is only one thing to order in the bar, customers only need to signal with their fingers how many they want, so we assume taking new orders is instantaneous. To keep things simple, the same goes for payment. In choosing how to run this bar, you have a few alternatives.

Alternative 1 – Fully synchronous task execution with one bartender

You start out with only one bartender (CPU). The bartender takes one order, finishes it, and progresses to the next. The line is out the door and going two blocks down the street – great! One month later, you’re almost out of business and you wonder why.

Well, even though your bartender is very fast at taking new orders, they can only serve 30 customers an hour. Remember, each order takes 15 + 100 + 5 = 120 seconds from start to finish, and 3,600 seconds in an hour divided by 120 gives 30 orders. The bartender waits for 100 seconds while the beer settles, practically just standing there, and only uses 20 seconds to actually pour and top up the glass. Only after one order is completely finished can they progress to the next customer and take their order.

The result is bad revenue, angry customers, and high costs. That’s not going to work.