Historically, operating systems supported multiprocessing with a single processor:
This provides the illusion that programs run in parallel. Programs may not directly interact, but need to sychronize access to common resources like a file storage, printer, or network. Thus concurrency and synchronization became part of books on operating systems and are still!
From Android Guides:
The main thread ... is in charge of dispatching events and rendering the user interface and is usually called the UI thread. All components (activities, services, etc) ... run in the same process and are instantiated by default in the UI thread.
... performing long operations such as network access or database queries in the UI thread will block the entire app UI from responding. When the UI thread is blocked, no events can be dispatched, including drawing events. From the user's perspective, the application will appear to freeze.
Additionally, ... the Android UI toolkit is not thread-safe and as such you must not manipulate your UI from a background thread ... two rules:
- Do not run long tasks on the main thread (to avoid blocking the UI)
- Do not change the UI at all from a background thread (only the main thread)
Programs under Windows have a similar structure.
The Business Process Modelling Notation (BPMN) serves for describing the interactions between agents in business processss. BPMN diagrams are similar to flowcharts, but they are not meant to be executed, they only serve for documenting the setting for which software is to be developed. (BPMN elements are also supported in draw.io and Vizio.)
Other requirements analysis techniques are textual. Being able to express concurrency is essential as the surrounding world is concurrent.
Ray tracing is a technique to render raster images of a three-dimensional scene, e.g. in animations. For all pixels of the image, rays are traced "backwards" from the camera (eye) until they hit an object of the scene. From there, the reflecting and refracting rays are traced until they hit a light source. This process is recursively repeated as objects may be in the shadow of other objects. The color and intensity of a pixel is then calculated based on the reflecting and refracting properties of the traced objects and the color of the light source. The run-time for each pixel is proportional to the number of light sources, proportional to the number of rays spawned by every object hit, and exponential in the depth of the recursion.
As light rays do not influence each other, all pixels of a frame and all frames of an animation can be computed in parallel. With 4K resolution (3840 × 2160 pixels) and 24 frames per second, a 110 min animation needs 1.3 × 10¹² pixels computed. For all rays needed for one pixel, the intersection of the ray with any object needs to be determined.
How Pixar brings its animated movies to life, Fortune, Sept 2015:
It took around 3,000 processors to render the movies The Incredibles and Cars, two films from the mid 2000s. For more recent films like Monsters University and Inside Out, that number has soared to around 20,000 processors.
All that extra processing power is noticeable when you study the hairs of the animated creature, Sulley, ... . When the original Monsters movie first appeared in 2001, Sulley had 1.1 million hairs covering his body. By Monsters University, released in 2013, Sulley had 5.5 million individual hairs.
Object-oriented design allows that the structure of program, the class hierarchy, reflects the structure of the problem domain: a description of the problem becomes part of the solution.
Simula-67, the first object-oriented languages, supported coroutines that allowed objects to be concurrent, particularly for simulations. Coroutines are scheduled cooperatively, meaning that transfer for control is explicit, unlike preemptively scheduled threads. (Coroutines are called fibers in Windows and are related to goroutines in Go; subsequent object-oriented languages, notably Smalltalk-80 and later C++, did not follow Simula in that respect.)
The overview to the right is for a simulation program for a post office: in principle, customers and clerks are all concurrent.
Embedded systems have to react to environmental events. As these can be independent, a natural structure is by having concurrent processes reacting to different kinds of events. To the left is a statechart for an alarm clock (statecharts are related to UML state machines). For example, light and alarm status are independent and expressed as concurrent states, visually seperated by dashed lines.
An early server architecture is the three-tier architecture: presentation, application, and database servers are all concurrent. They may run on the same computer or on different computers.
The architecture does not scale well beyond a certain size; why? Data centers for cloud computing connect servers differently, with a "fat tree".
Bluetooth is a protocol for personal area networks (PAN) with a mesh topology: devices form piconets; devices have different roles, which changes as devices join and leave the network. All devices are concurrent and need to manage communication, including forwarding of messages to the right recipient.
Processes execute in a sequence of steps. Concurrent execution leads to interleaving of steps. For example, the parallel (concurrent) composition A ‖ B of processes A and B may result in:
A | ➀ ➁ ➂ ➃ ➄ ➅ |
B | ➀ ➁ ➂ ➃ ➄ ➅ |
A ‖ B | ➀ ➀ ➁ ➂ ➁ ➃ ➂ ➃ ➄ ➄ ➅ ➅ |
Interleaving implies nondeterminism: different executions may lead to different interleavings. Interleavings may cause data races.
Suppose processes inc1
, inc2
intend to increment t
, a variable stored in memory, by 1
and 2
. For this, processors have to load a variable into a register. With ri
being a register, the steps become:
inc1: inc2:
r1 := t r2 := t
r1 := r1 + 1 r2 := r2 + 2
t := r1 t := r2
For example, t
could represent the number of sold tickets.
Question. Which amount does inc1 ‖ inc2
add to t
when run concurrently?
Answer. Either 1
, 2
, or 3
. A possible interleaving for t
to be set to 1
is:
r1 := t ; r2 := t ; r2 := r2 + 2 ; t := r2 ; r1 := r1 + 1 ; t := r1
Locking a variable (or any resource) gives exclusive access to that variable:
P: Q:
lock x and y lock x and y
x := x + 1 x := x + 2
y := y – 1 y := y – 2
unlock x and y unlock x and y
Variables x
and y
could be two bank accounts or the number of sold and available concert tickets.
Question. What could happen in P ‖ Q
if P
locks x
, y
and Q
locks y
, x
in that order?
Answer. If P
locks x
and then Q
locks y
, a deadlock occurs: neither can proceed.
In July 1997, Pathfinder landed on Mars.
After a while Pathfinder stopped sending data and reset itself continuously.
After 18 hours the failure was reproduced in a lab replica: priority inversion, a form of starvation.
The system had a "watch dog" that discovered the situation and did a reset, and a reset, and a reset, …
The engineers managed to transmit code to Mars and execute it, to update the software. Testing during development did not reveal the error.
Device drivers typically run in the operating system kernel and interface to mice, keyboards, drives, and other devices. As they run in the kernel, a faulty device driver can cause the whole operating system to crash.
Wikipedia: World's second most widespread blackout in history:
10 million people in Ontario and 45 million people in eight U.S. states without power
Task Force Report:
… a software bug in General Electric Energy's Unix-based XA/21 energy management system that prevented alarms from showing on their control system. This alarm system stalled because of a race condition. After the alarm system failed silently without being noticed by the operators, unprocessed events (that had to be checked for an alarm) started to queue up and the primary server failed within 30 minutes.
J.R. Minkel, The 2003 Northeast Blackout--Five Years Later, Scientific American, August 2008:
The event contributed to at least 11 deaths and cost an estimated $6 billion.
Nancy Leveson and Clark Turner, An Investigation of the Therac-25 Accidents, IEEE Computer, July 1993:
Some of the most widely cited software-related accidents in safety-critical systems involved a computerized radiation therapy machine called the Therac-25. Between June 1985 and January 1987, six known accidents involved massive overdoses by the Therac-25 with resultant deaths and serious injuries. They have been described as the worst series of radiation accidents in the 35-year history of medical accelerators.
...
It is clear from the AECL [Atomic Energy of Canada Limited] documentation on the modifications that the software allows concurrent access to shared memory, that there is no real synchronization aside from data stored in shared variables, and that the "test" and "set" for such variables are not indivisible operations. Race conditions resulting from this implementation of multitasking played an important part in the accidents.
Cisco Global Mobile Data Traffic Forecast, March 2020:
The compound annual growth rate (CAGR) of device and connections is predicted to be 10% annually, with machine-to-machine connection (IoT devices) growing most.
Processor frequency is no longer increasing due to the power wall. For CMOS circuits:
Power = Capacitive load × Voltage² × Frequency
Single-thread performance is no longer increasing: the benefits of caching, pipelining, etc. are maxed out.
Number of transitors per processor is still increasing–linearly on a logarithmic scale, i.e. exponentially, doubling every 18 months as predicted by Gordon Moore.
This allow for more cores.
Further reading: Peter Denning and Ted Lewis, Exponential Laws of Computing Growth, Communications of the ACM, Jan 2017
Libraries hide some of the complexity for specific applications, e.g.
These are useful in practice, but limited to the intended applications.
Microsoft's VCC is one such tool:
VCC is a mechanical verifier for concurrent C programs. VCC takes a C program, annotated with function specifications, data invariants, loop invariants, and ghost code, and tries to prove these annotations correct. If it succeeds, VCC promises that your program actually meets its specifications.
It's main use is for Microsoft's Hyper-V hypervisor. Try out VCC online at: http://rise4fun.com/vcc
Similar tools for Java and other languages exist. They are mainly used for highly-critical software, despite the fact that these tools dramatically reduce the time for testing and can even reduce development and maintenance effort.
Unlike verification tools, static analysis tools find errors with no or little annotation. However, they can't find all errors (incomplete) and may produce false warnings (unsound).
For detecting concurrency errors in Windows:
For detecting concurrency errors in Java:
Despite their drawbacks, static analysis tools have become popular in practice, e.g. https://scan.coverity.com/ as they still reduce testing time significantly, don't require training, and fit in existing development processes.
ThreadSanitizer is one such tool, developed by Google, and included in clang:
ThreadSanitizer is a tool that detects data races. It consists of a compiler instrumentation module and a run-time library.
From a Google report "How Developers Use Data Race Detection Tools", 2014:
[ThreadSanitizer] regularly finds critical bugs, and is in wide use across Google ... . One interesting incident occurred in the open source Chrome browser. Up to 15% of known crashes were attributed to just one bug ..., which proved difficult to understand–the Chrome engineers spent over 6 months tracking this bug without success. On the other hand, the [ThreadSanitizer] team found the reason for this bug in a 30 minute run, without even knowing about these crashes. The crashes were caused by data races on a couple of reference counters.
ThreadSanitizer is included in Xcode. From Apple's developer documentation:
Running your code with Thread Sanitizer checks enabled can result in CPU slowdown of 2⨉ to 20⨉, and an increase in memory usage by 5⨉ to 10⨉.
Further reading: Edward Lee, The Problem with Threads, 2006.
Granularity of atomic operations can reach from nanoseconds (for arithmetic operations) to days. For very fine-grained concurrency the overhead of starting processes outweighs the benefit, for example evaluating parameters of the calls p(x + y, x - y)
in parallel.
Coupling can be loose or tight. For very tightly coupled programs the overhead of communication and synchronization outweighs the benefit: for example, sorting an array with one process for each array element.
Coupling can be between processes can be independent, regular, or general.
a
, b
, c
, the vector addition
algorithm
c := a + b
a
repacing each element with the average of its neighbours
algorithm
a[i] := (a[i - 1] + a[i + 1]) / 2
Some compilers, notably Fortran compilers can recognize independent or regular concurrency and automatically generate code for vector processors that perform the same operation on a number of elements, leading to data parallelism.
The main concern for independent and regular concurrency is performance, for general concurrency it is correctness.
The support for concurrency in programming languages is evolving. The emphasis is on
This notes use algorithmic notation for brevity and clarity in addition to examples in Python, Java, and Go. For example, setting x
to 1
and in parallel y
to 2
is expressed as:
algorithm
x := 1 ‖ y := 2
The assignment x := e
, also written as x ← e
, is read x
becomes e
or x
gets e
.
To illustrate the verbosity of programming languages, here is the same in Python, with a print statement added. For both assignments, classes need to be declared, objects created, threads started, and awaited for termination. (Select the cell and run the program with control-return).
from threading import Thread
class SetX(Thread):
def run(self):
global x; x = 1
class SetY(Thread):
def run(self):
global y; y = 2
setX = SetX(); setY = SetY() # create new threads
setX.start(); setY.start() # run threads
setX.join(); setY.join() # wait for threads to finish
print(x, y)
In Java, also classes need to be declared; additionally, exceptions need to be caught (select the cell and save the file with control-return; select the next cell and run the shell commands with control-return).
%%writefile SetXY.java
public class SetXY {
static int x, y;
public static void main(String args[]) {
class SetX extends Thread {
public void run() {
x = 1;
}
}
class SetY extends Thread {
public void run() {
y = 2;
}
}
Thread setX = new SetX(), setY = new SetY();
setX.start(); setY.start();
try {setX.join(); setY.join();}
catch (Exception e) {};
System.out.println(x + ", " + y);
}
}
!javac SetXY.java
!java SetXY
In Go, for each assignment a function needs to be declared, which below is anonymous. For awaiting the termination of both functions, a channel is introduced to which both functions send a dummy value; the main program waits for these values:
%%writefile SetXY.go
package main
import "fmt"
func main() {
var x, y int
done := make(chan bool)
go func() {x = 1; done <- true} ()
go func() {y = 2; done <- true} ()
<- done; <- done
fmt.Println(x, y)
}
!go run SetXY.go
We cover the main concurrency concepts:
The course also includes a review of the correctness of sequential and object-oriented programs, as the correctness of concurrent programs emerges as an extension.
Python and Java are used for semaphores and monitors, Go is used for message passing.