The assignment statement, sequential composition, conditional statement, and repetitive statement are universal in the sense that any sequential program can be expressed with them. For concurrent programs, we need to add means to express:

- parallel composition,
- mutual exclusion,
- condition synchronization.
When reasoning about concurrency, it is natural to think in terms of these notions. Programming languages commonly do not support parallel composition, mutual exclusion, and condition synchronization in full generality, but their constructs can be understood through these notions.
Statements execute in a sequence of atomic steps; such sequences are called traces. Traces can be either finite or infinite. The behaviour of a statement is given by the set of all its traces.
The parallel composition S₁ ‖ S₂ executes S₁ and S₂ "in parallel" by interleaving (merging) their traces; formally, σ is a trace of S₁ ‖ S₂ if σ projected onto the steps of S₁ is a trace of S₁ and σ projected onto the steps of S₂ is a trace of S₂.
In state diagrams, parallel composition is visualized by a dashed line; transitions are atomic steps:
The composition S₁ ‖ S₂ terminates when both S₁ and S₂ terminate. For example:
| Statement | Behaviour |
|---|---|
| A | { ➀ ➁ } |
| B | { ❶ ❷ } |
| A ‖ B | { ➀ ➁ ❶ ❷ , ➀ ❶ ➁ ❷ , ➀ ❶ ❷ ➁ , ❶ ➀ ➁ ❷ , ❶ ➀ ❷ ➁ , ❶ ❷ ➀ ➁ } |
Question: If ➀ = x := 1, ➁ = x := 2, ❶ = y := 1, ❷ = y := 2, what is the final state?

Answer: x = 2 ∧ y = 2, since in every interleaving ➁ follows ➀ and ❷ follows ❶.
The statement fork P, where P is a procedure name, creates a new process and executes it concurrently with the one creating it. The statement join P waits for a process to terminate. For example:
Any ‖ statement can be equivalently expressed by fork and join: the parallel composition

algorithm
S₁ ‖ … ‖ Sₙ

is equivalent to forking each Sᵢ as a separate process and then joining all of them. Unlike the parallel composition, fork statements allow an arbitrary number of processes to be started, but introduce a parent-child relation among the processes.
The process declaration executes the declared process concurrently with any other declared processes:

algorithm
process p
    S

Process declarations can be equivalently expressed with ‖ or with fork and join. For example, declaring processes p and q with bodies S and T is equivalent to S ‖ T.
The par statement is a generalization of ‖ in the same way as for is a generalization of ;:

algorithm
par x = E to F do S
Here, x is a constant local to the par statement. For example, vectors a, b : 1 .. N → integer can be added in parallel, with the same result as adding them with a sequential for loop:
algorithm
par i = 1 to N do
b(i) := b(i) + a(i)
This can be equivalently expressed with fork and join:
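The fork-and-join formulation can be sketched in Go (slices standing in for the vectors; all names are this sketch's choices): each iteration of the par loop is forked as a goroutine, and the join is a WaitGroup wait.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	a := []int{1, 2, 3, 4}
	b := []int{10, 20, 30, 40}

	var wg sync.WaitGroup
	for i := range b { // par i = 1 to N
		wg.Add(1)
		go func(i int) { // fork one process per element
			defer wg.Done()
			b[i] = b[i] + a[i] // processes write disjoint elements
		}(i)
	}
	wg.Wait() // join all processes
	fmt.Println(b) // [11 22 33 44]
}
```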
For a, b, c : 0 .. N − 1 × 0 .. N − 1 → float, sequential matrix multiplication is expressed by:
algorithm
for i = 0 to N − 1 do
    for j = 0 to N − 1 do
        c(i, j) := 0.0
        for k = 0 to N − 1 do
            c(i, j) := c(i, j) + a(i, k) × b(k, j)
Question: Which of the three for loops can be turned into a par?

Answer: Only the outer two can, as every iteration of the innermost loop updates the same element c(i, j):
algorithm
par i = 0 to N − 1 do
    par j = 0 to N − 1 do
        c(i, j) := 0.0
        for k = 0 to N − 1 do
            c(i, j) := c(i, j) + a(i, k) × b(k, j)
Aside: It would be more systematic to write seq instead of for for generalized sequential composition, as in occam, a predecessor of Go.
The previous program creates N² processes, which may be far more than there are processors; in some languages, the overhead of process creation may outweigh the benefit of parallelism. A solution is to use one worker process per strip of rows. Let P be the number of worker processes:
algorithm
procedure worker(w: 1 .. P)
    var first = (w − 1) × N div P
    var last = w × N div P − 1
    for i = first to last do
        for j = 0 to N − 1 do
            c(i, j) := 0.0
            for k = 0 to N − 1 do
                c(i, j) := c(i, j) + a(i, k) × b(k, j)
Matrix multiplication is an example of "embarrassing parallelism", as no synchronization is needed: all processes assign to different array elements.
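A Go sketch of the strip-wise multiplication, under the assumption that the matrices are stored row-major in flat slices (this representation and all names are choices of this sketch):

```go
package main

import (
	"fmt"
	"sync"
)

const (
	N = 4 // matrix dimension
	P = 2 // number of worker processes
)

// worker w computes rows first..last of c = a × b;
// the N×N matrices are stored row-major in flat slices.
func worker(w int, a, b, c []float64, wg *sync.WaitGroup) {
	defer wg.Done()
	first := (w - 1) * N / P
	last := w*N/P - 1
	for i := first; i <= last; i++ {
		for j := 0; j < N; j++ {
			c[i*N+j] = 0.0
			for k := 0; k < N; k++ {
				c[i*N+j] += a[i*N+k] * b[k*N+j]
			}
		}
	}
}

func main() {
	a := make([]float64, N*N)
	b := make([]float64, N*N)
	c := make([]float64, N*N)
	for i := range a {
		a[i] = float64(i)
		if i%(N+1) == 0 {
			b[i] = 1.0 // b is the identity matrix
		}
	}
	var wg sync.WaitGroup
	for w := 1; w <= P; w++ {
		wg.Add(1)
		go worker(w, a, b, c, &wg)
	}
	wg.Wait()
	fmt.Println(c) // equals a, since b is the identity
}
```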
In algorithms, atomic steps are indicated by atomicity brackets. These can be placed around expressions and statements, for example:

Atomicity brackets cannot be nested. Auxiliary variables ("registers") are needed to express atomicity in state diagrams. The above statements are represented by:
When atomicity brackets are left out, implicitly only access to individual basic variables and array elements of basic types is atomic:
algorithm
x := x + 1 = ⟨x⟩ := ⟨x⟩ + ⟨1⟩
If processes run by multiplexing on a single processor, atomicity can be implemented by disabling and enabling interrupts:
If multiple processors (cores) are available, ensuring atomicity by suspending processors would be too drastic. Instead, the hardware guarantees that certain operations are atomic.
Typically, access to int, boolean, float, and pointers, which are all word-sized, is atomic, but access to double and long is not. We continue to use atomicity brackets as a design notation and return later to techniques for implementing atomic regions.

Question: Which of these algorithms for computing the maximum of a: 1 .. N → integer is correct?
algorithm
1. m := a(1)
   for i = 2 to N do if a(i) > m then m := a(i)
2. m := a(1)
   par i = 2 to N do if ⟨a(i)⟩ > ⟨m⟩ then ⟨m⟩ := ⟨a(i)⟩
3. m := a(1)
   par i = 2 to N do ⟨if a(i) > m then m := a(i)⟩
4. m := a(1)
   par i = 2 to N do if ⟨a(i) > m⟩ then ⟨m := a(i)⟩
5. m := a(1)
   par i = 2 to N do
       if a(i) > m then ⟨if a(i) > m then m := a(i)⟩

Answer: 1. correct; 2. incorrect; 3. correct, but less efficient than the sequential version; 4. incorrect; 5. correct, and potentially faster than the sequential version.
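The last variant, testing again under mutual exclusion, can be sketched in Go, with a sync.Mutex standing in for the atomicity brackets. Note that under Go's memory model the unprotected read of m is formally a data race (the notes' model assumes word-sized reads are atomic), so the sketch is purely illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	a := []int{3, 1, 4, 1, 5, 9, 2, 6}
	m := a[0]
	var mu sync.Mutex // plays the role of the atomicity brackets
	var wg sync.WaitGroup
	for i := 1; i < len(a); i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if a[i] > m { // unprotected pre-check: a stale m is harmless here
				mu.Lock()
				if a[i] > m { // re-check under the lock
					m = a[i]
				}
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(m) // 9
}
```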
What can we say about the postcondition of a parallel composition? In the example below, the statements x := x + 1 and y := y + 2 are correct with respect to their pre- and postconditions:
Their parallel composition establishes the conjunction of their postconditions.
Compare this to the parallel composition of x := 1 and x := 2:
Their parallel composition establishes the disjunction of their postconditions.
Now consider the parallel composition of ⟨x := x + 1⟩ and ⟨x := x + 2⟩:
Question: What are appropriate predicates P₁, P₂, Q₁, Q₂?

Answer: Here is a naive attempt:
If both processes start with x = 0 and ⟨x := x + 1⟩ is executed first, the precondition of ⟨x := x + 2⟩ is no longer x = 0. Likewise, if ⟨x := x + 2⟩ is executed first and establishes x = 2, then ⟨x := x + 1⟩ would invalidate that postcondition. The reason is interference. Additionally, from x = 1 and x = 2 the postcondition x = 3 cannot be inferred.
Question: How can the annotation be modified such that the parallel execution will not invalidate the predicates?
Answer: The annotation needs to be weakened.
Now each process does not interfere with the states of the other process.
As another example, consider the following two procedures operating on global variables a, b, t:
Question: Do P and Q terminate and what is their outcome? What are the loop invariants?

Answer:
- P terminates with t ∧ a = 1; the invariant is (¬t ∧ a = 0) ∨ (t ∧ a = 1).
- Q terminates with ¬t ∧ b = 1; the invariant is (t ∧ b = 0) ∨ (¬t ∧ b = 1).
Question: Does P ‖ Q terminate and what is the outcome?

Answer: P ‖ Q may terminate with a ≥ 0 ∧ b ≥ 0 or may not terminate. The state diagram shows how the annotation of P that is correct for sequential execution is invalidated by the concurrent execution of Q:
Question: How can the annotation be modified such that the parallel execution will not invalidate the predicates?

Answer: Weaken the loop invariant in P to a ≥ 0 and in Q to b ≥ 0.
Consider the following parallel compositions:
The assignment x := y + z is executed as ⟨x⟩ := ⟨y⟩ + ⟨z⟩. On the left-hand side, the value taken for y could be that before or after y := 3, and the value of z could be that before or after z := 5. However, on the right-hand side, the effect of x := y + z is the same as ⟨x := y + z⟩, allowing a simpler state diagram:

The reason is that on the right-hand side, only one of the variables of y + z is changed in the parallel process.
In a parallel composition
algorithm
S₁ ‖ … ‖ Sₙ
an expression E in Sᵢ satisfies the at-most-once property if it refers at most once to a basic variable that may be changed in another process. An assignment x, y := E, F in Sᵢ satisfies the at-most-once property if it refers at most once to a basic variable (x, y, or in E, F) that may be changed in another process.
Expressions and assignment statements with the at-most-once property are evaluated or executed as if they were atomic. Thus in state diagrams they don't need to be broken up.
Question: Which of the following assignments satisfy the at-most-once property?

1. x := 3 ‖ x := x + 1
2. x := 3 ‖ y := 5 ‖ z := x + y
3. t, a := false, 0 in procedure P above, when considering P ‖ Q
4. t, a := true, a + 1 in procedure P above, when considering P ‖ Q
Answer:

1. Only x := 3, that is, ⟨x := 3⟩ ‖ x := x + 1.
2. Only x := 3 and y := 5, that is, ⟨x := 3⟩ ‖ ⟨y := 5⟩ ‖ z := x + y.
3. Yes, as only t is changed in the other process, that is, ⟨t, a := false, 0⟩.
4. Yes, as only t is changed in the other process, that is, ⟨t, a := true, a + 1⟩.
This retrospectively justifies the state diagram for P ‖ Q.
An atomic statement A that starts in state P does not interfere with state Q if A preserves Q, formally:

algorithm
{P ∧ Q} A {Q}
In state diagrams, non-interference is visualized by a green dotted line. If A is a guarded assignment statement if B then x, y := E, F, the condition for non-interference is:

P ∧ B ∧ Q ⇒ Q[x, y := E, F]
Question: Consider again the state diagram for ⟨x := x + 1⟩ ‖ ⟨x := x + 2⟩ with labelled states. What are the conditions for non-interference? Give the proofs!

Answer: The conditions for x := x + 1 not interfering with the states P₂, Q₂ are:

1. P₁ ∧ P₂ ⇒ P₂[x := x + 1]
2. P₁ ∧ Q₂ ⇒ Q₂[x := x + 1]

The conditions for x := x + 2 not interfering with the states P₁, Q₁ are:

3. P₂ ∧ P₁ ⇒ P₁[x := x + 2]
4. P₂ ∧ Q₁ ⇒ Q₁[x := x + 2]
The proof of (1.) is:
algorithm
(x = 0 ∨ x = 2) ∧ (x = 0 ∨ x = 1) ⇒ (x = 0 ∨ x = 1)[x := x + 1]
≡ {by substitution, factoring out x = 0, simplification}
x = 0 ⇒ (x + 1 = 0 ∨ x + 1 = 1)
≡ {arithmetic}
x = 0 ⇒ (x = −1 ∨ x = 0)
≡ {weakening / strengthening}
true
Rule for correctness of parallel composition: Assume that statements S₁, S₂ establish Q₁, Q₂ under P₁, P₂, respectively:
algorithm
{P₁} S₁ {Q₁}
{P₂} S₂ {Q₂}
Their parallel composition will establish the conjunction of the postconditions under the conjunction of the preconditions,
algorithm
{P₁ ∧ P₂} S₁ ‖ S₂ {Q₁ ∧ Q₂}
provided that all atomic statements (transitions) S₁,ᵢ in S₁ do not interfere with any state of S₂ and vice versa, visually:
The rule generalizes to more than two processes.
Condition synchronization delays one process until another establishes a certain condition. The fundamental mechanism is the await statement:
algorithm
⟨await B then S⟩
The await statement "waits" until Boolean expression B is true and then executes statement S atomically. Formally, this reduces the number of interleavings. The body S can also be empty:
algorithm
⟨await B⟩ = ⟨await B then skip⟩
For example:
Question: What is the difference between these two?

Answer: The first guarantees that s will not become negative; the second does not.
The await statement is at the very core of every synchronization mechanism, even though few programming languages support it in full generality. One way to implement await B is by spinning: the process keeps looping as long as B is false.
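A minimal sketch of spinning in Go, with the condition B represented by an atomic flag (the names are hypothetical):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

var done atomic.Bool // the condition B

// await spins until B is true.
func await() {
	for !done.Load() {
		runtime.Gosched() // yield the processor while spinning
	}
}

func main() {
	go func() { done.Store(true) }() // another process establishes B
	await()
	fmt.Println("condition established")
}
```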
Formally, the statement ⟨await B then S⟩ is just ⟨if B → S⟩, so the same correctness rule holds:

Rule for correctness of await:

algorithm
{P} ⟨await B then S⟩ {Q}

if P ⇒ ∆B and {P ∧ B} S {Q}.
If B is false, the interpretation in concurrent programs is waiting until it becomes true, rather than staying in the initial state forever.
The guard of an await statement can be associated with a condition that has to hold when the guard is true. Given a and b, consider computing s = a² + b². The variable d indicates whether process H is done with assigning b² to t:
algorithm
d := false
Here, d ⇒ t = b² is needed in SumSquare to ensure that finally s = a² + b².
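The complete program is not shown above, but from the transitions listed below, SumSquare presumably has the shape s := a × a ; ⟨await d⟩ ; s := s + t, and H the shape t := b × b ; d := true. A Go sketch, with d as an atomic flag so that setting d also publishes t:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

func main() {
	const a, b = 3, 4
	var s, t int
	var d atomic.Bool // invariant: d ⇒ t = b²

	go func() { // process H
		t = b * b
		d.Store(true) // atomic store: publishes t to readers of d
	}()

	// process SumSquare
	s = a * a
	for !d.Load() { // ⟨await d⟩ by spinning
	}
	s = s + t
	fmt.Println(s) // 25 = a² + b²
}
```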
All statements satisfy the at-most-once property; hence, in the state diagram, these become atomic transitions. The global invariant d ⇒ t = b² is written across processes and is a shorthand for conjoining it to all substates:

With I being d ⇒ t = b², this is equivalent to:
The correctness of the diagram follows from the correctness of the transitions and non-interference. We assume that all expressions are defined.

Correctness of initialization:
1. I[d := false]

Correctness of transitions in SumSquare:
2. I ⇒ (I ∧ s = a²)[s := a × a]
3. I ∧ s = a² ∧ d ⇒ I ∧ s = a² ∧ t = b²
4. I ∧ s = a² ∧ t = b² ⇒ (I ∧ s = a² + b²)[s := s + t]

Correctness of transitions in H:
5. I ⇒ (I ∧ t = b²)[t := b × b]
6. I ∧ t = b² ⇒ I[d := true]

Non-interference of s := a × a with the states of H:
- I ⇒ I[s := a × a]
- I ∧ t = b² ⇒ (I ∧ t = b²)[s := a × a]
- I ⇒ I[s := a × a]
...
Question: How many non-interference conditions are there?

Answer: Three transitions of SumSquare must not interfere with three states of H, giving nine conditions; two transitions of H must not interfere with four states of SumSquare, giving eight conditions; in total, there are 17 non-interference conditions.
Here is the proof of (3.):
algorithm
I ∧ s = a² ∧ d ⇒ I ∧ s = a² ∧ t = b²
≡ {definition of I, using P ⇒ P for any P}
(d ⇒ t = b²) ∧ s = a² ∧ d ⇒ t = b²
≡ {logic}
true
Here is the proof of (4.):
algorithm
I ∧ s = a² ∧ t = b² ⇒ (I ∧ s = a² + b²)[s := s + t]
≡ {by substitution, s does not occur in I}
I ∧ s = a² ∧ t = b² ⇒ (I ∧ s + t = a² + b²)
≡ {logic}
true
As a note, with I being d ⇒ t = b², the annotated state diagram is equivalent to:
algorithm
d := false
{I}
A producer places objects into a (one-place) buffer; a consumer removes objects from the buffer. The producer has to wait until the buffer is empty before placing an object, and the consumer has to wait until the buffer is not empty before removing an object; otherwise, producer and consumer proceed at their own pace. Assume buf: T is a global variable:
algorithm
p, c := 0, 0
Question: Informally, what is the relation between p and c, and what is it between a and b?
Answer:
- p and c are either equal, or p is one larger than c.
- The first c elements of a and b are the same.

The state diagram shows the required annotations that allow concluding that upon termination of Consumer, it will have a copy of array a of Producer in its own local array b. Formally, the producer invariant PI is conjoined to every state of Producer, the consumer invariant CI is conjoined to every state of Consumer, and the global invariant PC is conjoined to every state:
With PC, PI, CI as above and with some simplifications, this is textually equivalent to:
algorithm
p, c := 0, 0
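The process bodies are not shown above; judging from the transitions checked below, Producer presumably has the shape while p < N do (⟨await p = c⟩ ; buf := a(p) ; p := p + 1), and Consumer the shape while c < N do (⟨await p > c⟩ ; b(c) := buf ; c := c + 1). A Go sketch with the awaits implemented by spinning on atomic counters (all names are this sketch's choices):

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

const N = 8

func main() {
	var a, b [N]int64
	for i := range a {
		a[i] = int64(i * i)
	}
	var buf, p, c atomic.Int64 // one-place buffer; elements produced/consumed

	go func() { // Producer
		for i := 0; i < N; i++ {
			for p.Load() > c.Load() { // await p = c (buffer empty)
				runtime.Gosched()
			}
			buf.Store(a[i])
			p.Add(1)
		}
	}()

	// Consumer, run in the main process
	for i := 0; i < N; i++ {
		for p.Load() == c.Load() { // await p > c (buffer full)
			runtime.Gosched()
		}
		b[i] = buf.Load()
		c.Add(1)
	}
	fmt.Println(b == a) // true: b is a copy of a
}
```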
Each of the black solid transitions has to be correct in isolation. The producer invariant PI does not contain variables that are modified by Consumer, and likewise the consumer invariant CI does not contain variables that are modified by Producer, so these are always preserved by transitions of the other process. The conditions for Consumer are:
algorithm
{PC ∧ CI ∧ c < N ∧ p > c}
b(c) := buf
{PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c)}
algorithm
{PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c)}
c := c + 1
{PC ∧ CI}
or equivalently:
PC ∧ CI ∧ c < N ∧ p > c ⇒ (PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c))[b := (b; c: buf)]
PC ∧ CI ∧ c < N ∧ p > c ∧ b(c) = a(c) ⇒ (PC ∧ CI)[c := c + 1]
The green dotted arrows indicate which transitions of Consumer may interfere with the states of Producer. There are six non-interference conditions to check for the transition with b(c) := buf; of those, the three with dotted arrows going to "empty" states of Producer are identical, leaving four conditions:
algorithm
{PC ∧ PI ∧ CI ∧ c < N ∧ p > c}
b(c) := buf
{true}
algorithm
{PC ∧ PI ∧ p < N ∧ CI ∧ c < N ∧ p > c}
b(c) := buf
{p < N}
algorithm
{PC ∧ PI ∧ p < N ∧ p = c ∧ CI ∧ c < N ∧ p > c}
b(c) := buf
{p < N ∧ p = c}
algorithm
{PC ∧ PI ∧ p < N ∧ p = c ∧ buf = a(p) ∧ CI ∧ c < N ∧ p > c}
b(c) := buf
{p < N ∧ p = c ∧ buf = a(p)}
These hold as b(c) := buf does not modify any of the variables in the states that need to be preserved.
Similarly, there are four distinct non-interference conditions for the transition with c := c + 1:
algorithm
{PC ∧ PI ∧ CI ∧ c < N ∧ p > c}
c := c + 1
{true}
algorithm
{PC ∧ PI ∧ p < N ∧ CI ∧ c < N ∧ p > c}
c := c + 1
{p < N}

algorithm
{PC ∧ PI ∧ p < N ∧ p = c ∧ CI ∧ c < N ∧ p > c}
c := c + 1
{p < N ∧ p = c}

algorithm
{PC ∧ PI ∧ p < N ∧ p = c ∧ buf = a(p) ∧ CI ∧ c < N ∧ p > c}
c := c + 1
{p < N ∧ p = c ∧ buf = a(p)}
The first two hold as c := c + 1 does not modify any of the variables in the states that need to be preserved. The last two hold as the precondition (antecedent) is false: p = c and p > c cannot hold at the same time.
Formally, concurrent states P and Q exclude each other if ¬(P ∧ Q). For example, states p < N ∧ p = c and c < N ∧ p > c exclude each other.
The producer-consumer processes do not interfere with each other. In general, interference can be avoided by:
- weakening the annotation, as in ⟨x := x + 1⟩ ‖ ⟨x := x + 2⟩,
- enlarging the atomic regions, as in the procedures operating on a, b, t, where P could be ⟨while ¬t do t := true ; a := a + 1⟩.
Question: Does the following program terminate? What is the postcondition if it ever terminates? (Answer below)
algorithm
n, t := 0, false
(while ¬t do n := n + 1) ‖ t := true
The minimal progress assumption requires only that some process makes progress, if progress is possible at all. Under minimal progress, the above program may not terminate.
An unfair scheduler may select one process and neglect others. We may require a scheduler to be (weakly) fair: if an atomic step is continuously possible, it is eventually taken.
Answer: Under (weak) fairness, the above program does terminate; upon termination, n will be at least 0.
Question: Does the following program terminate under a (weakly) fair scheduler?
algorithm
t, u := false, true
⟨await t then u := false⟩ ‖ while u do t := true ‖ while u do t := false
Answer: Under (weak) fairness, it does not necessarily terminate.
For termination, the scheduler would have to be strongly fair: if the guard of a conditional atomic region is repeatedly (but not necessarily continuously) true, it is eventually taken.
Typically, schedulers of operating systems and programming languages are weakly fair. Strong fairness is more difficult to implement.
Every functional property of a (sequential or concurrent) program can be formulated as either a safety or a liveness property.
Safety: nothing bad will ever happen; all states of all traces are "not bad". Safety is expressed by invariance properties.
Liveness: something good will eventually happen; all traces will eventually lead to a "good" state.
Termination is a liveness property: a final state will eventually be reached.
Formally, safety properties are properties of individual states of traces (or finite sequences of states), and liveness properties are properties of infinite traces.
Question: What are the safety and liveness properties of a traffic intersection?

Answer: Safety: cars from crossing directions are never in the intersection at the same time. Liveness: every car arriving at the intersection eventually gets through.
Question: What is the outcome of P ‖ Q? Do P and Q interfere with each other with the given annotation?

Answer: The parallel composition P ‖ Q does not always terminate, but if it does, it establishes both postconditions; P and Q do not interfere with each other.
Without interference, safety properties are preserved in a parallel composition, but liveness properties, including termination, are not!
This is an example of why concurrent programming is hard: testing P and Q separately can never find the problem, and when testing P ‖ Q, the problem may not always appear.
Mutual exclusion is typically implemented by locks that protect critical sections.
algorithm
process CSi
    while true do
        entry protocol
        critical section
        exit protocol
        noncritical section
Any solution has to satisfy three safety properties and one liveness property:

1. Mutual exclusion: at most one process is in its critical section at any time.
2. No deadlock (livelock): if processes are trying to enter their critical section, eventually one will succeed.
3. No unnecessary delay: if no other process is in its critical section or trying to enter it, a process is not prevented from entering.
4. Eventual entry: a process trying to enter its critical section will eventually succeed.
Question: What are these properties at a traffic intersection? You may consider 4-way stops and traffic lights.

Answer: Mutual exclusion: cars from crossing directions are never in the intersection at the same time. No deadlock: cars never get permanently stuck waiting for each other. No unnecessary delay: a car does not wait at an otherwise empty intersection (a 4-way stop satisfies this; a traffic light may not). Eventual entry: every car eventually gets through.
Here is an abstract solution, i.e. with "coarse-grained" atomic regions; CS is the critical section invariant:
algorithm
in1, in2 := false, false
{CS: ¬(in1 ∧ in2)}
Question: Do the four properties (mutual exclusion, no deadlock (livelock), no unnecessary delay, eventual entry) hold?

Answer: Yes, yes, yes, no.
With the minimal progress assumption, it is possible that process CS1 always runs when the guard ¬in2 is true, while process CS2 never gets a chance to run.
Exercise: Give the annotation that is required for the CS invariant in a state diagram! Check both the correctness of transitions and non-interference.
The solution can be easily generalized to more processes by observing that only one of in1, in2, ... can be true, and hence these can be replaced by a single variable lock. The relationship is lock = in1 ∨ in2 ∨ ...:
algorithm
lock := false
For implementation, programming languages do not have await and atomicity statements. All processors have an atomic test-and-set instruction or an equivalent instruction:

algorithm
procedure TAS(lock: pointer to boolean) → (init: boolean)
    ⟨init := *lock ; *lock := true⟩
Here, *lock refers to the variable pointed to by lock; procedure TAS returns a boolean result init (in a processor, lock and init would be registers).
By employing a spin lock, eventual entry (property 4) is "more likely":
algorithm
process CS1
    while true do
        repeat init ← TAS(lock) until ¬init
        critical section
        lock := false
        noncritical section
The loop repeat S until B is formally equivalent to S ; while ¬B do S.
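In Go, the Swap method of sync/atomic plays the role of TAS: it atomically stores true and returns the previous value. A sketch of the spin lock (the names are this sketch's choices):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var lock atomic.Bool // false: free, true: taken

// csEnter spins until the lock is acquired.
func csEnter() {
	for lock.Swap(true) { // TAS: set to true, yielding the previous value
	}
}

// csExit releases the lock.
func csExit() {
	lock.Store(false)
}

func main() {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := 0; n < 1000; n++ {
				csEnter()
				counter++ // critical section
				csExit()
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 4000
}
```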
The atomic region ⟨S⟩ can be implemented by
algorithm
CSenter
S
CSexit
provided that access to all variables of S is protected with the same lock; atomic regions accessing disjoint variables can proceed in parallel.
The guarded atomic region ⟨await B then S⟩ can be implemented by
algorithm
CSenter
while ¬B do CSexit ; CSenter
S
CSexit
provided that access to all variables of B and S is protected with the same lock.
This involves spinning and may lead to memory contention when accessing the variables of B. To reduce this, it is preferable to insert a delay:
algorithm
CSenter
while ¬B do CSexit ; delay ; CSenter
S
CSexit
A delay can be randomly chosen from an interval that is doubled each time, known as the exponential back-off protocol.
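A sketch of ⟨await B then S⟩ with exponential back-off in Go, reusing the spin lock from the previous sketch; the initial delay is an arbitrary choice of this sketch:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
	"time"
)

var lock atomic.Bool // spin lock as in the previous sketch

func csEnter() {
	for lock.Swap(true) {
	}
}

func csExit() { lock.Store(false) }

// awaitThen implements ⟨await B then S⟩ with exponential back-off.
func awaitThen(b func() bool, s func()) {
	delay := time.Microsecond
	csEnter()
	for !b() {
		csExit()
		// sleep for a random duration from an interval that is doubled each time
		time.Sleep(time.Duration(rand.Int63n(int64(delay)) + 1))
		delay *= 2
		csEnter()
	}
	s()
	csExit()
}

func main() {
	var ready, done bool
	go func() { csEnter(); ready = true; csExit() }()
	awaitThen(func() bool { return ready }, func() { done = true })
	fmt.Println(done) // true
}
```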
The tie-breaker algorithm (Peterson's algorithm) obtains a fair solution to the critical section problem by recording which process had the last turn; in1, in2 now mean that CS1, CS2 want to enter their critical section:
algorithm
in1, in2, last := false, false, 1
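The full protocol is not shown above; in the classic formulation, CS1 sets in1 := true ; last := 1 and then waits until ¬in2 ∨ last = 2, symmetrically for CS2. A Go sketch with atomics, which are needed for the sequentially consistent behaviour the algorithm relies on (all names are this sketch's choices):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

var in1, in2 atomic.Bool
var last atomic.Int32
var counter int

func cs1(rounds int, wg *sync.WaitGroup) {
	defer wg.Done()
	for n := 0; n < rounds; n++ {
		in1.Store(true) // entry protocol: announce intent, yield the tie
		last.Store(1)
		for in2.Load() && last.Load() == 1 { // await ¬in2 ∨ last = 2
			runtime.Gosched()
		}
		counter++        // critical section
		in1.Store(false) // exit protocol
	}
}

func cs2(rounds int, wg *sync.WaitGroup) {
	defer wg.Done()
	for n := 0; n < rounds; n++ {
		in2.Store(true)
		last.Store(2)
		for in1.Load() && last.Load() == 2 { // await ¬in1 ∨ last = 1
			runtime.Gosched()
		}
		counter++
		in2.Store(false)
	}
}

func main() {
	var wg sync.WaitGroup
	wg.Add(2)
	go cs1(1000, &wg)
	go cs2(1000, &wg)
	wg.Wait()
	fmt.Println(counter) // 2000
}
```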
The drawback of the tie-breaker algorithm is that it is difficult to generalize to more than two processes. The idea of the ticket algorithm is that each process first gets a ticket with its number in the queue. Assume turn: 0 .. N − 1 → integer:
algorithm
number, next, turn := 1, 1, [0, ..., 0]
{next > 0 ∧
 ∀ i ∈ 0 .. N − 1 • (CS(i) in critical section ⇒ turn(i) = next) ∧
 (turn(i) > 0 ⇒ (∀ j ∈ 0 .. N − 1 • j ≠ i ⇒ turn(i) ≠ turn(j)))}
process CS(i: 0 .. N − 1)
    while true do
        ⟨turn(i) := number ; number := number + 1⟩
        ⟨await turn(i) = next⟩
        critical section
        ⟨next := next + 1⟩
        noncritical section
The parameter of CS is a notation for process replication: N copies of CS are started, each with a different value of i.
The ticket algorithm requires fetching and incrementing an integer as one atomic operation. Some processors have a fetch-and-add instruction specifically for this purpose.
The x86 processor has an XADD instruction that can be used with the LOCK prefix to achieve atomicity,
algorithm
LOCK XADD dst, src = ⟨dst, src := dst + src, dst⟩
where dst is a register or memory location and src is a register.
The ARM v7 processor splits atomic updates into two instructions, load-exclusive and store-exclusive, with the hardware monitoring what happens in between,
algorithm
LDREX dst, [src] = ⟨dst := Memory[src]⟩
STREX R, src, [dst] = ⟨Memory[dst], R := src, 0 ⫿ R := 1⟩
where dst, src, R are registers and ⫿ is nondeterministic choice: if the memory location was modified in the meantime, R is set to 1, otherwise to 0. An increment of Memory[ptr] is performed in a loop:
algorithm
L: LDREX R1, [ptr]     ; load Memory[ptr] exclusively into R1 and monitor it
   ADD R1, R1, 1       ; add 1 to R1
   STREX R2, R1, [ptr] ; attempt to store R1 to Memory[ptr]; R2 is set to 0 if Memory[ptr] was not modified in the meantime, to 1 otherwise
   CMP R2, #1          ; check whether the store-exclusive failed
   BEQ L               ; if it failed, retry by jumping back to L
Compilers such as GCC provide built-in functions for such atomic read-modify-write operations, for example:

algorithm
type __atomic_add_fetch(type *ptr, type val, int memorder)

where type must be 1, 2, 4, or 8 bytes in length.

Question: Does the ARM code guarantee fairness among processes trying to increment Memory[ptr]?

Answer: No: there is a chance that one process is continuously selected while the other processes starve, but the probability of this is negligible.
If a special instruction is not available, then an implementation of
algorithm
⟨turn(i) := number ; number := number + 1⟩
would be
algorithm
CSenter
turn(i) := number ; number := number + 1
CSexit
where CSenter and CSexit use a lock for variable number. Although the contention for accessing number may not be resolved in a fair way, it turns out that this implementation works sufficiently well in practice.
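A Go sketch of the ticket algorithm, with the atomic fetch-and-increment provided by the Add method of sync/atomic (this sketch numbers tickets from 0 rather than 1):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

var number, next atomic.Int64 // both start at 0 in this sketch

// lock takes a ticket and waits for its turn.
func lock() {
	turn := number.Add(1) - 1 // fetch-and-add: take the next ticket
	for next.Load() != turn { // await turn = next
		runtime.Gosched()
	}
}

// unlock passes the turn to the next ticket holder.
func unlock() {
	next.Add(1)
}

func main() {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := 0; n < 1000; n++ {
				lock()
				counter++ // critical section
				unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // 4000
}
```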
The ticket algorithm leads to memory contention if many processors are trying to access number. The bakery algorithm does not rely on a single global counter and does not need a special fetch-and-add instruction. Assuming turn: 0 .. N − 1 → integer, a coarse-grained solution is:
algorithm
turn := [0, ..., 0]
process CS(i: 0 .. N − 1)
    while true do
        ⟨turn(i) := max(turn) + 1⟩
        for j = 0 to N − 1 do
            if i ≠ j then ⟨await turn(j) = 0 or turn(i) < turn(j)⟩
        critical section
        turn(i) := 0
        noncritical section
Operation max(turn) returns the maximum value in array turn.
The max operation would need to be implemented by a loop, making entry inefficient. If we were to implement

algorithm
turn(i) := max(turn) + 1

non-atomically, two processes, say i and j, may obtain the same value for turn(i) and turn(j), leading to a deadlock in the await statement.
This situation can be avoided by giving preference to one of the two or more processes with the same value of turn in some arbitrary way, for example by using the process number. The conditions of the await statement do not need to be evaluated atomically. Thus we arrive at a solution that relies only on assignments being atomic:
algorithm
turn := [0, ..., 0]
process CS(i: 0 .. N − 1)
    while true do
        turn(i) := max(turn) + 1
        for j = 0 to N − 1 do
            while turn(j) ≠ 0 and (turn(i) > turn(j) or (turn(i) = turn(j) and i > j)) do
                skip
        critical section
        turn(i) := 0
        noncritical section
Question: Does this guarantee fairness among processes trying to enter their critical section?
Answer: Yes, although processes with a lower process number may have an advantage.
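A Go rendering of the bakery algorithm is sketched below. Two deviations from the pseudocode above are worth flagging: the turn entries are atomics, so that the concurrent reads are well-defined under Go's memory model, and, following Andrews' formulation, turn(i) is first set to 1 before the maximum is computed, so that two processes taking tickets at the same time cannot miss each other:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

const N = 4 // number of processes

var turn [N]atomic.Int64

// maxTurn returns the maximum ticket currently held.
func maxTurn() int64 {
	var m int64
	for j := range turn {
		if t := turn[j].Load(); t > m {
			m = t
		}
	}
	return m
}

func lock(i int) {
	turn[i].Store(1)             // announce that a ticket is being taken
	turn[i].Store(maxTurn() + 1) // take a ticket (not atomic; ties broken below)
	for j := 0; j < N; j++ {
		if j == i {
			continue
		}
		for { // spin while process j is ahead in the bakery order
			tj := turn[j].Load()
			if tj == 0 || turn[i].Load() < tj || (turn[i].Load() == tj && i < j) {
				break
			}
			runtime.Gosched()
		}
	}
}

func unlock(i int) { turn[i].Store(0) }

func main() {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < N; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for n := 0; n < 1000; n++ {
				lock(i)
				counter++ // critical section
				unlock(i)
			}
		}(i)
	}
	wg.Wait()
	fmt.Println(counter) // N × 1000 = 4000
}
```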