Atomics in Objective-C
Published on — Filed under cocoade, bare metal
This post talks about the use of low-level OS atomic functions (compare-and-swap, fetch-and-increment) to achieve wait-free and lock-free, thread-safe concurrent algorithms in your iOS/OSX apps.
It introduces a few classes that abstract away the gory details of some of the atomic functions made available by libkern/OSAtomic.h into conceptually simpler components that use Foundation types.
You can find the project with all the code for this post on Timehop's GitHub.
Discussion up on HackerNews or just ping me on Twitter.
Disclaimer
If you've never heard of this stuff, don't fret, there's a good reason for that: you probably never have, nor ever will, need it.
Using the good ol' @synchronized() will be performant enough. Hell, I've seen enough code that was prone to concurrency disaster (read: thread-unsafe) work for years, fueled by prayer and blissful nescience — I call this "Success Oriented Programming: the fine art of completely disregarding everything that WILL go wrong". But I digress...
This post talks about practices that, in the world of apps, can be seen as mostly academic and/or PrematureOptimization® — at the end of the day, these are systems that only need to present (the illusion of) visual responsiveness to an evaluator with a pretty terrible reaction time — you! — so as long as you keep your @synchronized()'s short and simple (and away from the main thread) you should be fine.
That being said, it's fun stuff — a lot of constructs and components you build upon and take for granted end up relying on these lower level primitives, one way or another.
Throttler
There are plenty of examples of why these things are incredibly useful but let's go with one I really like: a throttler.
Consider you have a component, Foo, that is responsible for performing some very expensive computations and/or IO access — it doesn't really matter what, only that it's considered resource-intensive and expensive. This component can be accessed by any number of threads at any time, so to avoid bogging down the system you want to ensure that only N threads at a time can execute the expensive method. Any other attempts to execute it should immediately return — graceful fast fail.
Let's see how we'd go about building this thing using the tools at our disposal and their pitfalls/shortcomings.
Note
For the sake of brevity I won't get too pedantic about all the possible things that could be wrong and/or effectively be happening after the compiler optimizes the code, i.e. caching of ivars per thread, instruction reordering, etc. I'll stick to how broken (or not) each approach is at a simpler, more conceptual, level.
The SuccessOrientedProgramming™ way
Don't cringe too hard, I'm sure you've seen worse than this...
@implementation Foo {
    NSInteger _counter;
}

- (BOOL)veryExpensiveMethod:(id)bar {
    if (_counter++ > N) {
        _counter--;
        return NO;
    }

    // Critical section

    _counter--;
    return YES;
}
@end
This kind of thing will break in all sorts of ways when veryExpensiveMethod: is called from multiple threads. The most common rookie mistake is to assume ++ and -- are single operations.
Hint: they're not, on any architecture.
The compiler will unwrap each of those into a read, an addition on a temporary variable and, finally, a write. There's also no atomicity guarantee between the ++ and the -- on the first two lines (when access to the critical section fails), which can lead to the counter becoming permanently positive or negative... It's just a trainwreck.
All it takes is the right interleaving of threads — and it will eventually happen — for that thing to let more than N threads into the critical section, or to get into a state where it never lets threads in again.
The throw-a-@synchronized-at-it way
@implementation Foo

- (BOOL)veryExpensiveMethod:(id)bar {
    @synchronized (self) {
        // Critical section
        return YES;
    }
}
@end
There. Problem solved!
Except now all other threads will get stuck waiting to go in — if that critical section takes a few seconds, the whole app can hang for a long time.
We just broke the contract of gracefully failing fast; this is no longer throttling but rather queuing in the worst possible way — by blocking threads.
ProTip™
Another horrible anti-pattern worth mentioning here is synchronizing on self. Remember that self is the instance, which was created somewhere else. Who knows which other components hold references to it and what the hell they might be doing with it. All it takes is one of those unknown components also synchronizing on this reference and we have a potential deadlock time-bomb that's totally unrelated to this critical section.
Never synchronize on self. Ever. I cannot stress this enough. If you must @synchronized a block of code, use an internal (ivar) dummy NSObject as a mutex instead.
The slightly better synchronized version
@implementation Foo {
    NSObject *_counterMutex; // assume it's been properly initialized
    NSInteger _counter;
}

- (BOOL)veryExpensiveMethod:(id)bar {
    @synchronized (_counterMutex) {
        if (_counter >= N) return NO;
        _counter++;
    }

    // Critical section

    @synchronized (_counterMutex) {
        _counter--;
    }
    return YES;
}
@end
Seems legit. We can "immediately" return when the critical section is at capacity, and we only update the counter when we're sure we can go ahead and execute it.
Those quotes on "immediately" are intentional — there are now two contention spots in that single function. In a highly concurrent scenario this means there will potentially be a lot of waiting — though not nearly as much as in the previous debacle.
At the end of the day, this gets the job done by being both correct and performant enough.
ProTip™
It's good practice to place the lock/mutex/semaphore definition right above the resource it's guarding. It's scientifically proven to help keep other programmers' desire to murder you at acceptable levels.
We don't like performant enough. Not when it could be more performant!
Pause for maniacal laughter.
The atomic counter
If we had a class that exposed incrementAndGetValue and decrementAndGetValue operations as atomic (i.e. conceptually executed as a single, or as a series of uninterrupted, instructions, and therefore thread-safe), we could rewrite the program without any locks whatsoever:
@implementation Foo {
    AtomicInteger *_counter; // assume it's initialized to 0
}

- (BOOL)veryExpensiveMethod:(id)bar {
    if ([_counter incrementAndGetValue] > N) {
        [_counter decrementAndGetValue];
        return NO;
    }

    // Critical section

    [_counter decrementAndGetValue];
    return YES;
}
@end
And there you go. Lock-free, wait-free conditional access to a critical section.
Unlike ++ and --, incrementAndGetValue and decrementAndGetValue are atomic, which means the counter never becomes corrupted by how it's used above, and it always ends up in perfect balance:
- If we try to increment and the result is over the limit, we can't gain access to the critical section — we decrement the counter (because we made the optimistic assumption that we could get in, and incremented it) and bail out
- If we succeed at incrementing the counter without exceeding the limit, we enter the critical section and, at the end, decrement the counter again
// Assuming N = 2
T1: _counter 0 -> 1, OK
T2: _counter 1 -> 2, OK
T3: _counter 2 -> 3, Not-OK
T1: critical section
T2: critical section
T3: _counter 3 -> 2, return NO
T2: _counter 2 -> 1, return YES
T1: _counter 1 -> 0, return YES
Building a Throttler with atomic integers
Using AtomicInteger, we could go ahead and clean up the code above, making it a bit more elegant by creating a Throttler:
@implementation Throttler {
    AtomicInteger *_counter;
    NSInteger _limit;
}

- (instancetype)initWithLimit:(NSInteger)limit {
    self = [super init];
    if (self != nil) {
        NSParameterAssert(limit > 0);
        _counter = [[AtomicInteger alloc] initWithValue:0];
        _limit = limit;
    }
    return self;
}

// Returns YES if executed, NO otherwise
- (BOOL)throttledExec:(void (^)())block {
    if ([_counter incrementAndGetValue] > _limit) {
        [_counter decrementAndGetValue];
        return NO;
    }

    block();
    [_counter decrementAndGetValue];
    return YES;
}
@end
And then rewrite Foo's veryExpensiveMethod: as:
- (BOOL)veryExpensiveMethod:(id)bar {
    return [_throttler throttledExec:^{
        // Critical section
    }];
}
Busy flag (single access throttling)
If we were trying to ensure that only a single thread gets access to that critical section, there's an even cleaner way: using an AtomicBoolean:
@implementation Foo {
    AtomicBoolean *_busy; // assume it's initialized to NO
}

- (BOOL)veryExpensiveMethod:(id)bar {
    if (![_busy compareTo:NO andSetValue:YES]) return NO;

    // Critical section

    [_busy setValue:NO];
    return YES;
}
@end
Here, instead of the AtomicInteger's incrementAndGetValue (which needs to be immediately balanced by a decrementAndGetValue in case the value goes over N), we use compareTo:andSetValue: (the CAS operation), which attempts to atomically toggle _busy from NO to YES if (and only if) _busy is currently set to NO, failing otherwise.
The fail-fast contract is preserved without incurring any contention.
Note
This is equivalent to a semaphore's try-acquire semantics. In fact, take a wild guess at how that'd be implemented underneath...
Since the code for AtomicBoolean is simpler, let's go over it first, then jump on to AtomicInteger.
Implementations
The AtomicBoolean class
This class is basically a wrapper for a BOOL that can be updated atomically.
@import Foundation;
@interface AtomicBoolean : NSObject
- (instancetype)initWithValue:(BOOL)value;
- (BOOL)getValue;
- (void)setValue:(BOOL)value;
- (BOOL)compareTo:(BOOL)expected andSetValue:(BOOL)value;
- (BOOL)getAndSetValue:(BOOL)value;
@end
All operations exposed by the interface are atomic. Worthy of mention:
- getValue and setValue: are simple variable read and write operations, guaranteed to be atomic on all architectures (the ones you deal with, anyway) as long as the size of the type being read/written fits in a single register (up to 4 bytes on 32-bit archs, up to 8 bytes on 64-bit archs)
- compareTo:andSetValue: atomically sets a new value if the current value matches the expected value. Returns YES if successful, NO otherwise
- getAndSetValue: atomically updates the underlying value, returning the old value
#import <libkern/OSAtomic.h>

@implementation AtomicBoolean {
    volatile int32_t _underlying;
}

- (instancetype)init {
    return [self initWithValue:NO];
}

- (instancetype)initWithValue:(BOOL)value {
    self = [super init];
    if (self != nil) {
        _underlying = value ? 1 : 0;
    }
    return self;
}

- (BOOL)getValue {
    return _underlying != 0;
}

- (void)setValue:(BOOL)value {
    // Same atomic guarantees as getValue.
    _underlying = value ? 1 : 0;
}

- (BOOL)compareTo:(BOOL)expected andSetValue:(BOOL)value {
    return OSAtomicCompareAndSwap32((expected ? 1 : 0),
                                    (value ? 1 : 0),
                                    &_underlying);
}

- (BOOL)getAndSetValue:(BOOL)value {
    while (true) {
        // We could do 'current = _underlying ? 1 : 0' to save a method call
        // but it'll most likely end up being inlined anyway.
        BOOL current = [self getValue];
        if ([self compareTo:current andSetValue:value]) return current;
    }
}
@end
You'll notice there's a strange while (true) loop on getAndSetValue: — that's a rather naive spinlock.
The following scenario with 2 threads depicts why it's necessary:
// _underlying initially at 0
// T1 setting value to 1, T2 to 2
T1: cur@T1 = _underlying; cur@T1 is 0
T2: cur@T2 = _underlying; cur@T2 is 0
T2: compareTo:cur@T2 andSet:2 // OK, _underlying was 0
T1: compareTo:cur@T1 andSet:1 // Fail, _underlying was 2, not 0
Again, this is a very naive spinlock. A sturdier alternative would be using an OSSpinLock and the matching lock/unlock functions.
Note
If you don't need the getAndSetValue: semantics and want the extra performance, use OSAtomicCompareAndSwap() or std::atomic directly, as you'll avoid the objc_msgSend() cost associated with calling methods on classes.
The AtomicInteger
The AtomicInteger class is pretty much the same thing as AtomicBoolean, with a few syntactic-sugar operations thrown in to make it simpler to use as a counter:
// Return old value and update
- (NSInteger)getAndIncrementValue;
- (NSInteger)getAndDecrementValue;
- (NSInteger)getAndAddValue:(NSInteger)delta;
// Update and return new value
- (NSInteger)incrementAndGetValue;
- (NSInteger)decrementAndGetValue;
- (NSInteger)addAndGetValue:(NSInteger)delta;
The implementation itself is pretty simple, with all the interesting code living under addAndGetValue: and getAndAddValue: — the remaining methods simply call these with either +1 or -1:
// Just like getAndSetValue: but with a delta instead of a value
- (NSInteger)getAndAddValue:(NSInteger)delta {
    while (true) {
        NSInteger current = [self getValue];
        NSInteger next = current + delta;
        if ([self compareTo:current andSetValue:next]) return current;
    }
}

// The inverse of getAndAddValue:
- (NSInteger)addAndGetValue:(NSInteger)delta {
    return OSAtomicAdd64(delta, &_underlying);
}
This class introduces the semantics of "set a new value and return it", which we didn't have before — we had getAndSetValue: but no setAndGetValue:.
While we could add a setAndGetValue:, there's nothing to gain from it, as we already know what the value will be — we're providing it. The same doesn't hold for addAndGetValue:, since we can't otherwise atomically know what the value was before the addition.
32/64bit architecture caveat
You probably noticed the "64" in OSAtomicAdd64 under addAndGetValue:'s implementation. Similarly, compareTo:andSetValue: uses OSAtomicCompareAndSwap64:
- (BOOL)compareTo:(NSInteger)expected andSetValue:(NSInteger)value {
    return OSAtomicCompareAndSwap64(expected, value, &_underlying);
}
If you merely implement it this way and try to run that code on an older, 32-bit phone, you'll see this:
(screenshot of the resulting build error omitted)
The reason for this is that a 64-bit value can't fit in a single 32-bit register; the CPU would need two memory accesses to read the whole thing, thus losing the atomicity of the CAS operation.
NSInteger itself is declared as a different type (read: size) depending on which architecture it's being compiled for. If you peek into NSObjCRuntime.h, you'll see:
#if __LP64__ || (TARGET_OS_EMBEDDED && !TARGET_OS_IPHONE) || \
    TARGET_OS_WIN32 || NS_BUILD_32_LIKE_64
typedef long NSInteger;
typedef unsigned long NSUInteger;
#else
typedef int NSInteger;
typedef unsigned int NSUInteger;
#endif
This means we need to modify a few parts of AtomicInteger to make it compatible with both 32 and 64-bit:
#define Is64BitArch __LP64__ || \
    (TARGET_OS_EMBEDDED && !TARGET_OS_IPHONE) || \
    TARGET_OS_WIN32 || NS_BUILD_32_LIKE_64

@implementation AtomicInteger {
#if Is64BitArch
    volatile int64_t _underlying;
#else
    volatile int32_t _underlying;
#endif
}

// ...

- (BOOL)compareTo:(NSInteger)expected andSetValue:(NSInteger)value {
#if Is64BitArch
    return OSAtomicCompareAndSwap64(expected, value, &_underlying);
#else
    return OSAtomicCompareAndSwap32(expected, value, &_underlying);
#endif
}

// ...

- (NSInteger)addAndGetValue:(NSInteger)delta {
#if Is64BitArch
    return (NSInteger)OSAtomicAdd64(delta, &_underlying);
#else
    return (NSInteger)OSAtomicAdd32(delta, &_underlying);
#endif
}
@end
Et voilà!
We've now gone over atomic booleans and integers, which let you operate on simple primitives. What if you want to compare objects?
Enter AtomicReference, which allows you to atomically compare-and-swap the pointers (not the values) of two objects.
AtomicReference
With atomic references to objects we run into a different beast: retain counts. As we all learned in the pre-ARC days, when we were forced to manually retain and (auto)release, when an object's retain count reaches 0 it is deallocated — under ARC this means "when there are no more strong references to it".
Looking at the signatures of all the OSAtomicCompareAndSwap functions, you'll see that the closest thing we have for dealing with objects (NSObject or subclasses) are the OSAtomicCompareAndSwapPtr functions, which not only deal in void * and void * volatile types, but will also not, naturally, perform any retains or releases on the objects whose pointers you pass in.
This means that we have to explicitly do two things:
- Transfer the ownership outside of ARC by casting to a C-style type (also known as bridging)
- Manually manage retentions and releases
First, let's look at our options, straight from the spec:
__bridge
"(__bridge T) op casts the operand to the destination type T. If T is a retainable object pointer type, then op must have a non-retainable pointer type. If T is a non-retainable pointer type, then op must have a retainable object pointer type. Otherwise the cast is ill-formed. There is no transfer of ownership, and ARC inserts no retain operations."
Having no transfer of ownership means that as soon as the object the _underlying reference points to is no longer retained elsewhere, accessing this pointer would cause an EXC_BAD_ACCESS. In other words, we'd never really hold a strong reference to the wrapped object — which we need to.
__bridge_transfer
"(__bridge_transfer T) op casts the operand, which must have non-retainable pointer type, to the destination type, which must be a retainable object pointer type. ARC will release the value at the end of the enclosing full-expression, subject to the usual optimizations on local values."
This one isn't even valid for us, since op is retainable and the destination is non-retainable.
__bridge_retained
"(__bridge_retained T) op casts the operand, which must have retainable object pointer type, to the destination type, which must be a non-retainable pointer type. ARC retains the value, subject to the usual optimizations on local values, and the recipient is responsible for balancing that +1."
This one is actually useful. It adds a +1 retain count to op, which means that even if all other external references to this object are lost, the object will still have this +1 — it also means that if we don't explicitly balance it out, we're going to have a leak.
So whenever we do this...
- (void)setObject:(id)object {
    _underlying = (__bridge_retained void *)object;
}
... we'd actually have to ensure that if there was already something in _underlying, we properly released it with a call to CFRelease(), which would look like this:
- (void)setObject:(id)object {
    void *previous = _underlying;
    _underlying = (__bridge_retained void *)object;
    if (previous != nil) CFRelease(previous);
}
The problem with this code is that it's not thread-safe. Imagine two threads, T1 and T2, executing the instructions of setObject: in a fully interleaved fashion — in the land of concurrency, anything can happen:
// _underlying holds a pointer to object A living at 0x10,
// T1 calls setObject: with an object B living at 0x20
// and T2 calls setObject: with an object C living at 0x30.
1. T1: previous = A@0x10
2. T2: previous = A@0x10
3. T1: _underlying = B@0x20 (B retain count +1)
4. T2: _underlying = C@0x30 (C retain count +1)
5. T1: CFRelease(A@0x10) (A retain count -1)
6. T2: CFRelease(A@0x10) (A retain count -1)
Two immediate problems:
- At line 3, B@0x20's retain count is incremented but there is no balancing decrement, as the write is immediately shadowed by T2's write on line 4 — this leads to a memory leak
- At line 6, A@0x10 is manually released a second time, meaning it'll be prematurely deallocated, which will end up in an EXC_BAD_ACCESS crash either immediately or somewhere else down the line in the application's lifetime
We could always throw a @synchronized in there, but that would completely defeat the purpose of the whole AtomicReference class — lock-free atomic guarantees.
ProTip™
This is why you need atomic on your @properties if they're going to be accessed by multiple threads!
To solve this problem, we actually need to go ahead and implement compareTo:andSetObject: first.
Balancing releases
Let's look at the signature:
- (BOOL)compareTo:(id)expected andSetObject:(id)object;
- We know that, under ARC, upon entering the method's scope the retain counts of expected and object will be incremented (and, conversely, decremented upon exit), meaning they're guaranteed to live throughout the method's scope — i.e. we don't need to worry about them magically disappearing half-way through
- We know that in order for compareTo:andSetObject: to return YES, expected must be the object living at the memory address _underlying refers to
By combining those two properties, we know that if (and only if!) the CAS operation succeeds, we can safely emit the appropriate CFRelease on expected and CFRetain on object — we decrement the retain count of the object _underlying pointed to, and increment the retain count of the new object taking its place.
- (BOOL)compareTo:(id)expected andSetObject:(id)object {
    void *old = (__bridge void *)expected;
    void *new = (__bridge void *)object;

    BOOL swapped = OSAtomicCompareAndSwapPtr(old, new, &_underlying);
    if (swapped) {
        if (expected != nil) CFRelease(old);
        if (object != nil) CFRetain(new);
    }

    return swapped;
}
Because we're now manually retaining and releasing, the setter needs to be slightly different — it must behave like an atomic synthesized property's setter would. Luckily, we already have something like that built in getAndSetObject:, so all we need to do is call it:
- (void)setObject:(id)object {
    [self getAndSetObject:object];
}
This will end up calling compareTo:andSetObject:, which will optimistically (and naively) spin until we're able to replace the value.
Note
If you look at the runtime's implementation of a property setter, you'll see the atomic flow resorts to a spinlock too — granted, a better one than a simple while (true), but we've covered that above.
One last thing we must do is make sure we release the object pointed to by _underlying when the enclosing atomic gets deallocated:
- (void)dealloc {
    void *previous = _underlying;
    if (previous != nil) CFRelease(previous);
}
Sample use case for atomic references
While I've found plenty of use cases for atomic integers and booleans, references are something I've rarely ever needed. I did come across one interesting usage recently.
I had this class whose sole responsibility was emitting an HTTP request and notifying the delegate about the success/error of said request. By design, it was supposed to execute a single request at any given time. I could have just used @synchronized() blocks to ensure this:
- (BOOL)launchRequest:(...)... {
    // requestMutex is just an NSObject
    @synchronized (self.requestMutex) {
        if (self.currentRequest != nil) return NO;

        id request = // create request
        self.currentRequest = request;
        [request start];
        return YES;
    }
}

- (BOOL)cancelActiveRequest {
    @synchronized (self.requestMutex) {
        if (self.currentRequest == nil) return NO;

        [self.currentRequest cancel];
        self.currentRequest = nil;
        return YES;
    }
}
But with an AtomicReference, it can be rewritten as:
- (BOOL)launchRequest:(...)... {
    id request = // create the request

    // _ref is an AtomicReference initialized on init
    if ([_ref compareTo:nil andSetObject:request]) {
        [request start];
        return YES;
    }

    return NO;
}

- (BOOL)cancelActiveRequest {
    id current = [_ref getAndSetObject:nil];
    if (current == nil) return NO;

    [current cancel];
    return YES;
}
Basically, you start by building the request and then try to atomically compare the current request to nil and set it to the newly created one — in other words, if the current request is nil, we can launch a new request.
Same goes for cancelling: you get-and-set the current request to nil. If that call returns a non-nil object, there was an active request and we should cancel it.
Aside from being lock-free, feels a lot cleaner, doesn't it?
Note
Unless you need the compare-and-swap semantics, you gain nothing here over using @property (atomic).
Conclusion
Atomics make for a very compelling alternative to hard-locking solutions due to their wait-free and lock-free properties, which make them very fast.
They are not, however, concurrency silver bullets that guarantee the thread safety or correctness of your algorithms — sometimes you just can't replace a lock around a critical section of code.
These things are mostly used in what would otherwise be (or potentially be) high-contention, computationally heavy parts of programs.
It's very important to understand that every example in this article could have legitimately been solved with different concurrency primitives — like semaphores and locks — without any noticeable impact to a human playing around with your app.
If you're looking to extract that extra bit of performance, instead of using AtomicInteger or AtomicBoolean go straight for std::atomic<>, since objc_msgSend() — calling methods on objects — has a non-negligible impact. Just make sure that when you do so, your code remains readable and self-explanatory.
At the end of the day, go with whatever makes your code easier to understand and maintain; optimize when you see the need for it.
Correctness is paramount.
Interesting links
"atomic<> Weapons" parts one and two (video) — long but loaded with gold; definitely worth watching, especially if you never learned about compilers, hardware and all the magic that happens between the code you write and what the machine ends up executing
The Synchronization section and the Thread safety summary appendix of the iOS Thread Programming Guide (on how initialization and retention functions are thread-safe)
Thread-safe Class design article on objc.io, by Peter Steinberger
Objective-C runtime code on GitHub
The java.util.concurrent.atomic package
Big thanks to Adam Wulf and Peter Steinberger for their valuable insight and critique.