Atomics in Objective-C

Published on — Filed under cocoade, bare metal

This post talks about using low-level OS atomic functions (compare-and-swap, fetch-and-increment) to achieve wait-free and lock-free, thread-safe concurrent algorithms in your iOS/OSX apps.

It introduces a few classes that abstract away the gory details of some of the atomic functions made available by libkern/OSAtomic.h into conceptually simpler components that use Foundation types.

You can find the project with all the code for this post on Timehop's GitHub.

Discussion up on HackerNews or just ping me on Twitter.

Disclaimer

If you've never heard of this stuff, don't fret, there's a good reason for that: you probably never have needed it, nor ever will.

Using the good ol' @synchronized() will be performant enough. Hell, I've seen enough code that was prone to concurrency disaster (read thread-unsafe) work for years, fueled by prayer and blissful nescience — I call this "Success Oriented Programming: the fine art of completely disregarding everything that WILL go wrong". But I digress...

This post talks about practices that in the world of apps can be seen as mostly academic and/or PrematureOptimization® — at the end of the day, these are systems that need only to present (the illusion of) visual responsiveness to an evaluator that has a pretty terrible reaction time — you! — so as long as you keep @synchronized()'s short and simple (and away from the main thread) you should be fine.

That being said, it's fun stuff — a lot of constructs and components you build upon and take for granted end up relying on these lower level primitives, one way or another.

Throttler

There are plenty of examples of why these things are incredibly useful but let's go with one I really like: a throttler.

Consider you have a component, Foo, that is responsible for performing some very expensive computations and/or IO access — doesn't really matter what, only that it is considered to be resource intensive and expensive. This component can be accessed by any number of threads at any time so you want to ensure that, in order not to bog down the system, only N threads at a time can execute the expensive method. Any other attempts to execute it should immediately return — graceful fast fail.

Let's see how we'd go about building this thing using the tools at our disposal and their pitfalls/shortcomings.

Note

For the sake of brevity I won't get too pedantic about all the possible things that could be wrong and/or effectively be happening after the compiler optimizes the code, e.g. caching of ivars per thread, instruction reordering, etc. I'll stick to how broken (or not) each approach is at a simpler, more conceptual, level.

The SuccessOrientedProgramming™ way

Don't cringe too hard, I'm sure you've seen worse than this...

@implementation Foo {
  NSInteger _counter;
}

- (BOOL)veryExpensiveMethod:(id)bar {
  if (_counter++ > N) {
    _counter--;
    return NO;
  }
  // Critical section
  _counter--;
  return YES;
}
@end

This kind of thing will break in all sorts of ways when veryExpensiveMethod: is called from multiple threads. The most common rookie mistake is to assume ++ and -- are each a single operation.

Hint: they're not; under any architecture.

The compiler will unwrap those into a read, an addition on a temporary variable, and finally a write. There's also no atomicity guarantee between the ++ and the -- on the first two lines (when access to the critical section fails), which can lead to _counter drifting permanently up or down... It's just a trainwreck.

All it takes is the right interleaving of threads — and it will eventually happen — for that thing to let more than N threads into that critical section or to get to a state where it never lets threads in again.
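The lost-update failure mode can even be shown deterministically, without threads, by spelling out the three steps the compiler emits for _counter++. This is a plain C sketch (the function name is made up for illustration):

```c
#include <assert.h>

/* Simulates the unlucky interleaving described above. Each "thread"
   performs the three steps hidden inside _counter++: load, add, store. */
int lost_update_demo(void) {
    int counter = 0;

    int t1 = counter;  /* T1 loads 0                                  */
    int t2 = counter;  /* T2 loads 0, before T1 gets to store         */
    counter = t1 + 1;  /* T1 stores 1                                 */
    counter = t2 + 1;  /* T2 stores 1: T1's increment is lost         */

    return counter;    /* 1, even though two increments were executed */
}
```

Two increments, a final value of 1. That's the race in a nutshell.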

The throw-a-@synchronized-at-it way

@implementation Foo

- (BOOL)veryExpensiveMethod:(id)bar {
  @synchronized (self) {
    // Critical section
    return YES;
  }
}
@end

There. Problem solved!

Except now all other threads will get stuck waiting to go in — if that critical section takes a few seconds, the whole app will get stuck for a long time.

We just broke the contract of gracefully failing fast and this is no longer throttling but rather queuing in the worst possible way — by blocking threads.

ProTip™

Another horrible anti-pattern found here worth mentioning is synchronizing on self. Remember that self is the instance, which was created somewhere else. Who knows which other components hold references to it and what the hell they might be doing with it. All it takes is one of those unknown components also synchronizing on this reference and we have a potential deadlock time-bomb that's totally unrelated to this critical section.

Never synchronize on self. Ever. I cannot stress this enough. If you must @synchronize a block of code use an internal (ivar) dummy NSObject as a mutex instead.

The slightly better synchronized version

@implementation Foo {
  NSObject *_counterMutex; // assume it's been properly initialized
  NSInteger _counter;
}

- (BOOL)veryExpensiveMethod:(id)bar {
  @synchronized (_counterMutex) {
    if (_counter >= N) return NO;
    _counter++;
  }
  // Critical section
  @synchronized (_counterMutex) {
    _counter--;
  }
  return YES;
}
@end

Seems legit. We can "immediately" return when N threads are already executing the critical section, and we only update the counter when we're sure we can go ahead and execute the critical section.

Those quotes on "immediately" are intentional — there are now two contention spots in that single function. In a highly concurrent scenario, this means there will potentially be a lot of waiting — but not nearly as much as the previous debacle.

At the end of the day this would get the job done by being both correct and performant enough.

ProTip™

It's good practice to place the lock/mutex/semaphore definition right above the resource it's guarding. It's scientifically proven to help keep other programmers' desire to murder you at acceptable levels.

We don't like performant enough. Not when it could be more performant!

Pause for maniacal laughter.

The atomic counter

If we had a class that exposed incrementAndGetValue and decrementAndGetValue operations as atomic (i.e. conceptually executed as a single or as a series of uninterrupted instructions, therefore thread-safe), we could rewrite the program without any locks whatsoever:

@implementation Foo {
  AtomicInteger *_counter; // assume it's initialized to 0
}

- (BOOL)veryExpensiveMethod:(id)bar {
  if ([_counter incrementAndGetValue] > N) {
    [_counter decrementAndGetValue];
    return NO;
  }
  // Critical section
  [_counter decrementAndGetValue];
  return YES;
}
@end

And there you go. Lock-free, wait-free conditional access to a critical section.

Unlike ++ and --, incrementAndGetValue and decrementAndGetValue are atomic, which means the counter never becomes corrupted by the way they're used above and it always ends up in perfect balance:

// Assuming N = 2
T1: _counter 0 -> 1, OK
T2: _counter 1 -> 2, OK
T3: _counter 2 -> 3, Not-OK
T1: critical section
T2: critical section
T3: _counter 3 -> 2, return NO
T2: _counter 2 -> 1, return YES
T1: _counter 1 -> 0, return YES
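The increment-then-check gate above can be sketched in portable C11 with <stdatomic.h>, where atomic_fetch_add returns the value before the addition (so adding the delta back gives incrementAndGetValue's result). Function names and the LIMIT constant are made up for illustration:

```c
#include <stdatomic.h>
#include <stdbool.h>

#define LIMIT 2  /* N */

static atomic_int counter;  /* zero-initialized */

/* Mirrors: if ([_counter incrementAndGetValue] > N) ... */
bool throttle_enter(void) {
    if (atomic_fetch_add(&counter, 1) + 1 > LIMIT) {
        /* Over the limit: undo our own increment and fail fast. */
        atomic_fetch_sub(&counter, 1);
        return false;
    }
    return true;
}

/* Balances a successful throttle_enter() once the work is done. */
void throttle_exit(void) {
    atomic_fetch_sub(&counter, 1);
}
```

Even single-threaded, the semantics are easy to check: with LIMIT = 2, the first two enters succeed, the third fails fast, and an exit frees up a slot again.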

Building a Throttler with atomic integers

Using AtomicInteger, we could go ahead and clean up the code above to make it a bit more elegant by creating a Throttler:

@implementation Throttler {
  AtomicInteger *_counter;
  NSInteger _limit;
}

- (instancetype)initWithLimit:(NSInteger)limit {
  NSParameterAssert(limit > 0);
  self = [super init];
  if (self != nil) {
    _counter = [[AtomicInteger alloc] initWithValue:0];
    _limit = limit;
  }
  return self;
}

// Returns YES if executed, NO otherwise
- (BOOL)throttledExec:(void (^)())block {
  if ([_counter incrementAndGetValue] > _limit) {
    [_counter decrementAndGetValue];
    return NO;
  }
  block();
  [_counter decrementAndGetValue];
  return YES;
}
@end

And then rewrite Foo's veryExpensiveMethod: as:

- (BOOL)veryExpensiveMethod:(id)bar {
  return [_throttler throttledExec:^{
    // Critical section
  }];
}

Busy flag (single access throttling)

If we were trying to ensure a single thread got access into that critical section, there's an even cleaner way: using an AtomicBoolean.

@implementation Foo {
  AtomicBoolean *_busy; // assume it's initialized to NO
}

- (BOOL)veryExpensiveMethod:(id)bar {
  if (![_busy compareTo:NO andSetValue:YES]) return NO;
  // Critical section
  [_busy setValue:NO];
  return YES;
}
@end

Here, instead of the AtomicInteger's incrementAndGetValue (which needs to be immediately balanced by a decrementAndGetValue in case the value goes over N), we use compareTo:andSetValue: (the CAS operation), which attempts to atomically toggle _busy from NO to YES if (and only if) _busy is currently set to NO, failing otherwise.

The fail-fast contract is preserved without incurring any contention.
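The C11 analogue of that CAS is atomic_compare_exchange_strong. A rough sketch of the same try-acquire gate (names made up for illustration):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool busy;  /* zero-initialized, i.e. NO */

/* Mirrors [_busy compareTo:NO andSetValue:YES]: succeeds for exactly
   one caller at a time, fails fast for everyone else. */
bool try_acquire(void) {
    bool expected = false;
    return atomic_compare_exchange_strong(&busy, &expected, true);
}

/* Mirrors [_busy setValue:NO]. */
void release_busy(void) {
    atomic_store(&busy, false);
}
```

The first acquire wins, every subsequent one fails immediately until release_busy() runs — no thread ever blocks.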

Note

This is equivalent to a semaphore's try-acquire semantics. In fact, take a wild guess at how that'd be implemented underneath...

Since the code for AtomicBoolean is simpler, let's go over it first, then jump on to AtomicInteger.

Implementations

The AtomicBoolean class

This class is basically a wrapper for a BOOL that can be updated atomically.

@import Foundation;

@interface AtomicBoolean : NSObject
- (instancetype)initWithValue:(BOOL)value;
- (BOOL)getValue;
- (void)setValue:(BOOL)value;
- (BOOL)compareTo:(BOOL)expected andSetValue:(BOOL)value;
- (BOOL)getAndSetValue:(BOOL)value;
@end

All operations exposed by the interface are atomic. Worthy of mention:

#import <libkern/OSAtomic.h>

@implementation AtomicBoolean {
  volatile int32_t _underlying;
}

- (instancetype)init {
  return [self initWithValue:NO];
}

- (instancetype)initWithValue:(BOOL)value {
  self = [super init];
  if (self != nil) {
    _underlying = value ? 1 : 0;
  }
  return self;
}

- (BOOL)getValue {
  return _underlying != 0;
}

- (void)setValue:(BOOL)value {
  // Same atomic guarantees as getValue.
  _underlying = value ? 1 : 0;
}

- (BOOL)compareTo:(BOOL)expected andSetValue:(BOOL)value {
  return OSAtomicCompareAndSwap32((expected ? 1 : 0),
                                  (value ? 1 : 0),
                                  &_underlying);
}

- (BOOL)getAndSetValue:(BOOL)value {
  while (true) {
    // We could read _underlying directly to save a method call
    // but it'll most likely end up being inlined anyway.
    BOOL current = [self getValue];
    if ([self compareTo:current andSetValue:value]) return current;
  }
}

@end

You'll notice there's a strange while (true) loop on getAndSetValue: — that's a rather naive spinlock.

The following scenario with 2 threads depicts why it's necessary:

// _underlying initially at 0 (NO)
// T1 and T2 both calling getAndSetValue:YES
T1: cur@T1 = [self getValue]; cur@T1 is NO
T2: cur@T2 = [self getValue]; cur@T2 is NO
T2: compareTo:NO andSetValue:YES // OK, _underlying was 0
T1: compareTo:NO andSetValue:YES // Fail, _underlying is now 1, not 0
T1: loops, reads YES, and the CAS succeeds, returning YES (the value it actually replaced)

Again, this is a very naive spinlock. A stronger alternative would be using an OSSpinLock and the matching lock/unlock functions.
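For reference, the retry loop translates almost mechanically to C11 (a sketch with made-up names; in practice atomic_exchange does this in a single call):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int underlying;  /* zero-initialized */

/* CAS retry loop equivalent of getAndSetValue:. Returns the value
   that was actually replaced, retrying if another thread races us. */
int get_and_set(int value) {
    while (true) {
        int current = atomic_load(&underlying);
        if (atomic_compare_exchange_strong(&underlying, &current, value))
            return current;
    }
}
```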

Note

If you don't need the getAndSetValue: semantics and want the extra performance, use OSAtomicCompareAndSwap() or std::atomic directly as you'll avoid the objc_msgSend() cost associated with calling methods on classes.

The AtomicInteger

The AtomicInteger class is pretty much the same thing as AtomicBoolean, with a few syntactic sugar operations we can throw in to make it simpler to use as a counter:

// Return old value and update
- (NSInteger)getAndIncrementValue;
- (NSInteger)getAndDecrementValue;
- (NSInteger)getAndAddValue:(NSInteger)delta;

// Update and return new value
- (NSInteger)incrementAndGetValue;
- (NSInteger)decrementAndGetValue;
- (NSInteger)addAndGetValue:(NSInteger)delta;

The implementation itself is pretty simple, with all the code living under addAndGetValue: and getAndAddValue: — the remaining functions simply call these with either +1 or -1:

// Just like getAndSetValue: but with a delta instead of a value
- (NSInteger)getAndAddValue:(NSInteger)delta {
  while (true) {
    NSInteger current = [self getValue];
    NSInteger next = current + delta;
    if ([self compareTo:current andSetValue:next]) return current;
  }
}

// The counterpart of getAndAddValue:, returning the new value
- (NSInteger)addAndGetValue:(NSInteger)delta {
  return OSAtomicAdd64(delta, &_underlying);
}

This class introduces the semantics of "update a value and return the new result", which we didn't have before; we had getAndSetValue: but no setAndGetValue:.

While we could add a setAndGetValue:, there's nothing to gain from it: we already know what the new value will be, since we're the ones providing it. The same doesn't hold for addAndGetValue:, where we can't know what the value was before the addition without the atomic guarantee.
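The relationship between the two flavors is easy to pin down in C11, where atomic_fetch_add gives you get-and-add directly and add-and-get is just that plus the delta (names made up for illustration):

```c
#include <stdatomic.h>

static atomic_long value;  /* zero-initialized */

/* getAndAddValue: -- returns the value *before* the addition. */
long get_and_add(long delta) {
    return atomic_fetch_add(&value, delta);
}

/* addAndGetValue: -- returns the value *after* the addition. */
long add_and_get(long delta) {
    return atomic_fetch_add(&value, delta) + delta;
}
```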

32/64bit architecture caveat

You probably noticed the "64" in OSAtomicAdd64 under addAndGetValue:'s implementation.

Similarly, compareTo:andSetValue: uses OSAtomicCompareAndSwap64:

- (BOOL)compareTo:(NSInteger)expected andSetValue:(NSInteger)value {
  return OSAtomicCompareAndSwap64(expected, value, &_underlying);
}

If you merely implement it this way and try to run that code on an older phone, you'll see this:

Crash when using 64bit atomic operations on 32bit architectures

The reason for this is that a 64bit value doesn't fit in a single 32bit register; the CPU would need two separate accesses to read the full thing, thus losing the atomicity of the CAS operation.

NSInteger itself is declared as a different type (read size) depending on which architecture it's being compiled for. If you peek into NSObjCRuntime.h, you'll see:

#if __LP64__ || (TARGET_OS_EMBEDDED && !TARGET_OS_IPHONE) ||\
    TARGET_OS_WIN32 || NS_BUILD_32_LIKE_64
typedef long NSInteger;
typedef unsigned long NSUInteger;
#else
typedef int NSInteger;
typedef unsigned int NSUInteger;
#endif

This means we need to modify a few parts of AtomicInteger to make it both 32 and 64bit compatible:

#define Is64BitArch __LP64__ ||\
                    (TARGET_OS_EMBEDDED && !TARGET_OS_IPHONE) ||\
                    TARGET_OS_WIN32 || NS_BUILD_32_LIKE_64

@implementation AtomicInteger {
#if Is64BitArch
  volatile int64_t _underlying;
#else
  volatile int32_t _underlying;
#endif
}
//...
- (BOOL)compareTo:(NSInteger)expected andSetValue:(NSInteger)value {
#if Is64BitArch
  return OSAtomicCompareAndSwap64(expected, value, &_underlying);
#else
  return OSAtomicCompareAndSwap32(expected, value, &_underlying);
#endif
}
//...
- (NSInteger)addAndGetValue:(NSInteger)delta {
#if Is64BitArch
    return (NSInteger)OSAtomicAdd64(delta, &_underlying);
#else
    return (NSInteger)OSAtomicAdd32(delta, &_underlying);
#endif
}
@end

Et voilà!

We've now gone over atomic booleans and integers, which allow you to operate on simple primitives. What if you want to compare objects?

Enter AtomicReference, which allows you to atomically compare-and-swap the pointers (not the values) of two objects.

AtomicReference

With atomic references for objects we run into a different beast: retention counters. As we've all learned from the pre-ARC days when we were forced to manually retain and (auto)release, when an object's retention counter reaches 0, it is deallocated. Under ARC this means "when there are no more strong references to it".

By looking at the signatures of all the OSAtomicCompareAndSwap functions, you'll see that the closest thing we have for objects (NSObject or subclasses) are the OSAtomicCompareAndSwapPtr functions. These not only deal in 'void *' and 'void *volatile' types, but will also not, naturally, perform any retains or releases on the objects whose pointers you pass them.

This means that we have to explicitly do two things:

  1. Transfer the ownership outside of ARC by casting to a C-style type (also known as bridging)
  2. Manually manage retentions and releases

First, let's look at our options, straight from the ARC spec: __bridge transfers no ownership, __bridge_retained hands us a +1 reference that we must eventually balance with CFRelease(), and __bridge_transfer has ARC take ownership of a +1 reference.

So whenever we do this...

- (void)setObject:(id)object {
  _underlying = (__bridge_retained void *)object;
}

... we'd actually have to ensure that if there was already something in _underlying, it gets properly released with a call to CFRelease(), which would look like this:

- (void)setObject:(id)object {
  void *previous = _underlying;
  _underlying = (__bridge_retained void *)object;
  if (previous != nil) CFRelease(previous);
}

The problem with this code is that it's not thread-safe. Imagine two threads, T1 and T2, executing the instructions of setObject: in a fully interleaved fashion; in the land of concurrency, anything can happen:

// _underlying holds a pointer to object A living at 0x10,
// T1 calls setObject: with an object B living at 0x20
// and T2 calls setObject: with an object C living at 0x30.
1. T1: previous = A@0x10
2. T2: previous = A@0x10 
3. T1: _underlying = B@0x20 (B retain count +1)
4. T2: _underlying = C@0x30 (C retain count +1)
5. T1: CFRelease(A@0x10) (A retain count -1)
6. T2: CFRelease(A@0x10) (A retain count -1)

Two immediate problems:

  1. A gets released twice: its retention counter drops by 2 even though only one strong reference to it went away, leading to an over-release and, eventually, a crash
  2. B gets leaked: it was retained when T1 stored it, but T2 then overwrote _underlying with C, so nothing will ever release B

We could always throw a @synchronized in there, but that would completely defeat the purpose of the whole AtomicReference class — lock-free atomic guarantees.

ProTip™

This is why you need atomic in your @properties if they're going to be accessed by multiple threads!

To solve this problem we actually need to go ahead and implement compareTo:andSetObject: first.

Balancing releases

Let's look at the signature:

- (BOOL)compareTo:(id)expected andSetObject:(id)object;

  1. We know that, under ARC, upon entering the method's scope, the retention counters on expected and object will be incremented (and, conversely, decremented upon exit), meaning they're guaranteed to live throughout the method's scope — i.e. we don't need to worry about them magically disappearing half-way through

  2. We know that in order for compareTo:andSetObject: to return YES, expected must be the object that lives in the same memory address that _underlying is referring to

By combining those two properties we know that if (and only if!) the CAS operation succeeded, we can safely emit the appropriate CFRelease to expected and CFRetain to object — we decrement the retain counter on the object to which _underlying pointed to and we increment the retain counter on the new object that is taking its place.

- (BOOL)compareTo:(id)expected andSetObject:(id)object {
    void *old = (__bridge void *)expected;
    void *new = (__bridge void *)object;

    BOOL swapped = OSAtomicCompareAndSwapPtr(old, new, &_underlying);
    if (swapped) {
        if (expected != nil) CFRelease(old);
        if (object != nil) CFRetain(new);
    }

    return swapped;
}
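The ownership bookkeeping can be modeled in plain C with a hypothetical refcounted box standing in for an NSObject. The point carries over directly: the refcounts are only touched on the success path of the CAS (all names here are made up for illustration):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for an NSObject. */
typedef struct {
    atomic_int refcount;
} box_t;

static void box_retain(box_t *b) {
    if (b) atomic_fetch_add(&b->refcount, 1);
}

static void box_release(box_t *b) {
    /* Free when the last reference goes away. */
    if (b && atomic_fetch_sub(&b->refcount, 1) == 1) free(b);
}

static _Atomic(box_t *) underlying_ref;  /* NULL-initialized */

/* Mirrors compareTo:andSetObject: -- retain the incoming object and
   release the outgoing one if (and only if) the swap happened. */
bool compare_and_set_ref(box_t *expected, box_t *object) {
    box_t *old = expected;
    if (atomic_compare_exchange_strong(&underlying_ref, &old, object)) {
        if (object)   box_retain(object);    /* new value is now owned  */
        if (expected) box_release(expected); /* old ownership balanced  */
        return true;
    }
    return false;
}
```

A successful swap bumps the new object's count and drops the old one's; a failed swap touches nothing, which is exactly the balancing argument made above.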

Because we're now manually retaining and releasing, the setter needs to be slightly different — it must be similar to what an atomic synthesized property would be. Luckily, we already have something like that built with getAndSetObject: so all we need to do is call it:

- (void)setObject:(id)object {
  [self getAndSetObject:object];
}

This will end up calling compareTo:andSetObject:, which will optimistically (and naively) spinlock until we're able to replace the value.

Note

If you look at the runtime's implementation of a property setter, you'll see the atomic flow will resort to a spinlock too — granted, a better one than a simple while (true) but we've covered that above.

One last thing we must also do is make sure we release the object pointed to by _underlying when the enclosing atomic gets deallocated.

- (void)dealloc {
    void *previous = _underlying;
    if (previous != nil) CFRelease(previous);
}

Sample use case for atomic references

While I've found plenty of use cases for atomic integers and booleans, references are something I've rarely ever needed. I did come across one interesting use for them recently, though.

I had this class whose sole responsibility was emitting an HTTP request and notifying the delegate about the success/error of said request. By design, it was supposed to execute a single request at any given time. I could have just used a couple of @synchronized() blocks to ensure this:

- (BOOL)launchRequest:(...)... {
  // requestMutex is just a NSObject
  @synchronized (self.requestMutex) {
    if (self.currentRequest != nil) return NO;

    id request = // create request
    self.currentRequest = request;
    [request start];

    return YES;
  }
}

- (BOOL)cancelActiveRequest {
  @synchronized (self.requestMutex) {
    if (self.currentRequest == nil) return NO;

    [self.currentRequest cancel];
    self.currentRequest = nil;
    return YES;
  }
}

But with an AtomicReference, it can be rewritten as:

- (BOOL)launchRequest:(...)... {
  id request = // create the request
  // _ref is an AtomicReference initialized on init
  if ([_ref compareTo:nil andSetObject:request]) {
    [request start];
    return YES;
  }

  return NO;
}

- (BOOL)cancelActiveRequest {
  id current = [_ref getAndSetObject:nil];
  if (current == nil) return NO;

  [current cancel];
  return YES;
}

Basically, you start by building the request and then try to atomically compare the current request to nil and set it to the newly created request — in other words, if the current request is nil, we can launch a new request.

Same goes for cancelling; you get-and-set the current request to nil. If that method call returns a non-nil object, it means there's an active request and we should cancel it.

Aside from being lock-free, feels a lot cleaner, doesn't it?
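The same single-request gate translates naturally to a C11 atomic pointer: compare-exchange against NULL plays the role of the compareTo:nil andSetObject: launch, and atomic_exchange plays get-and-set for the cancel (types and names here are illustrative):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the request object; only its identity matters here. */
typedef struct { int id; } request_t;

static _Atomic(request_t *) current_request;  /* NULL when idle */

/* Launch only if no request is in flight: CAS from NULL to req. */
bool launch_request(request_t *req) {
    request_t *expected = NULL;
    return atomic_compare_exchange_strong(&current_request, &expected, req);
}

/* Get-and-set to NULL; a non-NULL result is the request to cancel. */
request_t *cancel_active_request(void) {
    return atomic_exchange(&current_request, NULL);
}
```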

Note

Unless you need to use the compare-and-swap semantics, you'd gain nothing over using @property (atomic).

Conclusion

Atomics make for a very compelling alternative to hard locking solutions due to their wait-free and lock-free properties, which make them very fast.

They are not, however, concurrency silver bullets that guarantee the thread safety or correctness of your algorithms — sometimes you just can't replace a lock around a critical section of code.

These things are mostly used in what would otherwise be (or potentially be) high-contention, computationally heavy parts of programs.

It's very important to understand that every example in this article could have legitimately been solved with different concurrency primitives — like semaphores and locks — without any noticeable impact to a human playing around with your app.

If you're looking to extract that extra bit of performance, instead of using AtomicInteger or AtomicBoolean go straight for std::atomic<>, since the objc_msgSend() cost of dispatching method calls has a non-negligible impact. Just make sure that when you do so, your code remains readable and self-explanatory.

At the end of the day, go with whatever makes your code easier to understand and maintain; optimize when you see the need for it.

Correctness is paramount.

Interesting links


Big thanks to Adam Wulf and Peter Steinberger for their valuable insight and critique.