Cocoa data serialization benchmark: Archive framework vs custom serialization

TL;DR: The custom serialization proposed here is ~2x faster than the Cocoa archiving framework. Results are here.


Companion code for this article is on GitHub

Applying the findings of my last benchmark comparing Core Data and File System storage, I went with a similar solution on a recent project where I didn't need all the firepower in Core Data. The twist this time, however, was that instead of simply storing a key-value pair (NSString, NSDate), I needed to store instances of a class that had multiple fields, indexed by one of its fields — (NSString, SomeClass).

Objective: writing to/reading from disk, as fast as possible

This article compares the Cocoa archiving framework against a custom serialization method based on binary property lists. Both are measured on the serialization and deserialization of an NSDictionary that contains multiple objects, indexed by one of their properties. I'll call this dictionary the object index.

Cocoa Archiving Framework is pretty straightforward:

  • Model class implements the NSCoding protocol;
  • Use NSKeyedArchiver/NSKeyedUnarchiver's one-liners to write/read the object index — the framework will take care of all the magic.

The custom serialization method requires a bit more work, both to serialize...

  1. Each object in the object index must be converted to an NSDictionary instance;
  2. A copy of the object index must be created with the same keys but with the NSDictionary representations of the objects as values;
  3. The copy of the object index is then serialized to NSData using NSPropertyListSerialization;
  4. NSData is written to disk.

[Figure: custom serialization flow]

... and to deserialize:

  1. NSData is read from disk;
  2. NSData is converted into an NSDictionary, using NSPropertyListSerialization;
  3. For each entry in this NSDictionary, convert it to the model to be instantiated;
  4. Create the object index with these deserialized instances.

[Figure: custom deserialization flow]
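The two flows above can be tried out in isolation before any model class exists. The following is a minimal sketch, assuming the objects have already been converted to plain dictionaries; the helper names BBWriteIndex and BBReadIndex and the temporary file name are made up for illustration and are not part of the companion code.

```objc
#import <Foundation/Foundation.h>

// Serialization steps 3 and 4: plist-compatible index -> NSData -> disk.
static BOOL BBWriteIndex(NSDictionary* index, NSString* path)
{
    NSError* error = nil;
    NSData* data = [NSPropertyListSerialization
                    dataWithPropertyList:index
                    format:NSPropertyListBinaryFormat_v1_0
                    options:0 error:&error];
    if (data == nil) {
        return NO;
    }
    return [data writeToFile:path options:NSDataWritingAtomic error:&error];
}

// Deserialization steps 1 and 2: disk -> NSData -> NSDictionary.
static NSDictionary* BBReadIndex(NSString* path)
{
    NSData* data = [NSData dataWithContentsOfFile:path];
    if (data == nil) {
        return nil;
    }
    NSError* error = nil;
    id plist = [NSPropertyListSerialization
                propertyListWithData:data
                options:NSPropertyListImmutable
                format:NULL error:&error];
    // Anything other than a dictionary is treated as a failed read
    return [plist isKindOfClass:[NSDictionary class]] ? plist : nil;
}

int main(void)
{
    @autoreleasepool {
        // Stand-in for the converted object index: identifier -> dictionary
        NSDictionary* index = @{
            @"item-1": @{ @"identifier": @"item-1", @"views": @3 },
            @"item-2": @{ @"identifier": @"item-2", @"views": @7 },
        };
        NSString* path = [NSTemporaryDirectory()
                          stringByAppendingPathComponent:@"bb-index-sketch.plist"];

        BOOL written = BBWriteIndex(index, path);
        NSDictionary* loaded = BBReadIndex(path);
        NSLog(@"written: %d, round trip equal: %d",
              written, [loaded isEqualToDictionary:index]);

        [[NSFileManager defaultManager] removeItemAtPath:path error:NULL];
    }
    return 0;
}
```

The remaining steps (converting each model object to and from its dictionary) are what the rest of the article builds.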

The model

Let's assume we want to store instances of the class BBItem:

@interface BBItem : NSObject

@property(strong, nonatomic) NSString*  identifier;
@property(strong, nonatomic) NSDate*    createdAt;
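// Note: this name collides with -[NSObject hash], which returns an NSUInteger;
// in real code prefer a different name (e.g. contentHash)
@property(strong, nonatomic) NSString*  hash;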
@property(strong, nonatomic) NSData*    data;
@property(strong, nonatomic) NSString*  optionalString;
@property(assign, nonatomic) NSUInteger views;
@property(assign, nonatomic) CGRect     displayInRect;

@end

It has pretty much all the stuff you might find on a regular model class:

  • Object fields
  • Optional object fields
  • Scalar fields

Note:
For the sake of brevity, I decided not to include fields with custom classes. I address this point at the end of the article, explaining how it could be done.

The repository

The purpose of the repository is to act as a facade to store an arbitrary number of BBItem instances and query them by identifier. Hence, the interface of a repository class could be something like this:

@interface BBItemRepository : NSObject

- (NSUInteger)itemCount;
- (BBItem*)itemWithIdentifier:(NSString*)identifier;
- (void)addItem:(BBItem*)item;
- (void)removeItem:(BBItem*)item;
- (void)removeItemWithIdentifier:(NSString*)identifier;

@end

By looking at these operations, NSMutableDictionary immediately comes to mind as the perfect structure to hold and query this data under the hood. Also, since we want to persist data to disk, we need to add a couple of methods to load and flush data.

Here's how it would look after a slight upgrade to support these requirements:

@interface BBItemRepository : NSObject
{
@protected
    __strong NSMutableDictionary* _entries;
}

- (void)reload; // deserializes content from disk into memory
- (BOOL)flush;  // flushes all data in memory to disk (but keeps data in memory)

- (NSUInteger)itemCount;
- (BBItem*)itemWithIdentifier:(NSString*)identifier;
- (void)addItem:(BBItem*)item;
- (void)removeItem:(BBItem*)item;
- (void)removeItemWithIdentifier:(NSString*)identifier;

@end

Serialization and deserialization: how & when

In order to use this class, we must first call reload — a good place to do it would be on the app delegate's application:didFinishLaunchingWithOptions: method — and eventually call flush after performing some changes — good candidates would be applicationWillTerminate: and applicationDidEnterBackground: on the app delegate.

To simplify things for this particular case I wrote a default repository implementation, BBItemRepository, with no-op flush and reload methods — an in-memory repository.

I then subclass this BBItemRepository with BBPlistItemRepository (custom serialization) and BBArchiveItemRepository (Cocoa Archive serialization).

Note
This article will not cover the implementation of the query methods of the superclass. You can take a quick peek here.
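For readers without the companion code at hand, here is a minimal sketch of what such an in-memory superclass could look like. It is an assumption based on the interface shown earlier, not the author's actual implementation, and BBItem is reduced to its indexed field so the example stays self-contained.

```objc
#import <Foundation/Foundation.h>

// Reduced stand-in for the article's BBItem: only the indexed field
@interface BBItem : NSObject
@property(strong, nonatomic) NSString* identifier;
@end

@implementation BBItem
@end

@interface BBItemRepository : NSObject
{
@protected
    NSMutableDictionary* _entries;
}
- (void)reload;
- (BOOL)flush;
- (NSUInteger)itemCount;
- (BBItem*)itemWithIdentifier:(NSString*)identifier;
- (void)addItem:(BBItem*)item;
- (void)removeItem:(BBItem*)item;
- (void)removeItemWithIdentifier:(NSString*)identifier;
@end

@implementation BBItemRepository

- (void)reload
{
    // Nothing to read from disk here; just make sure the index exists
    if (_entries == nil) {
        _entries = [NSMutableDictionary dictionary];
    }
}

- (BOOL)flush
{
    // In-memory only: nothing to write
    return YES;
}

- (NSUInteger)itemCount
{
    return [_entries count];
}

- (BBItem*)itemWithIdentifier:(NSString*)identifier
{
    return [_entries objectForKey:identifier];
}

- (void)addItem:(BBItem*)item
{
    // setObject:forKey: throws on a nil key, so ignore unidentified items
    if (item.identifier == nil) {
        return;
    }
    [_entries setObject:item forKey:item.identifier];
}

- (void)removeItem:(BBItem*)item
{
    [self removeItemWithIdentifier:item.identifier];
}

- (void)removeItemWithIdentifier:(NSString*)identifier
{
    // removeObjectForKey: throws on a nil key
    if (identifier == nil) {
        return;
    }
    [_entries removeObjectForKey:identifier];
}

@end

int main(void)
{
    @autoreleasepool {
        BBItemRepository* repository = [[BBItemRepository alloc] init];
        [repository reload];

        BBItem* item = [[BBItem alloc] init];
        item.identifier = @"item-1";
        [repository addItem:item];
        NSLog(@"count after add: %lu", (unsigned long)[repository itemCount]);

        [repository removeItemWithIdentifier:@"item-1"];
        NSLog(@"count after remove: %lu", (unsigned long)[repository itemCount]);
    }
    return 0;
}
```

Every query is a thin wrapper around the dictionary, which is why the benchmark later on only measures reload and flush.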

NSKeyedArchiver/Unarchiver and NSCoding serialization

In order to use Cocoa's Archiving Framework, our class must implement the NSCoding protocol. This is a very straightforward process, where we provide the current values of the properties to the encoder or set the properties' values by reading fields from the decoder.

@implementation BBItem

...

- (void)encodeWithCoder:(NSCoder*)coder
{
    // Object
    [coder encodeObject:_identifier forKey:@"identifier"];
    [coder encodeObject:_createdAt forKey:@"createdAt"];
    [coder encodeObject:_hash forKey:@"hash"];
    [coder encodeObject:_data forKey:@"data"];
    // Scalar
    [coder encodeInteger:_views forKey:@"views"];
    [coder encodeCGRect:_displayInRect forKey:@"displayInRect"];
    // Optional
    if (_optionalString != nil) {
        [coder encodeObject:_optionalString forKey:@"optionalString"];
    }
}

- (id)initWithCoder:(NSCoder*)decoder
{
    self = [super init];
    if (self != nil) {
        // Object
        self.identifier = [decoder decodeObjectForKey:@"identifier"];
        self.createdAt = [decoder decodeObjectForKey:@"createdAt"];
        self.hash = [decoder decodeObjectForKey:@"hash"];
        self.data = [decoder decodeObjectForKey:@"data"];
        // Scalar
        self.views = [decoder decodeIntegerForKey:@"views"];
        self.displayInRect = [decoder decodeCGRectForKey:@"displayInRect"];
        // Optional
        if ([decoder containsValueForKey:@"optionalString"]) {
            self.optionalString = [decoder decodeObjectForKey:@"optionalString"];
        }
    }
    return self;
}

...

@end

While tedious to write initially (and maintain, if you tend to change your models very frequently), it's a conceptually simple task.

Assuming we have an ivar named _archiveFilePath which was initialized with the path where the archive should sit, reading and flushing these items requires two one-liners:

@implementation BBArchiveItemRepository

...

- (void)reload
{
    [super reload]; // Ensures initialization of _entries (NSMutableDictionary)

    NSMutableDictionary* entries = [NSKeyedUnarchiver
                                    unarchiveObjectWithFile:_archiveFilePath];
    if (entries == nil) {
        return;
    }

    // Entries are not null, so assign to the ivar
    _entries = entries;
}

- (BOOL)flush
{
    // This pretty much does nothing but it's always
    // nice to call the superclass's method...
    if (![super flush]) {
        return NO;
    }

    return [NSKeyedArchiver
            archiveRootObject:_entries
            toFile:_archiveFilePath];
}

...

@end

And that's all there is to it.

Binary Property List (plist) serialization

Since an NSDictionary serialized/deserialized using Binary Plists can only contain objects of the classes NSData, NSString, NSArray, NSDictionary, NSDate and NSNumber, the conversion BBItem -> NSDictionary is a bit more cumbersome than using an NSCoder. Thus, by convention, our model will have two new methods:

@implementation BBItem

...

+ (BBItem*)itemFromDictionary:(NSDictionary*)dictionary
{
    BBItem* model = [[BBItem alloc] init];
    // Object - straightforward conversions, retrieved from
    // the dictionary without any further changes required
    model.identifier = [dictionary objectForKey:@"identifier"];
    model.createdAt = [dictionary objectForKey:@"createdAt"];
    model.hash = [dictionary objectForKey:@"hash"];
    model.data = [dictionary objectForKey:@"data"];
    // Scalar, require conversion from the objects stored in the NSDictionary
    NSNumber* viewsNumber = [dictionary objectForKey:@"views"];
    if (viewsNumber != nil) {
        model.views = [viewsNumber unsignedIntegerValue];
    }
    NSString* displayInRectAsString = [dictionary objectForKey:@"displayInRect"];
    if (displayInRectAsString != nil) {
        model.displayInRect = CGRectFromString(displayInRectAsString);
    }
    // Optional
    model.optionalString = [dictionary objectForKey:@"optionalString"];

    // Optionally, we can validate the model here
    if ((model.identifier == nil) ||
        (model.createdAt == nil) ||
        (model.hash == nil) ||
        (model.data == nil)) {
        return nil;
    }

    return model;
}

- (NSDictionary*)convertToDictionary
{
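    // Note: dictionaryWithObjectsAndKeys: stops at the first nil it finds,
    // so a nil required property silently drops every pair listed after it
    NSMutableDictionary* dictionary = [NSMutableDictionary dictionaryWithObjectsAndKeys: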
        // Object
        _identifier, @"identifier",
        _createdAt, @"createdAt",
        _hash, @"hash",
        _data, @"data",
        // Scalar
        [NSNumber numberWithUnsignedInteger:_views], @"views",
        NSStringFromCGRect(_displayInRect), @"displayInRect",
        nil];

    // Optional properties should be checked before
    // being committed to the NSDictionary
    if (_optionalString != nil) {
        [dictionary setValue:_optionalString forKey:@"optionalString"];
    }

    return dictionary;
}

...

@end
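The type restriction is the reason the CGRect above travels as an NSString and the NSUInteger as an NSNumber. Whether a given dictionary will be accepted can be checked up front with NSPropertyListSerialization's propertyList:isValidForFormat:. Below is a minimal sketch, where the helper name BBIsBinaryPlistCompatible is made up for illustration and a bare NSObject stands in for an unconverted model object.

```objc
#import <Foundation/Foundation.h>

// Returns YES when the object graph only contains types a binary plist can hold
static BOOL BBIsBinaryPlistCompatible(id object)
{
    return [NSPropertyListSerialization
            propertyList:object
            isValidForFormat:NSPropertyListBinaryFormat_v1_0];
}

int main(void)
{
    @autoreleasepool {
        // Only plist types: strings, dates, numbers
        NSDictionary* converted = @{
            @"identifier": @"item-1",
            @"createdAt": [NSDate date],
            @"views": @42,
        };
        // A model object stored as-is, without converting it to a dictionary
        NSDictionary* unconverted = @{
            @"item-1": [[NSObject alloc] init],
        };

        NSLog(@"converted dictionary valid: %d",
              BBIsBinaryPlistCompatible(converted));
        NSLog(@"unconverted dictionary valid: %d",
              BBIsBinaryPlistCompatible(unconverted));
    }
    return 0;
}
```

A check like this can be handy in debug builds when a newly added field is not yet being converted.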

Using these methods, we convert each instance of BBItem in _entries into an NSDictionary representation. We then create another top-level index NSDictionary using the same keys as _entries but this time using the NSDictionary representations of the BBItems as values. Conversely, when reading from disk, we must create new BBItem instances from the NSDictionary values in the deserialized binary plist file.

Again, assuming we already have an ivar _indexFilePath which has been initialized with the path where the binary plist file is located, the reload and flush implementations of the BBPlistItemRepository are:

@implementation BBPlistItemRepository

...

- (void)reload
{
    [super reload];

    // Load the file as NSData
    NSData* dictionaryData = [NSData dataWithContentsOfFile:_indexFilePath];
    if (dictionaryData == nil) {
        return;
    }

    // Deserialize the contents of the file to an NSDictionary
    NSError* error = nil;
    NSDictionary* serializedEntries = [NSPropertyListSerialization
                                       propertyListWithData:dictionaryData
                                       options:NSPropertyListImmutable
                                       format:NULL error:&error];

    // Check the return value rather than the error pointer, and make
    // sure the top-level object really is a dictionary
    if (![serializedEntries isKindOfClass:[NSDictionary class]]) {
        return;
    }

    // Convert each key-value pair (NSString, NSDictionary)
    // into our entries: (item.id, item)
    [serializedEntries enumerateKeysAndObjectsUsingBlock:^(NSString* key,
                                                           NSDictionary* serializedEntry,
                                                           BOOL* stop) {
        BBItem* item = [BBItem itemFromDictionary:serializedEntry];
        // itemFromDictionary: returns nil for invalid entries, and
        // setObject:forKey: throws on a nil object, so skip those
        if (item != nil) {
            [_entries setObject:item forKey:key];
        }
    }];
}

- (BOOL)flush
{
    if (![super flush]) {
        return NO;
    }

    // Convert each BBItem in the _entries dictionary to its NSDictionary representation
    NSError* error = nil;
    NSMutableDictionary* serializedEntries = [NSMutableDictionary
                                              dictionaryWithCapacity:[_entries count]];
    [_entries enumerateKeysAndObjectsUsingBlock:^(NSString* key, BBItem* item, BOOL* stop) {
        NSDictionary* serializedEntry = [item convertToDictionary];
        [serializedEntries setObject:serializedEntry forKey:key];
    }];

    // Create NSData from the dictionary created above,
    // by serializing using binary property lists.
    NSData* dictionaryData = [NSPropertyListSerialization
                              dataWithPropertyList:serializedEntries
                              format:NSPropertyListBinaryFormat_v1_0
                              options:0 error:&error];
    // Check the return value; the error pointer is only meaningful on failure
    if (dictionaryData == nil) {
        return NO;
    }

    if (![dictionaryData writeToFile:_indexFilePath
                         options:NSDataWritingAtomic error:&error]) {
        return NO;
    }

    return YES;
}

...

@end

Even though this code only needs to be written once, it's significantly more complex than BBArchiveItemRepository.

Time to figure out whether the extra complexity actually pays off or not.

Benchmark description

The benchmark is pretty simple; in a typical usage of this repository, all the records would be loaded into memory on boot and flushed back to disk when the app enters background or is about to terminate.

It thus consists of profiling the executions of both reload and flush with varying numbers of items. All other operations end up being query calls to an NSDictionary.

- (NSString*)testSpeed:(BBItemRepository*)repository
         withDummyData:(NSArray*)items
{
    // Make sure we have no content
    [repository reset];

    // items is an NSArray* filled with dummy BBItem instances
    for (BBItem* item in items) {
        [repository addItem:item];
    }

    // Time executions of flush (write to disk) and reload (read from disk)
    uint64_t flushNanoseconds = [BBProfiler profileBlock:^() {
        [repository flush];
    }];

    uint64_t reloadNanoseconds = [BBProfiler profileBlock:^() {
        [repository reload];
    }];

    // Clean it up again
    [repository reset];

    return ...;
}

Note
Just like in the Core Data vs File System comparison, I begin by ensuring both repositories work exactly as expected. These assertions can be found in the method testRepositoryCorrectness: of the class BBBenchmarkViewController.
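BBProfiler itself is not shown in this article. As a rough idea of what a profileBlock: helper returning nanoseconds could look like, here is a sketch built on mach_absolute_time. This is an assumption about the approach, not the companion code's implementation, and it only builds on Apple platforms.

```objc
#import <Foundation/Foundation.h>
#include <mach/mach_time.h>

@interface BBProfiler : NSObject
+ (uint64_t)profileBlock:(void (^)(void))block;
@end

@implementation BBProfiler

+ (uint64_t)profileBlock:(void (^)(void))block
{
    // The timebase gives the ratio (numer/denom) that converts
    // mach ticks into nanoseconds on the current device
    mach_timebase_info_data_t timebase;
    mach_timebase_info(&timebase);

    uint64_t start = mach_absolute_time();
    block();
    uint64_t elapsedTicks = mach_absolute_time() - start;

    // Fine for short measurements; very long runs could overflow
    // the multiplication before the division
    return elapsedTicks * timebase.numer / timebase.denom;
}

@end

int main(void)
{
    @autoreleasepool {
        uint64_t nanoseconds = [BBProfiler profileBlock:^{
            [NSThread sleepForTimeInterval:0.01];
        }];
        NSLog(@"block took %.2fms", nanoseconds / 1e6);
    }
    return 0;
}
```

A monotonic tick counter is used rather than NSDate because wall-clock time can jump while a measurement is running.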

Results

What we've all been waiting for:

Custom serialization, 100 items:
Flush:      17.95ms
Reload:     13.99ms

Cocoa Archive Framework, 100 items:
Flush:      37.80ms
Reload:     17.99ms

---

Custom serialization, 1000 items:
Flush:      137.68ms
Reload:     112.71ms

Cocoa Archive Framework, 1000 items:
Flush:      319.13ms
Reload:     204.75ms

---

Custom serialization, 10000 items:
Flush:      1391.21ms
Reload:     1221.77ms

Cocoa Archive Framework, 10000 items:
Flush:      3479.27ms
Reload:     2139.62ms

Custom serialization is 2.1x to 2.5x faster when flushing to disk. When loading from disk it is about 1.3x faster at 100 items and roughly 1.8x faster at 1000 and 10000 items.
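The speedups can be recomputed from the timings listed above by dividing each Cocoa Archive Framework time by the matching custom serialization time. This short sketch does exactly that; the helper name BBSpeedup is made up for illustration.

```objc
#import <Foundation/Foundation.h>
#include <stdio.h>

// Speedup of custom serialization over the archiver: archiveMs / customMs
static double BBSpeedup(double customMs, double archiveMs)
{
    return archiveMs / customMs;
}

int main(void)
{
    @autoreleasepool {
        // Columns: items, custom flush, archive flush, custom reload,
        // archive reload (all timings in ms, copied from the results above)
        const double runs[3][5] = {
            {   100.0,   17.95,   37.80,   13.99,   17.99 },
            {  1000.0,  137.68,  319.13,  112.71,  204.75 },
            { 10000.0, 1391.21, 3479.27, 1221.77, 2139.62 },
        };

        for (int i = 0; i < 3; i++) {
            printf("%.0f items: flush %.2fx, reload %.2fx\n",
                   runs[i][0],
                   BBSpeedup(runs[i][1], runs[i][2]),
                   BBSpeedup(runs[i][3], runs[i][4]));
        }
        // Prints:
        // 100 items: flush 2.11x, reload 1.29x
        // 1000 items: flush 2.32x, reload 1.82x
        // 10000 items: flush 2.50x, reload 1.75x
    }
    return 0;
}
```

The flush advantage grows slightly with the item count, while the reload advantage settles around 1.8x once there are enough items to dominate the fixed costs.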

Conclusion

At the cost of a slightly more complex initial implementation, the custom serialization method proposed here does offer a significant speed boost when compared to Cocoa's Archive Framework.

The main reason behind this is that the custom serialization method simply creates different representations of the items when serializing. The archiving framework, hidden under those handy one-liners, does a lot more bookkeeping: a keyed archive records class information for the objects it encodes and tracks object identity so that shared references are encoded only once. This naturally slows the process down.

It's very important to mention that this repository is not meant to scale past a few thousand records. For very large numbers of objects or very large objects, you should stick to Core Data.


Bonus round: using custom classes as properties on the model items

While the Cocoa archiving framework takes care of object graphs, the custom serialization method introduced in this article does not. If you plan on serializing complex graphs, it's probably better to use Cocoa's Archiving Framework.

The purpose of this custom serialization method is to be able to quickly serialize & deserialize either simple objects or simple trees of objects — quickly being the keyword.

With that said, you could easily add a custom class member to BBItem:

@interface BBItem : NSObject

...
@property(strong, nonatomic) BBSubItem* subItem;

@end

You'd need to:

  • Make sure BBSubItem has subItemFromDictionary: and convertToDictionary methods;
  • When serializing, BBItem's convertToDictionary must call convertToDictionary on its subItem and store the result under a key of its own dictionary;
  • When deserializing, BBItem's itemFromDictionary: must rebuild its BBSubItem by passing that nested dictionary to subItemFromDictionary: on the BBSubItem class.
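The three points above could look roughly like this. It is a sketch only: BBItem is cut down to an identifier plus the sub-item, BBSubItem gets a single hypothetical name property, and the "subItem" dictionary key is an arbitrary choice.

```objc
#import <Foundation/Foundation.h>

@interface BBSubItem : NSObject
@property(strong, nonatomic) NSString* name; // hypothetical field
+ (BBSubItem*)subItemFromDictionary:(NSDictionary*)dictionary;
- (NSDictionary*)convertToDictionary;
@end

@implementation BBSubItem

+ (BBSubItem*)subItemFromDictionary:(NSDictionary*)dictionary
{
    NSString* name = [dictionary objectForKey:@"name"];
    if (name == nil) {
        return nil;
    }
    BBSubItem* subItem = [[BBSubItem alloc] init];
    subItem.name = name;
    return subItem;
}

- (NSDictionary*)convertToDictionary
{
    // Guard against nil: dictionary literals throw on nil values
    return (_name != nil) ? @{ @"name": _name } : @{};
}

@end

// Reduced stand-in for the article's BBItem
@interface BBItem : NSObject
@property(strong, nonatomic) NSString* identifier;
@property(strong, nonatomic) BBSubItem* subItem;
+ (BBItem*)itemFromDictionary:(NSDictionary*)dictionary;
- (NSDictionary*)convertToDictionary;
@end

@implementation BBItem

+ (BBItem*)itemFromDictionary:(NSDictionary*)dictionary
{
    BBItem* item = [[BBItem alloc] init];
    item.identifier = [dictionary objectForKey:@"identifier"];
    // The sub-item arrives as a nested dictionary; delegate to its class
    NSDictionary* serializedSubItem = [dictionary objectForKey:@"subItem"];
    if (serializedSubItem != nil) {
        item.subItem = [BBSubItem subItemFromDictionary:serializedSubItem];
    }
    return (item.identifier != nil) ? item : nil;
}

- (NSDictionary*)convertToDictionary
{
    NSMutableDictionary* dictionary = [NSMutableDictionary dictionary];
    if (_identifier != nil) {
        [dictionary setObject:_identifier forKey:@"identifier"];
    }
    // Nest the sub-item's own dictionary representation
    if (_subItem != nil) {
        [dictionary setObject:[_subItem convertToDictionary] forKey:@"subItem"];
    }
    return dictionary;
}

@end

int main(void)
{
    @autoreleasepool {
        BBSubItem* subItem = [[BBSubItem alloc] init];
        subItem.name = @"child";
        BBItem* item = [[BBItem alloc] init];
        item.identifier = @"parent";
        item.subItem = subItem;

        NSDictionary* serialized = [item convertToDictionary];
        BBItem* restored = [BBItem itemFromDictionary:serialized];
        NSLog(@"restored %@ with sub-item %@",
              restored.identifier, restored.subItem.name);
    }
    return 0;
}
```

This works for trees, but an object referenced from two parents would be written twice and come back as two separate instances, which is the object graph limitation mentioned above.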

Additional notes

  • Code in this project uses ARC;
  • Tests were performed on an iPhone 4S running iOS 5.1.1, an iPhone 3GS running iOS 5.0.1 and the 5.1 iPhone Simulator, proving to be consistent across devices;
  • Results shown are from a random run on the iPhone 4S;
  • Tests were run with 100, 1000 and 10000 BBItem instances a couple of times to ensure the values were consistent.