File System vs Core Data: the image cache test

Code for this project is on GitHub

While doing a full rewrite of Droplr's iOS app for the 2.0 launch, I couldn't find any good file/image cache out there that had a particular feature I really wanted: extending an item's expiration whenever it's touched.

I set out to write my own — which wasn't that much of a challenge — but somewhere along the process I had this crazy idea that perhaps (SQLite-backed) Core Data would be a much better tool for the job:

  • No mismatch between cache index file and actual data stored;
  • Trivial querying;
  • Nice and easy object oriented code.

Being a structured data store with Object-Relational Mapping, it's only logical that it would be slower. Just how much slower is what I wanted to find out.

One protocol to rule them all

The number one goal with this pet project was that both the file system and Core Data cache implementations had to present the exact same signature to the programmer, while being consistent in the way they work.

@protocol BBImageCache <NSObject>

// Returns the number of items in the cache
- (NSUInteger)itemsInCache;

// Clears all items in the cache
- (void)clearCache;

// Synchronizes all changes (persists to disk)
- (BOOL)synchronizeCache;

// Deletes all entries that have expired
- (NSUInteger)purgeStaleData;

// Saves an image with a key
- (BOOL)storeImage:(UIImage*)image forKey:(NSString*)key;

// Retrieves and touches (extends expiration) the image for the given key
- (UIImage*)imageForKey:(NSString*)key;

@end

Super simple stuff...

The File System cache implementation

The file system cache implementation consists of a folder with an index file (an NSDictionary serialized as a .plist file) and a file for each cache entry: the PNG representation of the stored UIImage (obtained with UIImagePNGRepresentation()).

Also, for the sake of consistency, I used GCD's dispatch_sync blocks to ensure that no two cache operations would ever overlap.

A file system cache is created by calling:

- (id)initWithCacheName:(NSString*)cacheName andItemDuration:(NSTimeInterval)duration;

The cacheName parameter determines the folder created for this cache (handy if you want to keep separate caches) and the duration parameter controls the expiration of items. When the cache is booted, it automatically purges stale data and synchronizes the index to disk. It is up to the programmer to perform these two calls again (purge & synchronize) when deemed appropriate; a good time would be on application backgrounding/termination.

Synchronizing the cache merely serializes the cache index dictionary into a .plist file in the cache directory matching the cache's name.
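To make the purge and synchronize steps a bit more concrete, here is a minimal sketch of how they could look for an index that maps keys to expiration dates. The helper names BBPurgeStaleEntries and BBSynchronizeIndex are made up for this illustration and are not taken from the project; a real cache would also delete the image file that belongs to each stale key.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the project): removes every entry whose
// expiration date is not later than `now` and returns how many were removed.
// A real cache would also delete the image file stored for each stale key.
static NSUInteger BBPurgeStaleEntries(NSMutableDictionary* entries, NSDate* now)
{
    // Collect keys first; mutating a dictionary while enumerating it is not allowed
    NSMutableArray* staleKeys = [NSMutableArray array];
    for (NSString* key in entries) {
        NSDate* expiration = [entries objectForKey:key];
        if ([expiration compare:now] != NSOrderedDescending) {
            [staleKeys addObject:key];
        }
    }

    [entries removeObjectsForKeys:staleKeys];
    return [staleKeys count];
}

// Hypothetical helper (not from the project): writes the index as a .plist.
// NSString keys and NSDate values are both valid property list types.
static BOOL BBSynchronizeIndex(NSDictionary* entries, NSString* indexPath)
{
    return [entries writeToFile:indexPath atomically:YES];
}
```

Passing the current date in as a parameter (instead of calling [NSDate date] inside the helper) is just a choice that makes the purge easy to test.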

Here's a sample of code for the retrieve & extend method:

- (UIImage*)imageForKey:(NSString*)key
{
    // cachePathForKey builds a string with the absolute file path for the image
    NSString* cachePathForKey = [self cachePathForKey:key];
    __block UIImage* image = nil;

    // _queue is a serial normal priority dispatch queue
    dispatch_sync(_queue, ^() {
        image = [UIImage imageWithContentsOfFile:cachePathForKey];

        // No image, bail out immediately
        if (image == nil) {
            return;
        }

        // New expiration date (_duration is configured on init)
        NSDate* newExpirationDate = [NSDate dateWithTimeIntervalSinceNow:_duration];

        // Write new expiration date for this file
        // Will override previous if set (even if expired)
        [_cacheEntries setObject:newExpirationDate forKey:key];
    });

    return image;
}

The Core Data cache implementation

The Core Data cache implementation revolves around the BBCacheEntry class:

@interface BBCacheEntry : NSManagedObject

@property(nonatomic, retain) NSString* key;
@property(nonatomic, retain) UIImage*  image;
@property(nonatomic, retain) NSDate*   expiration;

@end

Similarly to its file system sibling, an instance of this cache can be created with:

- (id)initWithContext:(NSManagedObjectContext*)context
      andItemDuration:(NSTimeInterval)duration;

The managed object context must be passed along, which makes this implementation easier to merge into a project that already has a Core Data model, and the duration parameter controls when an item is considered stale.

The rest is just standard Core Data code, fetching items by the key using the following predicate:

(...) = [NSPredicate predicateWithFormat:@"key like %@", key];
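One thing worth keeping in mind about this kind of predicate: LIKE treats * and ? in the right-hand string as wildcards, so a key containing those characters can match more than one entry, whereas == compares for equality. The sketch below shows the difference on plain in-memory dictionaries (no Core Data involved); the helper names and sample keys are invented for the illustration.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: wraps each key in a dictionary so that the "key"
// key path in the predicate resolves, much like an entity attribute would.
static NSArray* BBEntriesForKeys(NSArray* keys)
{
    NSMutableArray* entries = [NSMutableArray array];
    for (NSString* key in keys) {
        [entries addObject:[NSDictionary dictionaryWithObject:key forKey:@"key"]];
    }
    return entries;
}

// Hypothetical helper: keys matched by a LIKE predicate (wildcards apply)
static NSArray* BBKeysMatchingLike(NSArray* entries, NSString* key)
{
    NSPredicate* predicate = [NSPredicate predicateWithFormat:@"key LIKE %@", key];
    return [[entries filteredArrayUsingPredicate:predicate] valueForKey:@"key"];
}

// Hypothetical helper: keys matched by an equality predicate
static NSArray* BBKeysMatchingExactly(NSArray* entries, NSString* key)
{
    NSPredicate* predicate = [NSPredicate predicateWithFormat:@"key == %@", key];
    return [[entries filteredArrayUsingPredicate:predicate] valueForKey:@"key"];
}
```

For keys that never contain * or ? (hashes or sanitized URLs, for instance) both predicates select the same entries.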

As a side note, I also needed a transformer for the UIImage property of the Core Data entity, which was a matter of subclassing NSValueTransformer and overriding a couple of simple methods:

@implementation BBImageTransformer

+ (Class)transformedValueClass
{
    return [NSData class];
}

+ (BOOL)allowsReverseTransformation
{
    return YES;
}

- (id)transformedValue:(id)value
{
    return (value == nil) ? nil : UIImagePNGRepresentation(value);
}

- (id)reverseTransformedValue:(id)value
{
    return (value == nil) ? nil : [UIImage imageWithData:value];
}

@end

Correctness check

Before jumping straight into the performance tests, I wanted to make sure that both implementations behaved as expected. In the project's BBRootViewController.m file, you'll find the method:

- (void)testCacheCorrectness:(id<BBImageCache>)cache

It's basically just a bunch of NSAssert() calls to ensure both caches work the same.

Performance tests — round 1

The first round of performance tests I ran was quite naive:

  1. Load an image into a UIImage;
  2. Store the image X times;
  3. Synchronize;
  4. Load the image X times;
  5. Clear and synchronize;
  6. Store + synchronize, X times;
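Each of the timed steps above boils down to "run something X times and measure how long it took". A minimal sketch of such a timing helper could look like the following; BBTimeStep is a hypothetical name and this is not necessarily how the project measures its numbers.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not from the project): runs `step` `iterations` times,
// logs the elapsed wall-clock time and returns it in milliseconds.
static double BBTimeStep(NSString* label, NSUInteger iterations, void (^step)(NSUInteger index))
{
    NSDate* start = [NSDate date];

    for (NSUInteger i = 0; i < iterations; i++) {
        step(i);
    }

    double milliseconds = [[NSDate date] timeIntervalSinceDate:start] * 1000.0;
    NSLog(@"%@: %.2fms", label, milliseconds);
    return milliseconds;
}
```

A step such as "store the image X times" would then be a call like BBTimeStep(@"Store", 100, ^(NSUInteger i) { ... }), with the block storing the image under a key derived from i.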

The first results were very surprising (for X = 100):

(INFO) Testing coredata cache...
(INFO) Execution times:
Store:      253.87ms
Sync:       118.22ms
Load:       829.77ms
Clear&Sync: 32.76ms
Store&Sync: 2129.16ms
Item count: 100

(INFO) Testing filesystem cache...
(INFO) Execution times:
Store:      17981.17ms
Sync:       11.54ms
Load:       102.53ms
Clear&Sync: 119.54ms
Store&Sync: 18756.39ms
Item count: 100

Woah!

Contrary to popular belief, Core Data was apparently a lot faster at storing, even though it was, as expected, slower at synchronizing and loading.

Clearly, something fishy was going on behind the scenes for the gap to be this big. Suspecting that it was Core Data caching the value transformation of the UIImage loaded at step 1, I rolled in a little something to break it.

Performance tests — round 2

In order to ensure that no other caching was occurring behind the scenes with the Core Data transformations, I altered the process a little bit:

  1. Load an image into a UIImage and convert it to its PNG NSData representation;
  2. Create X UIImage instances from that NSData;
  3. Iterate through created images and store them;
  4. Synchronize;
  5. Load all the images stored;
  6. Clear and synchronize;
  7. Iterate through created images doing store + synchronize;

By creating a UIImage with [UIImage imageWithData:...] I'm basically ensuring that a different UIImage instance is created, thus avoiding Core Data's behind-the-scenes transformation caching. This is pretty much what would happen in a real scenario, where images are downloaded from some web server.

The results were a bit different (for X = 100):

(INFO) Testing coredata cache...
(INFO) Execution times (w/ image building):
Store:      829.00ms
Sync:       22605.57ms
Load:       1704.57ms
Clear&Sync: 121.96ms
Store&Sync: 25991.81ms
Item count: 100

(INFO) Testing filesystem cache...
(INFO) Execution times (w/ image building):
Store:      22830.87ms
Sync:       7.95ms
Load:       334.56ms
Clear&Sync: 109.37ms
Store&Sync: 23691.79ms
Item count: 100

While storing still remains very fast, it's when synchronization is performed that the Core Data cache takes the performance hit. The cause is pretty simple:

  • On the file system cache, every time an item is stored, the index is updated in memory but the file is actually written to disk; when the cache is synchronized, only the index is flushed to disk.
  • On the Core Data cache, it all remains in memory until the managed object context is saved; then it all gets persisted to disk.

Even so, Core Data's store & sync is pretty close to the file system implementation's speed. The reason it isn't faster is that every time the store routine is called on the Core Data implementation, it has to check whether an item with the same key already exists in the storage to avoid duplicates.

Conclusion

The file system cache is, as expected, faster. Core Data falls slightly behind when storing and synchronizing (marginally slower), but load times are way higher when performing single random accesses.

For such a simple case Core Data's functionality really doesn't pay off, so stick to the file system version.

Additional notes

  • Code in this project uses ARC;
  • Tests were performed on an iPhone 4S running iOS 5.0.1, an iPhone 3GS running iOS 5.0.1 and the 5.0 iPhone Simulator, proving to be consistent across devices;
  • Results shown are from a random run on the iPhone 4S;
  • All Core Data calls are optimized to not include fields when fetching data for operations where the fields are not needed;
  • Tests were run with 10, 100 and 1000 iterations a couple of times to ensure the values were consistent;
  • The code in this project is not intended to be used directly in other projects (there's no point in having the BBImageCache protocol if you're opting for either of the implementations), even though all efforts were made to ensure its correctness. Think of it as an essay from which you can draw (or copy) code.