As I was reviewing my last post, I realized it was long. So long that I suspected most people would give up after the first few paragraphs, or as soon as they noticed the scroll bar.
Being an in-depth technical post, there's only so much you can cut before it starts feeling like there are pieces missing. So I went with a TL;DR. The rationale was that if compelling results were shown immediately, people would bother to stick around to read the "how" & "why".
This blog is built with Jekyll, so the first thing I thought was "there's surely some plugin for that". Google said no.
The A/B test
The objective of this A/B test was to randomly show a TL;DR section, track each page view on Google Analytics along with whether the element was displayed, and then analyze the element's effect on the average time on page metric.
I inserted this right below Jekyll's headers on the post, before the rest of the article's text:
<div id="tldr">
<p>
<strong>TL;DR:</strong> Custom serialization proposed
here is ~2x faster than Cocoa Archive Framework.
Results are <a href="#results">here</a>.
</p>
<hr>
</div>
I'm not a big fan of introducing HTML into Markdown posts, but in this case there was no choice.
Setting up Jekyll
After a bit of trial and error and some failed experiments, the solution was to have each post include an extra page template variable on the headers, which would be picked up by the top level template:
---
layout: post
title: "Cocoa data serialization benchmark: Archive framework vs custom serialization"
categories: [ cocoade ]
abtesting: [ tldr, 1 ]
---
That custom abtesting variable will be picked up by Jekyll and become available under page.abtesting.
You'll notice it's an array — let me remind you this is a very simplistic approach — with the values tldr and 1. This is information that will be passed along to Google Analytics, but we'll get there in a moment.
Displaying or hiding the TL;DR element with JavaScript
Whether this div was shown or not had to be decided on the client side when the page loaded, since Jekyll generates static HTML resources.
So, on default.html which is my top level template for Jekyll, I added a snippet to determine whether the page being displayed includes an A/B test — that is, if the post has that custom abtesting template variable set:
<!DOCTYPE html>
<html>
<head>
<title>A/B testing with Jekyll and Google Analytics</title>
<!-- other scripts, css -->
<script src="/js/biasedbit.js"></script>
{% if page.abtesting %}
<script type="text/javascript">
var abTestingId = '{{ page.abtesting[0] }}';
var abTestingSlot = {{ page.abtesting[1] }};
var abTestingValue = Math.random() < 0.5;
</script>
{% endif %}
</head>
<body>
<!-- content -->
<!-- google analytics script -->
</body>
</html>
If the page does include an A/B test, three variables will be generated:
abTestingId: the identifier of the A/B test;
abTestingSlot: the slot for the custom variable, required for Google Analytics;
abTestingValue: a randomly calculated boolean, that decides whether version A or B of the site will be shown.
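As a rough sketch of how those three values relate, the same logic can be written as a plain function. The name buildAbTest, the returned property names and the injectable random source are assumptions made for illustration; the template above simply declares three globals.

```javascript
// Hypothetical helper (not part of the post's template): turns the
// page.abtesting pair [id, slot] into the three values described above.
function buildAbTest(config, random) {
  // Pages without an abtesting variable carry no test at all
  if (!Array.isArray(config) || config.length < 2) {
    return null;
  }
  // The random source is injectable so the 50/50 split can be checked
  var rng = random || Math.random;
  return {
    id: String(config[0]),     // identifier of the A/B test
    slot: Number(config[1]),   // Google Analytics custom variable slot
    showVariant: rng() < 0.5   // true: version A (element shown), false: version B
  };
}
```

Calling buildAbTest(['tldr', 1]) yields an object whose showVariant flag is true for roughly half of the page loads.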
To hide the element, I added a small snippet inside that biasedbit.js file:
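$(document).ready(function() {
  // Test if we're doing A/B testing on the TL;DR. The typeof guard avoids a
  // ReferenceError on pages that don't define the abTesting* variables.
  if ((typeof abTestingId !== 'undefined') && (abTestingId == 'tldr') && !abTestingValue) {
    $('#tldr').remove();
  }
});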
This is the piece of code that will remove the element with id tldr from the page as soon as the document finishes loading.
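For readers who don't load jQuery, here is a minimal sketch of the same removal step using plain DOM calls. The function name applyTldrVariant and the shape of the test argument (id, showVariant) are assumptions for illustration, not code from this blog.

```javascript
// Hypothetical jQuery-free variant of the removal step.
// `doc` is normally `document`; `test` is null on pages without an A/B test.
function applyTldrVariant(doc, test) {
  // Only act when this page runs the 'tldr' test and drew version B
  if (!test || test.id !== 'tldr' || test.showVariant) {
    return false;
  }
  var element = doc.getElementById('tldr');
  if (!element || !element.parentNode) {
    return false;
  }
  element.parentNode.removeChild(element);
  return true; // reports that the element was removed
}
```

In a browser this would be called once the DOM is ready, for instance from a DOMContentLoaded listener.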
Tracking page views on Google Analytics with custom variables
Up until now, we've taken care of randomly displaying or hiding an element with id tldr on the page, so all that's left is to actually track this information so we can later view the results on GA. Using the three variables introduced above, we can add a small snippet before the call to _gaq.push(['_trackPageview']); on the top level template, default.html:
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'YOUR-TRACKING-ID']);
// If we're doing A/B testing, make sure we include the proper variables.
// typeof avoids a ReferenceError on pages that don't define abTestingId.
if (typeof abTestingId !== 'undefined') {
if (abTestingValue) {
_gaq.push(['_setCustomVar', abTestingSlot, abTestingId, 'A', 3]);
} else {
_gaq.push(['_setCustomVar', abTestingSlot, abTestingId, 'B', 3]);
}
}
_gaq.push(['_trackPageview']);
...
</script>
</body>
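The command pushed to the queue can also be built by a small function, which makes the A/B mapping easier to check in isolation. This is only a sketch: customVarCommand and queuePageview are hypothetical names, while the argument order (slot, name, value, scope), the 1 to 5 slot range and the page-level scope value 3 follow the classic ga.js _setCustomVar call used above.

```javascript
// Scope 3 is the page-level scope in the classic ga.js custom variable API
var PAGE_LEVEL_SCOPE = 3;

// Hypothetical helper: builds the _setCustomVar command for one test
function customVarCommand(slot, testId, variantShown) {
  // Classic Google Analytics offers custom variable slots 1 through 5
  if (slot !== Math.floor(slot) || slot < 1 || slot > 5) {
    throw new RangeError('slot must be an integer between 1 and 5');
  }
  return ['_setCustomVar', slot, testId, variantShown ? 'A' : 'B', PAGE_LEVEL_SCOPE];
}

// Hypothetical helper: the custom variable has to be queued before the
// page view, otherwise it is not sent along with it
function queuePageview(queue, test) {
  if (test) {
    queue.push(customVarCommand(test.slot, test.id, test.showVariant));
  }
  queue.push(['_trackPageview']);
  return queue;
}
```

With the real queue this would read queuePageview(_gaq, test), where test is null on pages without an experiment.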
With these four little changes, I got A/B testing with tracking up and running.
Wrap up: querying the data
To view the data gathered with this method, I had to create two custom segments. You can do that by logging in to Google Analytics, heading to your site, and clicking "Advanced Segments" and then "New Custom Segment". Create something like this:
Then do the same for the 'B' value.
Finally, open "Advanced Segments" again and enable the two segments you just created. You will now see both types of traffic.
So, does the TL;DR thing work or not?
Nope. No difference:
I'll be testing this on more posts but I suspect that it won't make a difference. TL;DR or not, when people really are interested in in-depth articles, they'll stick around.
Note:
This post doesn't have the TL;DR experiment :)
Thanks to @pelf and @slitz for making me go from "Wouldn't it be nice if..." to actually trying this out.
Applying the findings of my last benchmark comparing Core Data and File System storage, I went with a similar solution on a recent project where I didn't need all the firepower in Core Data. The twist this time, however, was that instead of simply storing a key-value pair (NSString, NSDate), I needed to store instances of a class that had multiple fields, indexed by one of its fields — (NSString, SomeClass).
Objective: writing to/reading from disk, as fast as possible
This essay compares Cocoa Archiving Framework against a custom serialization method, using Binary Property Lists on serialization/deserialization of a NSDictionary that contains multiple objects, indexed by one of their properties — I'll call this dictionary the object index.
Cocoa Archiving Framework is pretty straightforward:
Model class implements the NSCoding protocol;
Use NSKeyedArchiver/NSKeyedUnarchiver's one-liners to write/read the object index — the framework will take care of all the magic.
The custom serialization method requires a bit more work, both to serialize...
Each object in object index must be converted to a NSDictionary instance;
A copy of the object index must be created with the same keys but with the NSDictionary representations of the objects;
The copy of the object index is then serialized to NSData using NSPropertyListSerialization;
NSData is written to disk;
... and to deserialize:
NSData is read from disk;
NSData is converted into a NSDictionary, using NSPropertyListSerialization;
For each entry in this NSDictionary, convert it to the model to be instantiated;
Create the object index with these deserialized instances.
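The round trip described by these steps can be sketched in a language-neutral way. The example below uses JavaScript and JSON as a stand-in for Objective-C and binary property lists, so it only illustrates the shape of the conversion, not the actual implementation or its performance. The field names mirror a subset of BBItem; the function names are assumptions.

```javascript
// Step 1 (serialize): convert one model object to a plain dictionary
function itemToDictionary(item) {
  var dict = {
    identifier: item.identifier,
    createdAt: item.createdAt.toISOString(), // Date stored as a string
    views: item.views
  };
  // Optional fields are only written when present
  if (item.optionalString != null) {
    dict.optionalString = item.optionalString;
  }
  return dict;
}

// Step 3 (deserialize): rebuild a model object, or null if it is invalid
function itemFromDictionary(dict) {
  if (!dict || dict.identifier == null || dict.createdAt == null) {
    return null;
  }
  return {
    identifier: dict.identifier,
    createdAt: new Date(dict.createdAt),
    views: dict.views || 0,
    optionalString: dict.optionalString == null ? null : dict.optionalString
  };
}

// Copy the index with the same keys but plain values, then serialize it
function serializeIndex(entries) {
  var plain = {};
  Object.keys(entries).forEach(function (key) {
    plain[key] = itemToDictionary(entries[key]);
  });
  return JSON.stringify(plain);
}

// Parse the data and recreate the object index from the plain values
function deserializeIndex(data) {
  var plain = JSON.parse(data);
  var entries = {};
  Object.keys(plain).forEach(function (key) {
    var item = itemFromDictionary(plain[key]);
    if (item) {
      entries[key] = item;
    }
  });
  return entries;
}
```

Writing the serialized data to disk and reading it back are left out, since they don't change the structure of the conversion.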
The model
Let's assume we want to store instances of the class BBItem:
It has pretty much all the stuff you might find on a regular model class:
Object fields
Optional object fields
Scalar fields
Note:
For the sake of brevity, I decided not to include fields with custom classes. I address this point at the end of the article, explaining how it could be done.
The repository
The purpose of the repository is to act as facade to store an arbitrary number of BBItem instances and query them by identifier. Hence, the interface of a repository class could be something like this:
By looking at these operations, NSMutableDictionary immediately comes to mind as the perfect structure to hold and query this data under the hood. Also, since we want to persist data to disk, we need to add a couple of methods to load and flush data.
Here's how it would look after a slight upgrade to support these requirements:
@interface BBItemRepository : NSObject
{
@protected
    __strong NSMutableDictionary *_entries;
}

- (void)reload; // deserializes content from disk into memory
- (BOOL)flush;  // flushes all data in memory to disk (but keeps data in memory)

- (NSUInteger)itemCount;
- (BBItem *)itemWithIdentifier:(NSString *)identifier;
- (void)addItem:(BBItem *)item;
- (void)removeItem:(BBItem *)item;
- (void)removeItemWithIdentifier:(NSString *)identifier;

@end
Serialization and deserialization: how & when
In order to use this class, we must first call reload — a good place to do it would be on the app delegate's application:didFinishLaunchingWithOptions: method — and eventually call flush after performing some changes — good candidates would be applicationWillTerminate: and applicationDidEnterBackground: on the app delegate.
To simplify things for this particular case I wrote a default repository implementation, BBItemRepository, with no-op flush and reload methods — an in-memory repository.
I then subclass this BBItemRepository with BBPlistItemRepository (custom serialization) and BBArchiveItemRepository (Cocoa Archive serialization).
Note
This article will not cover the implementation of the query methods of the superclass. You can take a quick peek here.
NSKeyedArchiver/Unarchiver and NSCoding serialization
In order to use Cocoa's Archiving Framework, our class must implement the NSCoding protocol. This is a very straight-forward process, where we provide the current values of the properties to the encoder or set the properties' values by reading fields from the decoder.
While tedious to write initially (and maintain, if you tend to change your models very frequently), it's a conceptually simple task.
Assuming we have an ivar named _archiveFilePath which was initialized with the path where the archive should sit, reading and flushing these items requires two one-liners:
@implementation BBArchiveItemRepository

...

- (void)reload
{
    [super reload]; // Ensures initialization of _entries (NSMutableDictionary)

    NSMutableDictionary *entries = [NSKeyedUnarchiver unarchiveObjectWithFile:_archiveFilePath];
    if (entries == nil) {
        return;
    }

    // Entries are not null, so assign to the ivar
    _entries = entries;
}

- (BOOL)flush
{
    // This pretty much does nothing but it's always
    // nice to call the superclass's method...
    if (![super flush]) {
        return NO;
    }

    return [NSKeyedArchiver archiveRootObject:_entries toFile:_archiveFilePath];
}

...

@end
And that's all there is to it.
Binary Property List (plist) serialization
Since an NSDictionary serialized/deserialized using Binary Plists can only contain objects of the classes NSData, NSString, NSArray, NSDictionary, NSDate and NSNumber, the conversion BBItem -> NSDictionary is a tad bit more cumbersome than using a NSCoder. Thus, by convention, our model will have two new methods:
@implementation BBItem

...

+ (BBItem *)itemFromDictionary:(NSDictionary *)dictionary
{
    BBItem *model = [[BBItem alloc] init];

    // Object - straight forward conversions, retrieved from
    // the dictionary without any further changes required
    model.identifier = [dictionary objectForKey:@"identifier"];
    model.createdAt = [dictionary objectForKey:@"createdAt"];
    model.hash = [dictionary objectForKey:@"hash"];
    model.data = [dictionary objectForKey:@"data"];

    // Scalar, require conversion from the objects stored in the NSDictionary
    NSNumber *viewsNumber = [dictionary objectForKey:@"views"];
    if (viewsNumber != nil) {
        model.views = [viewsNumber unsignedIntegerValue];
    }
    NSString *displayInRectAsString = [dictionary objectForKey:@"displayInRect"];
    if (displayInRectAsString != nil) {
        model.displayInRect = CGRectFromString(displayInRectAsString);
    }

    // Optional
    model.optionalString = [dictionary objectForKey:@"optionalString"];

    // Optionally, we can validate the model here
    if ((model.identifier == nil) || (model.createdAt == nil) ||
        (model.hash == nil) || (model.data == nil)) {
        return nil;
    }

    return model;
}

- (NSDictionary *)convertToDictionary
{
    NSMutableDictionary *dictionary = [NSMutableDictionary dictionaryWithObjectsAndKeys:
        // Object
        _identifier, @"identifier",
        _createdAt, @"createdAt",
        _hash, @"hash",
        _data, @"data",
        // Scalar
        [NSNumber numberWithUnsignedInteger:_views], @"views",
        NSStringFromCGRect(_displayInRect), @"displayInRect",
        nil];

    // Optional properties should be checked before
    // being committed to the NSDictionary
    if (_optionalString != nil) {
        [dictionary setValue:_optionalString forKey:@"optionalString"];
    }

    return dictionary;
}

...

@end
Using these methods, we convert each instance of BBItem in _entries into a NSDictionary representation. We then create another top-level index NSDictionary using the same keys as _entries but this time using the NSDictionary representations of the BBItems as values. Conversely, when reading from disk, we must create new BBItem instances from the NSDictionary values in the deserialized binary plist file.
Again, assuming we already have an ivar _indexFilePath which has been initialized with the path where the binary plist file is located, the reload and flush implementations of the BBPlistItemRepository are:
@implementation BBPlistItemRepository

...

- (void)reload
{
    [super reload];

    // Load the file as NSData
    NSData *dictionaryData = [NSData dataWithContentsOfFile:_indexFilePath];
    if (dictionaryData == nil) {
        return;
    }

    // Deserialize the contents of the file to an NSDictionary
    NSString *error = nil;
    NSDictionary *serializedEntries = [NSPropertyListSerialization
        propertyListFromData:dictionaryData
            mutabilityOption:NSPropertyListImmutable
                      format:NULL
            errorDescription:&error];
    // A nil result signals failure; error then holds the description
    if (serializedEntries == nil) {
        return;
    }

    // Convert each key-value pair (NSString, NSDictionary)
    // into our entries: (item.id, item)
    [serializedEntries enumerateKeysAndObjectsUsingBlock:
        ^(NSString *key, NSDictionary *serializedEntry, BOOL *stop) {
            BBItem *item = [BBItem itemFromDictionary:serializedEntry];
            [_entries setObject:item forKey:key];
        }];
}

- (BOOL)flush
{
    if (![super flush]) {
        return NO;
    }

    // Convert each BBItem in the _entries dictionary to its NSDictionary representation
    NSError *error = nil;
    NSMutableDictionary *serializedEntries =
        [NSMutableDictionary dictionaryWithCapacity:[_entries count]];
    [_entries enumerateKeysAndObjectsUsingBlock:
        ^(NSString *key, BBItem *item, BOOL *stop) {
            NSDictionary *serializedEntry = [item convertToDictionary];
            [serializedEntries setObject:serializedEntry forKey:key];
        }];

    // Create NSData from the dictionary created above,
    // by serializing using binary property lists.
    NSData *dictionaryData = [NSPropertyListSerialization
        dataWithPropertyList:serializedEntries
                      format:NSPropertyListBinaryFormat_v1_0
                     options:0
                       error:&error];
    // A nil result signals failure; error then holds the details
    if (dictionaryData == nil) {
        return NO;
    }

    if (![dictionaryData writeToFile:_indexFilePath options:NSDataWritingAtomic error:&error]) {
        return NO;
    }

    return YES;
}

...

@end
Even though this code only needs to be written once, it's significantly more complex than BBArchiveItemRepository.
Time to figure out whether the extra complexity actually pays off or not.
Benchmark description
The benchmark is pretty simple; in a typical usage of this repository, all the records would be loaded into memory on boot and flushed back to disk when the app enters background or is about to terminate.
It thus consists of profiling the executions of both reload and flush with varying numbers of items — all other operations will end up being query calls to a NSDictionary.
- (NSString *)testSpeed:(BBItemRepository *)repository withDummyData:(NSArray *)items
{
    // Make sure we have no content
    [repository reset];

    // items is a NSArray* filled with dummy BBItem instances
    for (BBItem *item in items) {
        [repository addItem:item];
    }

    // Time executions of flush (write to disk) and reload (read from disk)
    uint64_t flushNanoseconds = [BBProfiler profileBlock:^() {
        [repository flush];
    }];
    uint64_t reloadNanoseconds = [BBProfiler profileBlock:^() {
        [repository reload];
    }];

    // Clean it up again
    [repository reset];

    return ...;
}
Note
Just like in the Core Data vs File System comparison, I begin by ensuring both repositories work exactly as expected. These assertions can be found on the method testRepositoryCorrectness: of the class BBBenchmarkViewController.
Over 2x faster when flushing to disk, almost 2x faster when loading from disk.
Conclusion
At the cost of a slightly more complex initial implementation, the custom serialization method proposed here does offer a significant speed boost when compared to Cocoa's Archive Framework.
The main reason behind this is that the custom serialization method simply creates different representations of the items when serializing, whereas with CAF, hidden under those handy one-liners, there is a lot of object graph voodoo going on; this naturally slows the process down.
It's very important to mention that this repository is not meant to scale past a few thousand records. For very large amounts of objects or very large objects, you should stick to Core Data.
Bonus round: using custom classes as properties on the model items
While Cocoa's Archiving Framework takes care of object graphs, the custom serialization method introduced in this article does not. If you plan on serializing complex graphs, it's probably better to use Cocoa's Archiving Framework.
The purpose of this custom serialization method is to be able to quickly serialize & deserialize either simple objects or simple trees of objects — quickly being the keyword.
With that said, you could easily add a custom class member to BBItem:
Make sure BBSubItem has subItemFromDictionary: and convertToDictionary methods;
When serializing, BBItem's convertToDictionary needs to call convertToDictionary on its sub-item;
When deserializing, BBItem needs to create its BBSubItem by calling subItemFromDictionary: on the BBSubItem class.
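A minimal sketch of that delegation, written in JavaScript rather than Objective-C and with a made-up label field on the sub-item, could look like this. All function and field names here are assumptions for illustration.

```javascript
// Hypothetical sub-item conversions (the `label` field is invented)
function subItemToDictionary(subItem) {
  return { label: subItem.label };
}

function subItemFromDictionary(dict) {
  return dict && dict.label != null ? { label: dict.label } : null;
}

// The parent delegates to the sub-item's conversion when serializing...
function nestedItemToDictionary(item) {
  var dict = { identifier: item.identifier };
  if (item.subItem != null) {
    dict.subItem = subItemToDictionary(item.subItem);
  }
  return dict;
}

// ...and asks the sub-item to rebuild itself when deserializing
function nestedItemFromDictionary(dict) {
  if (!dict || dict.identifier == null) {
    return null;
  }
  return {
    identifier: dict.identifier,
    subItem: dict.subItem == null ? null : subItemFromDictionary(dict.subItem)
  };
}
```

This works for trees of objects; shared or cyclic references would need the bookkeeping that an archiving framework provides.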
Additional notes
Code in this project uses ARC;
Tests were performed on an iPhone 4S running iOS 5.1.1, an iPhone 3GS running iOS 5.0.1 and the 5.1 iPhone Simulator, proving to be consistent across devices;
Results shown are from a random run on the iPhone 4S;
Tests were run with 100, 1000 and 10000 BBItem instances a couple of times to ensure the values were consistent.
I usually think of twitter as a place where (mostly) tech-savvy people get the chance to engage with each other.
From time to time I do end up looking at some absurd trending topic but, just like staring at the sun, I can't take more than 2 seconds of it. On such brief and infrequent contact with this alternate reality, my brain does what it was trained to do with any piece of aberrant data: discard.
But on days like today — we pushed pro accounts at Droplr — I realize how much of a zombie land it really is. For hours on end, there's a flood of copies of copies of copies of tweets, every now and then randomly inserting a cliché hashtag.
The octothorpe attack. Spam. Bigtime. Yeah.
There's a small percentage of sane content producers and then the rest of the horde, wildly retweeting on zombie land.
A quick and easy way to generate a random password in Cocoa is to generate a random number using arc4random() and then convert that number into its Base 62 representation.
According to Google, converting to Base 62 doesn't seem to be very common in objc, so here's a handy piece of code, straight from DroplrKit, that converts an unsigned long number in decimal base (radix 10) to a string representing that same number in base 62.
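To illustrate the conversion itself, here is a minimal sketch in JavaScript. It is not the DroplrKit code: the function name toBase62 and the digit ordering (0-9, then a-z, then A-Z) are assumptions, and BigInt is used so that values across the full unsigned 64-bit range keep their precision.

```javascript
// Assumed digit ordering; other base 62 alphabets are equally valid
var BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Converts a non-negative integer (Number or BigInt) to a base 62 string
function toBase62(value) {
  var n = BigInt(value);
  if (n < 0n) {
    throw new RangeError('value must be non-negative');
  }
  if (n === 0n) {
    return BASE62_ALPHABET[0];
  }
  var base = 62n;
  var digits = '';
  // Repeatedly take the remainder as the next, less significant digit
  while (n > 0n) {
    digits = BASE62_ALPHABET[Number(n % base)] + digits;
    n = n / base; // BigInt division truncates
  }
  return digits;
}
```

For passwords, the input number should come from a cryptographically strong source, in the same spirit as arc4random() in the Cocoa version.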
A couple of nights ago, my better half read me aloud a small excerpt of a book, which has been resounding in my head:
"Of course they are ugly!" Stein replies. "Picasso says that when you make a thing for the first time it is so complicated it is bound to be ugly — but those who follow, who do the same thing after you, do not have the worry of making it, so they can make it pretty and everyone can like it!"
Artificial Love, page 31 — Paul Shepheard
While valid for pretty much everything, this brief quote made me look at end-user software — nowadays more commonly known as apps — with different eyes.
Have you used the new [new twitter client]?
It's so much better (read: prettier) than [old twitter client].
So here's to all the unsung heroes who once built ugly software. I can only hope to one day look back and count myself amongst your ranks.