This is an article about how I set out to write a binary serializer for Silverlight and what I learned along the way. I have some very large reference data entity sets in isolated storage, and loading these sets so I can run LINQ queries against them has been extremely painful: I don’t want to wait 30 seconds before my feature is usable. My first analysis led me to the conclusion that disk I/O was my main performance bottleneck and that drastically reducing the file sizes by using binary serialization would be just what the doctor ordered.
[Note: if you’ve seen me speak recently, yes I’ve been talking about posting this for quite a while]
Design goals:
- Uses familiar DataContract/DataMember/KnownType semantics – use something that my types are already decorated with and that developers are already familiar with.
- No fixed buffer size for strings – I have seen some other Silverlight/Binary Serialization schemes that use fixed buffer sizes for strings. This is undesirable.
- FAST – I wanted this to be faster than the built-in XML + DataContractSerializer mechanism.
- Theoretically survive assembly rebuilds – Most binary serialization schemes (think ASP.NET SQL Server session state) do not tolerate different assembly versions but rather require an exact match. Since I’m using [DataContract] semantics, I hoped to allow whatever properties still match to survive across builds if possible.
- Handle lists and complex object graphs
- Low disk usage: binary files should be far smaller than equivalent XML files
Building the Serializer
There’s no built-in binary serialization for Silverlight, or I wouldn’t be writing this article. There are, however, BinaryWriter and BinaryReader classes. These save me from writing custom logic for every primitive type and, in particular, from having to encode strings with their own length prefix. That leaves a shorter list of problems:
- Logic to recursively serialize object graphs
- Handling null objects
- Handling collection types
- Persisting data about what types and what properties of those types were serialized. This is key for meeting the durability design goal (#4)
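To make the last two points concrete, here is a minimal sketch of one way to tag each value with its property name so that a reader can skip properties that no longer exist. It is not the actual BinarySerializer code. The `TaggedFields` class and its method names are hypothetical, and the layout (name, payload length, payload) is an assumption chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class TaggedFields
{
    // Writes one field as: property name, payload length, payload bytes.
    public static void WriteField(BinaryWriter bw, string name, byte[] payload)
    {
        bw.Write(name);            // BinaryWriter length-prefixes the string itself
        bw.Write(payload.Length);  // lets a reader skip fields it does not know
        bw.Write(payload);
    }

    // A null string becomes a single 'false' byte; otherwise 'true' plus the string.
    public static byte[] EncodeString(string value)
    {
        using (var ms = new MemoryStream())
        {
            var bw = new BinaryWriter(ms);
            bw.Write(value != null);
            if (value != null)
            {
                bw.Write(value);
            }
            bw.Flush();
            return ms.ToArray();
        }
    }

    public static string DecodeString(byte[] payload)
    {
        var br = new BinaryReader(new MemoryStream(payload));
        return br.ReadBoolean() ? br.ReadString() : null;
    }

    // Reads fieldCount fields. Known names go to their handler; unknown names
    // (for example, properties removed since the file was written) are dropped.
    public static void ReadFields(BinaryReader br, int fieldCount,
        IDictionary<string, Action<byte[]>> handlers)
    {
        for (int i = 0; i < fieldCount; i++)
        {
            string name = br.ReadString();
            int length = br.ReadInt32();
            byte[] payload = br.ReadBytes(length);

            Action<byte[]> handler;
            if (handlers.TryGetValue(name, out handler))
            {
                handler(payload);
            }
        }
    }
}
```

Repeating the property name for every value is wasteful for 300,000 rows. A real format would more likely write the names once per type in a header and refer to them by index, but the skip-what-you-do-not-recognize idea stays the same.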
The source code will be provided, so we’ll only go over some highlights here. The first thing to do was to build an object with several different property types to test with. Instances populate themselves with random values. While this can throw off apples-to-apples comparisons, the differences should be significant enough to ignore this.
using System;
using System.Runtime.Serialization;
using System.Text;

[DataContract]
public class RefDataRow
{
    // Shared generator: in Silverlight, new Random() is seeded from the clock,
    // so rows constructed within the same tick would get identical values.
    private static readonly Random Rng = new Random();

    /// <summary>
    /// Set some random values
    /// </summary>
    public RefDataRow()
    {
        Random r = Rng;
        Id = Guid.NewGuid();
        Field0 = r.Next(100000);
        Field1 = r.Next(100000);
        Field2 = r.Next(100000);
        Field3 = r.Next(100000);
        Field4 = r.Next(100000);
        Field5 = r.Next(100000);
        Field6 = r.Next(100000);
        Field7 = r.Next(100000);
        Field8 = r.Next(100000);
        Field9 = r.Next(100000);
        Value = r.NextDouble();
        int dLen = r.Next(50);
        var sb = new StringBuilder();
        for (int i = 0; i < dLen; ++i)
        {
            // Cast after the addition so a letter A-Z is appended, not a number.
            sb.Append((char)(r.Next(26) + 65));
        }
        Description = sb.ToString();
    }

    [DataMember]
    public Guid Id { get; set; }
    [DataMember]
    public int Field0 { get; set; }
    [DataMember]
    public int Field1 { get; set; }
    [DataMember]
    public int Field2 { get; set; }
    [DataMember]
    public int Field3 { get; set; }
    [DataMember]
    public int Field4 { get; set; }
    [DataMember]
    public int Field5 { get; set; }
    [DataMember]
    public int Field6 { get; set; }
    [DataMember]
    public int Field7 { get; set; }
    [DataMember]
    public int Field8 { get; set; }
    [DataMember]
    public int Field9 { get; set; }
    [DataMember]
    public double Value { get; set; }
    [DataMember]
    public string Description { get; set; }
}
We’ll add some more code to this class later, but for now this will do. So far we’ve done nothing but decorate this class with DataContract/DataMember attributes. There’s a mix of various data types in here, with a lot of integers in the middle.
Test UI
A user interface to run tests and show results will help. I’ve come up with the following options, which can be run in order if we wish to perform all tests. The last two buttons can be ignored for now.
Running Some Tests
My Generate Test Data command creates 300,000 randomly instantiated instances of my RefDataRow class.
I am using some new framework libraries I’m working on here, but in essence there’s a ViewModel with a command bound to each button. You can see that using the BinarySerializer looks very similar to using the DataContractSerializer:
DataContractSerCmd = new TimedCommand<string>(s =>
{
    using (var fs = file.CreateFile(XmlDataFileName))
    {
        var dcs = new DataContractSerializer(typeof(List<RefDataRow>));
        dcs.WriteObject(fs, TestData);
    }
    ReportSizes();
}, timer, "Data Contract Serialization");

BinarySerCmd = new TimedCommand<string>(s =>
{
    using (var fs = file.CreateFile(BinDataFileName))
    {
        var bs = new BinarySerializer(typeof(List<RefDataRow>));
        bs.Serialize(TestData, fs);
    }
    ReportSizes();
}, timer, "Binary DataContract Serialization");

DataContractDeSerCmd = new TimedCommand<string>(s =>
{
    using (var fs = file.OpenFile(XmlDataFileName, FileMode.Open))
    {
        var dcs = new DataContractSerializer(typeof(List<RefDataRow>));
        var obj = dcs.ReadObject(fs);
    }
}, timer, "DataContract Deserialize");

BinaryDeSerCmd = new TimedCommand<string>(s =>
{
    using (var fs = file.OpenFile(BinDataFileName, FileMode.Open))
    {
        var bs = new BinarySerializer(typeof(List<RefDataRow>));
        var obj = bs.DeSerialize<List<RefDataRow>>(fs);
    }
}, timer, "Binary Deserialize");
So, we have four DelegateCommand<T>-style commands wrapped in a low-resolution timer. Since I’m randomly generating test data each time, the results will be slightly different, but I found the run below to represent an average case:
Thinking About the Results
So, I’ve met my design goals, but my performance goals are way off the mark. What’s going on?
Disk Space Usage
Without paying the angle-bracket tax, and by using binary serialization, the binary file is less than 1/3 the size of the XML file. The one saving grace I can think of, in terms of my performance goals, is that I’m doing these tests on a Solid State Drive. With incredibly high sequential write speeds, any performance gains that might come from writing a much smaller file to disk are certainly minimized.
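For a rough feel of where the size difference comes from, the sketch below serializes one small object both ways into memory and compares the byte counts. `SizeSample` and `SizeCheck` are hypothetical names, and the binary side writes only the raw values, so it leaves out whatever type and property metadata the real BinarySerializer persists.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

// Hypothetical cut-down row, for illustration only.
[DataContract]
public class SizeSample
{
    [DataMember] public Guid Id { get; set; }
    [DataMember] public int Field0 { get; set; }
    [DataMember] public double Value { get; set; }
    [DataMember] public string Description { get; set; }
}

public static class SizeCheck
{
    // Bytes produced by the stock XML DataContractSerializer.
    public static long XmlSize(SizeSample row)
    {
        using (var ms = new MemoryStream())
        {
            new DataContractSerializer(typeof(SizeSample)).WriteObject(ms, row);
            return ms.Length;
        }
    }

    // Bytes produced by writing the raw values with BinaryWriter.
    public static long BinarySize(SizeSample row)
    {
        using (var ms = new MemoryStream())
        {
            var bw = new BinaryWriter(ms);
            bw.Write(row.Id.ToByteArray()); // 16 bytes
            bw.Write(row.Field0);           // 4 bytes
            bw.Write(row.Value);            // 8 bytes
            bw.Write(row.Description);      // length prefix + UTF-8 bytes
            bw.Flush();
            return ms.Length;
        }
    }
}
```

Calling `SizeCheck.XmlSize(row)` and `SizeCheck.BinarySize(row)` on the same instance shows the gap for a single row. Every XML value carries an opening and closing element name, while the binary form carries only the value bytes.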
Performance
The out-of-the-box DataContractSerializer is faster, and on deserialization it completely eats my lunch. What happened? I have a theory.
What happens when you try to out-do Microsoft engineers? Sometimes you lose. Reflection is an awesome and powerful CLR feature, but it has always been known as a performance no-no. While I have read many times how much faster Reflection (and App Domains) became in the 2.0 runtime, it’s still really slow to invoke methods via Reflection. My code is calling all getters and setters here using PropertyInfo. The built-in DataContractSerializer is almost certainly using Reflection Emit to generate “real” classes to get and set values, and that is going to be a lot faster than what I’m doing here. While I’ve written a lot of SRE code in the past couple of years, I want to test my theory before I take the time to do that here.
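To illustrate the cost being described, the sketch below contrasts a getter call through PropertyInfo with the same getter bound once to a typed delegate via `Delegate.CreateDelegate`. This is a lighter-weight alternative to Reflection Emit, shown only for comparison. It is not what the BinarySerializer does, and it makes no claim about how DataContractSerializer works internally. `Sample` and `GetterDemo` are hypothetical names.

```csharp
using System;
using System.Reflection;

// Hypothetical type, for illustration only.
public class Sample
{
    public int Field0 { get; set; }
}

public static class GetterDemo
{
    // Reflection path: each call is late-bound and boxes the int result.
    public static int ReadViaReflection(PropertyInfo prop, Sample target)
    {
        return (int)prop.GetValue(target, null);
    }

    // Delegate path: pay the reflection cost once, then call it like a
    // normal method. The delegate's first argument is the target instance.
    public static Func<Sample, int> BuildGetter(PropertyInfo prop)
    {
        MethodInfo getter = prop.GetGetMethod();
        return (Func<Sample, int>)Delegate.CreateDelegate(
            typeof(Func<Sample, int>), getter);
    }
}
```

In a serializer, the delegate would be built once per property and cached, then reused for every row, so the per-row work no longer goes through PropertyInfo.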
Another Try
In order to prove that Reflection is my downfall, I create a new .NET interface:
public interface IBinarySerializable
{
    void WritePrimitiveValue(string propName, BinaryWriter bw);
    void ReadPrimitiveValue(string propName, BinaryReader br);
}
The goal here is to have my RefDataRow read and write its own properties, looking up a lambda expression in a Dictionary by property name. Some of the writer code might look like so…
_actions = new Dictionary<string, Action<BinaryWriter>>();
_actions.Add("Id", bw => bw.Write(Id.ToByteArray()));
_actions.Add("Value", bw => bw.Write(Value));
_actions.Add("Description", bw => bw.Write(Description));
_actions.Add("Field0", bw => bw.Write(Field0));
… with the corresponding reader code looking like this:
_readActions = new Dictionary<string, Action<BinaryReader>>();
_readActions.Add("Id", br => Id = new Guid(br.ReadBytes(16)));
_readActions.Add("Value", br => Value = br.ReadDouble());
_readActions.Add("Description", br => Description = br.ReadString());
_readActions.Add("Field0", br => Field0 = br.ReadInt32());
I then give the BinarySerializer a hint: it checks for DataContract objects that implement IBinarySerializable and, when it finds one, uses real method calls instead of Reflection. Even though these are virtual (interface) method calls, they’re going to be a lot faster than Reflection. What’s the result?
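The check itself can be sketched as below. This is an assumption about the shape of that fast path, not the BinarySerializer source: `TinyRow` and `FastPath` are hypothetical, and only the IBinarySerializable interface is taken from the article.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Interface as defined in the article.
public interface IBinarySerializable
{
    void WritePrimitiveValue(string propName, BinaryWriter bw);
    void ReadPrimitiveValue(string propName, BinaryReader br);
}

// Hypothetical two-property type, for illustration only.
public class TinyRow : IBinarySerializable
{
    private readonly Dictionary<string, Action<BinaryWriter>> _actions;
    private readonly Dictionary<string, Action<BinaryReader>> _readActions;

    public int Field0 { get; set; }
    public string Description { get; set; }

    public TinyRow()
    {
        _actions = new Dictionary<string, Action<BinaryWriter>>
        {
            { "Field0", bw => bw.Write(Field0) },
            // Simplification: BinaryWriter.Write(string) throws on null,
            // so this sketch stores null as an empty string.
            { "Description", bw => bw.Write(Description ?? string.Empty) }
        };
        _readActions = new Dictionary<string, Action<BinaryReader>>
        {
            { "Field0", br => Field0 = br.ReadInt32() },
            { "Description", br => Description = br.ReadString() }
        };
    }

    public void WritePrimitiveValue(string propName, BinaryWriter bw)
    {
        _actions[propName](bw);
    }

    public void ReadPrimitiveValue(string propName, BinaryReader br)
    {
        _readActions[propName](br);
    }
}

public static class FastPath
{
    // Returns true when the value was written without Reflection.
    // On false, the caller falls back to its PropertyInfo-based path.
    public static bool TryWrite(object target, string propName, BinaryWriter bw)
    {
        var fast = target as IBinarySerializable;
        if (fast == null)
        {
            return false;
        }
        fast.WritePrimitiveValue(propName, bw);
        return true;
    }
}
```

The trade-off is that each type has to keep its dictionaries in step with its [DataMember] properties by hand, which is the manual work that generating the code with System.Reflection.Emit would remove.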
I finally take a strong lead in serialization, and deserialization speed has improved by 100%, but deserialization is still slower than the built-in DataContractSerializer.
Conclusion
This needs a lot more testing before it’s production-ready for complex object graphs, but it does work and it’s easy to use. If I can do some more research and get the read-and-deserialize time way down, I’ll go ahead and do the System.Reflection.Emit work necessary to make the speed gains from IBinarySerializable automatic, without anyone having to implement this interface. The source code will be provided as part of a new application framework I’m working on.
Since reading is my issue here, I have to wonder about implementing a XamlWriter in Silverlight, so that I could use XamlReader on the reading end. XamlReader is extremely fast as it uses a lower level engine than my object models can get at.
On the flip side, if you are dealing with fewer than 300,000 complex objects, this may be just what you wanted. Stay tuned for more research on this topic.