Recently I needed to serialize some data in C++ and it drove me nuts. Does anyone know some good solutions?
Serialization is the process of saving data and loading it back. Generally, if you have a good library or a good language, it can be very easy to save a giant tree of data with very little work. For example, in Perl, if you have a hash of hashes of arrays of values (some massively complicated structure), you can save it all to a file with a couple of lines of code and load it back with a couple more. With languages like C you'd have to write hundreds or thousands of lines of code to first save all the data and then reload it.
Java and C# come with serialization built into their standard libraries, so it's relatively easy to load and save. Java's in particular supposedly has lots of issues, but I'm sure there are workarounds.
C++ has no standard serialization, and C++ is still an evolving language, which means there are newer *features* that are not always compatible with older code. For example, there are the boost libraries, which are often created by the very people who work on the C++ spec but which have not made it into the spec yet. One of those is a serialization library. Unfortunately it requires a relatively new C++ feature called Runtime Type Identification, or RTTI. RTTI code is incompatible with non-RTTI code, and the stuff I'm writing is a plugin to another program. That program doesn't use RTTI, so I can't use RTTI in my plugin. So much for using the boost library. Supposedly it has some support for non-RTTI builds, but I tried the examples and they would not compile until I turned RTTI on. (Yes, I removed the class that was using RTTI, but the compiler still complained that it couldn't include the header files for serialization unless I turned on RTTI.) Worse, I tried a simple example with RTTI on just to see if it worked, and as far as I could tell it didn't. I made a simple structure with a couple of floats, serialized the struct as per the docs and looked at the data; the floats were not in there. Maybe that's an issue with VC++ 2003, although the docs claim it works with no problems.
I found the s11n library. It doesn't say whether it requires RTTI. Unfortunately it does say it doesn't support VC++ and it doesn't support graphs, which is something I need. A graph, in data terms, means you have things that point to each other. For example, a house may have a reference to its landlord, and the landlord might have a list of references to the houses she owns. A system that doesn't support graphs could not save that data. It could only save the connection one way: either houses could save their landlord or landlords could save their houses, but the connection going the opposite way would be lost.
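To make the graph problem concrete, here is a minimal sketch of the usual trick: give every object an ID (here, just its index), write IDs instead of pointers, and turn the IDs back into pointers in a second pass after everything has been loaded. The `House`, `Landlord`, `Graph`, `save`, `load` and `makeExample` names and the line-based text format are all made up for this illustration, it assumes a C++11 compiler, and it does only token error checking. It is not how any of the libraries mentioned here do it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct Landlord;
struct House {
    std::string address;
    Landlord* landlord = nullptr;        // house -> landlord
};
struct Landlord {
    std::string name;
    std::vector<House*> houses;          // landlord -> houses
};
// Owns every object; the raw pointers above only refer into these vectors.
struct Graph {
    std::vector<std::unique_ptr<Landlord>> landlords;
    std::vector<std::unique_ptr<House>> houses;
};

// Save: every pointer is written as the index (ID) of the object it points to.
std::string save(const Graph& g) {
    std::map<const Landlord*, std::size_t> landlordId;
    std::map<const House*, std::size_t> houseId;
    for (std::size_t i = 0; i < g.landlords.size(); ++i) landlordId[g.landlords[i].get()] = i;
    for (std::size_t i = 0; i < g.houses.size(); ++i) houseId[g.houses[i].get()] = i;

    std::ostringstream out;
    out << g.landlords.size() << '\n';
    for (const auto& l : g.landlords) {
        out << l->name << '\n' << l->houses.size();
        for (const House* h : l->houses) out << ' ' << houseId.at(h);
        out << '\n';
    }
    out << g.houses.size() << '\n';
    for (const auto& h : g.houses) {
        out << h->address << '\n';
        if (h->landlord) out << landlordId.at(h->landlord); else out << -1;
        out << '\n';
    }
    return out.str();
}

// Load: pass 1 creates all objects and remembers the IDs, pass 2 resolves them.
Graph load(const std::string& text) {
    std::istringstream in(text);
    Graph g;
    std::vector<std::vector<std::size_t>> ownedIds;   // per landlord: house IDs
    std::vector<long> landlordIds;                    // per house: landlord ID or -1
    std::size_t n = 0;
    in >> n; in.ignore();                             // ignore() skips the newline
    for (std::size_t i = 0; i < n; ++i) {
        std::unique_ptr<Landlord> l(new Landlord);
        std::getline(in, l->name);
        std::size_t count = 0;
        in >> count;
        std::vector<std::size_t> ids(count);
        for (std::size_t& id : ids) in >> id;
        in.ignore();
        ownedIds.push_back(ids);
        g.landlords.push_back(std::move(l));
    }
    in >> n; in.ignore();
    for (std::size_t i = 0; i < n; ++i) {
        std::unique_ptr<House> h(new House);
        std::getline(in, h->address);
        long id = -1;
        in >> id; in.ignore();
        landlordIds.push_back(id);
        g.houses.push_back(std::move(h));
    }
    assert(in && "malformed input");
    for (std::size_t i = 0; i < g.landlords.size(); ++i)
        for (std::size_t id : ownedIds[i])
            g.landlords[i]->houses.push_back(g.houses.at(id).get());
    for (std::size_t i = 0; i < g.houses.size(); ++i)
        if (landlordIds[i] >= 0)
            g.houses[i]->landlord = g.landlords.at(static_cast<std::size_t>(landlordIds[i])).get();
    return g;
}

// Hypothetical sample data: one landlord who owns two houses.
Graph makeExample() {
    Graph g;
    g.landlords.emplace_back(new Landlord);
    g.houses.emplace_back(new House);
    g.houses.emplace_back(new House);
    Landlord* alice = g.landlords[0].get();
    alice->name = "Alice";
    g.houses[0]->address = "1 Oak Street";
    g.houses[1]->address = "2 Elm Road";
    for (auto& h : g.houses) { h->landlord = alice; alice->houses.push_back(h.get()); }
    return g;
}
```

Calling `load(save(makeExample()))` should give back a graph where both houses point at the one landlord and the landlord's list points at both houses, so neither direction of the connection is lost. The catch is that this is exactly the kind of hand-written code a serialization library is supposed to save you from.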
I found the Eternity library. It also requires RTTI 😞
Someone suggested trying to pull out the CArchive feature of MFC (Microsoft Foundation Classes). It supports serialization without RTTI (as RTTI didn't exist when MFC was created), but it requires you to base every object you want to serialize on CObject, and I really did not want to get into multiple inheritance issues.
Another minor requirement is that I'd like the serialization library to save in XML. XML is relatively easy to read for a human so having the option to load and save as XML means it's easy to load up the data in a text editor and check for problems. Java and C# both support this. It's one extra line of code in Perl. The boost library is supposed to support this as well but as I already pointed out it doesn't seem I can use that library.
To give you an example of how important a serialization library is, in my current project I have 2 tools. One tool writes a bunch of data; the second tool reads that data and processes it. If I had a working serialization library it would be as little as a single line of code in each tool to load and save the data. As I added new data to tool 1 it would automatically be loadable in tool 2. But, as I don't have a working serialization library, saving in tool 1 and loading in tool 2 took between 8 and 14 hours to implement. That included finding a non-giant XML library, writing support functions for it, and then writing the code to use it to deserialize with all the various error checking, and that was the first time only. Last week I added a bunch of new data, and when it came time to load and save that, it took another 6 hours. If I had a serialization library that time would be close to 0 hours, maybe at most 30 minutes to write the few support functions needed. I learned a lot about serialization writing Tanjunka. In that project I spent almost no time getting a fairly complicated data graph serialized.
Unfortunately, writing a good serialization library is not a trivial task. Maybe it is to someone who's done it before, but my best guess is that it would take me at least a couple of weeks to write a good, easy to use, unobtrusive serialization library for C++. This is one of those unfun tradeoffs. I actually wasted a week trying out different solutions and fudging around trying to write my own, until I decided just to do it the old-fashioned way and manually write saving and loading functions, because after a week it was clear to me it was going to take another week or 2 and I didn't have the time. At the same time, every time I add even a little bit of data I've got to update both programs, something that would basically be handled automatically if I had a serialization library.
If you know of a library that meets these requirements, please point it out:
- it must serialize/deserialize to XML
- it must support STL containers
- it must support boost::shared_ptr (and containers of shared pointers)
- it must not require RTTI
- it must handle versioning
- it must be easy to use, not massively cumbersome.
- it would be nice if it wasn't a bazillion lines of code (the boost serialization libraries compile to 7MEG!)
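
To show roughly what I mean by "easy to use", here is a small sketch of the interface I have in mind: each struct describes its fields once in a single `serialize` function, and that one function is used for both saving and loading. Everything here (`XmlWriter`, `XmlReader`, `Settings`, `kVersion`, `toXml`, `fromXml`) is a made-up name for illustration, not the API of any real library. It only covers part of the list: XML-ish output, one STL container (`std::vector`), a version number per object, and no RTTI. It does not escape special characters, it expects the exact tag order and no whitespace between tags, and it does not handle shared pointers or graphs.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Saving half of the hypothetical archive pair. Values are written unescaped.
class XmlWriter {
public:
    std::ostringstream out;

    template <class T> void field(const std::string& name, T& value) {
        out << '<' << name << '>' << value << "</" << name << '>';
    }
    template <class T> void list(const std::string& name, std::vector<T>& items) {
        out << '<' << name << " count=\"" << items.size() << "\">";
        for (std::size_t i = 0; i < items.size(); ++i) field("item", items[i]);
        out << "</" << name << '>';
    }
    template <class T> void object(const std::string& name, T& obj) {
        const unsigned version = T::kVersion;       // always save the newest layout
        out << '<' << name << " version=\"" << version << "\">";
        obj.serialize(*this, version);
        out << "</" << name << '>';
    }
};

// Loading half: same three calls, so one serialize() function serves both.
class XmlReader {
public:
    explicit XmlReader(const std::string& xml) : in(xml) {}

    template <class T> void field(const std::string& name, T& value) {
        expect("<" + name + ">");
        std::string text;
        std::getline(in, text, '<');                // also consumes the '<' of the close tag
        parse(text, value);
        expect("/" + name + ">");
    }
    template <class T> void list(const std::string& name, std::vector<T>& items) {
        std::size_t count = 0;
        expect("<" + name + " count=\"");
        in >> count;
        expect("\">");
        items.assign(count, T());
        for (std::size_t i = 0; i < count; ++i) field("item", items[i]);
        expect("</" + name + ">");
    }
    template <class T> void object(const std::string& name, T& obj) {
        unsigned version = 0;
        expect("<" + name + " version=\"");
        in >> version;                              // the version stored in the file
        expect("\">");
        obj.serialize(*this, version);
        expect("</" + name + ">");
    }

private:
    std::istringstream in;

    void expect(const std::string& token) {
        for (std::size_t i = 0; i < token.size(); ++i)
            if (in.get() != token[i]) throw std::runtime_error("expected " + token);
    }
    static void parse(const std::string& text, std::string& value) { value = text; }
    template <class T> static void parse(const std::string& text, T& value) {
        std::istringstream(text) >> value;
    }
};

// Hypothetical data type. "retries" was added in version 2 of the layout.
struct Settings {
    static const unsigned kVersion = 2;
    std::string title;
    std::vector<float> weights;
    int retries;
    Settings() : retries(3) {}

    template <class Archive> void serialize(Archive& ar, unsigned version) {
        assert(version <= kVersion);                // data newer than this code understands
        ar.field("title", title);
        ar.list("weights", weights);
        if (version >= 2) ar.field("retries", retries);  // old files keep the default
    }
};

template <class T> std::string toXml(const std::string& name, T& obj) {
    XmlWriter writer;
    writer.object(name, obj);
    return writer.out.str();
}
template <class T> void fromXml(const std::string& xml, const std::string& name, T& obj) {
    XmlReader reader(xml);
    reader.object(name, obj);
}
```

With something like this, saving in tool 1 is `toXml("settings", s)` and loading in tool 2 is `fromXml(xml, "settings", s)`. Adding a field means adding one line to `serialize` and bumping `kVersion`, and files written with the old version still load. A real library would also need proper XML parsing, escaping, pointer tracking and error reporting, which is where the couple of weeks would go.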