C++ Serialization Woes

Recently I needed to serialize some data in C++ and it drove me nuts. Does anyone know some good solutions?

Serialization is the process of saving data and loading it back. Generally, if you have a good library or a good language, it can be very easy to save a giant tree of data with very little work. For example, in Perl, if you have a hash of hashes of arrays of values (some massively complicated structure), you can save it all to a file with a couple of lines of code and load it back up with a couple more. With languages like C you’d have to write hundreds or thousands of lines of code to first save all the data and then reload it.

Java and C# come with serialization built into their standard libraries, so it’s relatively easy to load and save. Java in particular supposedly has lots of issues, but I’m sure there are workarounds.

C++ has no standard serialization, and C++ is still an evolving language, which means there are newer *features* that are not always compatible with older code. For example, there are the boost libraries, which are often created by the very people who work on the C++ spec but which have not made it into the spec yet. One of those is a serialization library. Unfortunately it requires a relatively newer C++ feature called Run-Time Type Identification, or RTTI. RTTI code is incompatible with non-RTTI code, and the stuff I’m writing is a plugin to another program. That program didn’t use RTTI, so I can’t use RTTI in my plugin. So much for using the boost library.

Supposedly they have some support for non-RTTI builds, but I tried the examples and they would not compile until I turned RTTI on. (Yes, I removed the class that was using RTTI, but the compiler still complained that it couldn’t include the header files for serialization unless I turned on RTTI.) Worse, I tried a simple example with RTTI on just to see if it worked and, as far as I could tell, it didn’t. I made a simple struct with a couple of floats, serialized it as per the docs and looked at the data; the floats were not in there. Maybe that’s an issue with VC++ 2003, although the docs claimed it works with no problems.

I found the s11n library. It doesn’t say whether it requires RTTI. Unfortunately it does say they don’t support VC++ and they don’t support graphs, which is something I need. A graph in data terms means you have things that point to each other. For example, a house may have a reference to its landlord and the landlord might have a list of references to the houses she owns. A system that doesn’t support graphs could not save that data completely. It could only save the connection one way: either houses could save their landlord or landlords could save their houses, but the connection going the opposite way would be lost.
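To make the graph problem concrete, here is a minimal sketch of one common way around it: give every object an integer id, write ids instead of pointers, and turn the ids back into pointers in a second pass when loading. Everything in it (`World`, `saveWorld`, `loadWorld`, the text format) is made up for illustration and is not how s11n or boost does it. It also assumes every house has a landlord and that names contain no whitespace.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

struct Landlord;

struct House {
    std::string address;
    Landlord* landlord = nullptr;      // house -> landlord
};

struct Landlord {
    std::string name;
    std::vector<House*> houses;        // landlord -> houses
};

// Hypothetical owner of all objects; the raw pointers above only refer to them.
struct World {
    std::vector<std::unique_ptr<Landlord>> landlords;
    std::vector<std::unique_ptr<House>> houses;
};

// Save: every pointer is written as the index of the object it points to.
std::string saveWorld(const World& w) {
    std::map<const Landlord*, std::size_t> landlordId;
    std::map<const House*, std::size_t> houseId;
    for (std::size_t i = 0; i < w.landlords.size(); ++i) landlordId[w.landlords[i].get()] = i;
    for (std::size_t i = 0; i < w.houses.size(); ++i) houseId[w.houses[i].get()] = i;

    std::ostringstream out;
    out << w.landlords.size() << ' ' << w.houses.size() << '\n';
    for (const auto& l : w.landlords) {
        out << l->name << ' ' << l->houses.size();
        for (const House* h : l->houses) out << ' ' << houseId.at(h);
        out << '\n';
    }
    for (const auto& h : w.houses) {
        assert(h->landlord != nullptr);  // assumption: no orphan houses
        out << h->address << ' ' << landlordId.at(h->landlord) << '\n';
    }
    return out.str();
}

// Load: pass 1 creates every object, pass 2 turns ids back into pointers.
World loadWorld(const std::string& text) {
    std::istringstream in(text);
    std::size_t landlordCount = 0, houseCount = 0;
    in >> landlordCount >> houseCount;

    World w;
    for (std::size_t i = 0; i < landlordCount; ++i) w.landlords.emplace_back(new Landlord());
    for (std::size_t i = 0; i < houseCount; ++i) w.houses.emplace_back(new House());

    for (auto& l : w.landlords) {
        std::size_t owned = 0;
        in >> l->name >> owned;
        for (std::size_t k = 0; k < owned; ++k) {
            std::size_t id = 0;
            in >> id;
            l->houses.push_back(w.houses.at(id).get());
        }
    }
    for (auto& h : w.houses) {
        std::size_t id = 0;
        in >> h->address >> id;
        h->landlord = w.landlords.at(id).get();
    }
    return w;
}
```

Because both directions are stored as ids, `loadWorld(saveWorld(w))` rebuilds houses that point at their landlord and a landlord that points back at the same house objects. A generic library would have to do this bookkeeping for arbitrary types, which is the hard part.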

I found the Eternity library. It also requires RTTI :-(

Someone suggested trying to pull out the CArchive feature of MFC (Microsoft Foundation Classes). It supports serialization without RTTI (as RTTI didn’t exist when MFC was created), but it requires you to base every object you want to serialize on CObject, and I really did not want to get into multiple inheritance issues.

Another minor requirement is that I’d like the serialization library to save in XML. XML is relatively easy to read for a human so having the option to load and save as XML means it’s easy to load up the data in a text editor and check for problems. Java and C# both support this. It’s one extra line of code in Perl. The boost library is supposed to support this as well but as I already pointed out it doesn’t seem I can use that library.

To give you an example of how important a serialization library is, in my current project I have 2 tools. One tool writes a bunch of data; the second tool reads that data and processes it. If I had a working serialization library it would be as little as a single line of code in each tool to load and save the data. As I added new data to tool 1 it would automatically be loadable in tool 2. But, as I don’t have a working serialization library, saving in tool 1 and loading in tool 2 took between 8 and 14 hours to implement. That included finding a non-giant XML library, writing support functions for it, and then writing the code to use it to deserialize with all the various error checking, and that was the first time only. Last week I added a bunch of new data, and when it came time to load and save that it took another 6 hours. If I had a serialization library that time would be close to 0 hours, maybe at most 30 minutes to write the few support functions needed. I learned a lot about serialization writing Tanjunka. In that project I spent almost no time getting a fairly complicated data graph serialized.

Unfortunately, writing a good serialization library is not a trivial task. Maybe it is to someone who’s done it before, but my best guess is that it would take me at least a couple of weeks to write a good, easy to use, unobtrusive serialization library for C++. This is one of those unfun tradeoffs. I actually wasted a week trying out different solutions and fudging around trying to write my own until I decided just to do it the old-fashioned way and manually write saving and loading functions, because after a week it was clear to me it was going to take another week or 2 and I didn’t have the time. At the same time, every time I add even a little bit of data I’ve got to update both programs, something that would basically be handled automatically if I had a serialization library.

If you know of a library please point it out.

  1. it must serialize/deserialize to XML
  2. it must support STL containers
  3. it must support boost::shared_ptr (and containers of shared pointers)
  4. it must not require RTTI
  5. it must handle versioning.
  6. it must be easy to use, not massively cumbersome.
  7. it would be nice if it wasn’t a bazillion lines of code (the boost serialization libraries compile to 7MEG!)

9 comments to C++ Serialization Woes

  • uk_designer_matt
    gamedev

    Gregg, I guarantee if you repost your problem above on to the GameDev.net forums you would have a solution by the end of the day :-)

  • Impossible?

    Personally, I prefer the good ole’ Load/Save functions. But wouldn’t it be impossible to do non-intrusive serialization without RTTI? Especially since it sounds like you’re not dealing with POD (plain old data) structures. I guess maybe if you registered member variables in the constructor or something. But then you’d be assuming everything you want to serialize is in a member variable. Or used some kind of crazy ODL and a processing tool to generate code. Maybe you’re asking for too much. :o )

  • It doesn’t have to be completely automated. I don’t mind writing a serialize function for each class. But if you look at the non-RTTI version of the boost serialization library, it’s not as simple as just making a serialize member function; it also needs a bunch of giant, ugly registration and class info structures. That’s what I’d like to avoid. Eternity had registration down to 1 line. Of course, if boost had actually worked without RTTI I would have put up with the mess.
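For illustration, here is roughly what "one serialize function per class" can look like without any registration structures. This is only a sketch under stated assumptions: `XmlWriter`, `XmlReader`, `field()`, `Material`, `toXml` and `fromXml` are hypothetical names, not the API of boost, s11n or Eternity, and the "XML" handling is a naive tag search rather than a real parser. A field that is missing from an old file simply keeps its default, which is a very crude form of versioning.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical writing archive: emits one <name>value</name> element per field.
class XmlWriter {
public:
    template <typename T>
    void field(const char* name, T& value) {
        assert(name != nullptr && *name != '\0');  // a field needs a tag name
        out_ << "  <" << name << ">" << value << "</" << name << ">\n";
    }
    std::string str() const { return out_.str(); }
private:
    std::ostringstream out_;
};

// Hypothetical reading archive: same interface, so one serialize() serves both.
class XmlReader {
public:
    explicit XmlReader(const std::string& text) : text_(text) {}

    template <typename T>
    void field(const char* name, T& value) {
        assert(name != nullptr && *name != '\0');
        const std::string open = std::string("<") + name + ">";
        const std::string close = std::string("</") + name + ">";
        const std::size_t begin = text_.find(open);
        if (begin == std::string::npos) return;    // older file: keep the default
        const std::size_t start = begin + open.size();
        const std::size_t end = text_.find(close, start);
        if (end == std::string::npos) return;      // malformed: keep the default
        std::istringstream in(text_.substr(start, end - start));
        in >> value;
    }
private:
    std::string text_;
};

// Example class; only serialize() has to be touched when data is added.
struct Material {
    float shininess = 0.0f;
    float opacity = 1.0f;   // pretend this field was added in a later version

    template <typename Archive>
    void serialize(Archive& ar) {
        ar.field("shininess", shininess);
        ar.field("opacity", opacity);
    }
};

std::string toXml(Material& m) {
    XmlWriter writer;
    m.serialize(writer);
    return "<Material>\n" + writer.str() + "</Material>\n";
}

Material fromXml(const std::string& xml) {
    Material m;
    XmlReader reader(xml);
    m.serialize(reader);
    return m;
}
```

The exporter would call something like `toXml(material)` and the level builder `fromXml(text)`. Nothing here touches `typeid` or `dynamic_cast`, so it should build with RTTI turned off, but it also does none of the hard parts (pointers, containers, polymorphic types).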

  • Jim

    I don’t know what all your requirements mean, but I’ll offer what I know anyways:

    For a recent project I (nearly) completely automated C++ serialization by writing a code parser to generate all the save/load methods for me. All I needed to do was declare the save and load methods in each class and also call the registration macro after each class. And it saved the data to lisp s-expressions, which are semantically equivalent to XML.

    I didn’t support pointers, and I did use RTTI, but these both seem like fairly tractable problems. Handling pointers is just a matter of traversing a tree, with checks for aliasing. RTTI could be as trivial as … well, let’s say you add a type() method to the class. Hell, let’s say we make adding the save(), load() and type() methods another macro, to cut it down to two extra lines per class for serialization support.

    Then we could add this to the object registration macro:

    const char *ClassName::type() { return #ClassName; }

    This way, you can use either the string or just the string pointer as a unique identifier, depending on whether you want to optimize for size or clarity. And it would register the class in the map of classes using that method, rather than typeid().

    Note that the registration macro has to be executed in only one module. This wasn’t a big deal for me, because I just put all my class definitions in that same module anyways, rather than a .h file. I could get away with this because the classes were never referred to anywhere in the code, they were only instantiated by parsed strings, at runtime.
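A rough sketch of the idea in Jim's comment, not his actual code: a macro supplies `type()` by stringizing the class name and registers a factory in a map keyed by that string, so objects can be created from a name read out of a file without `typeid()`. The names `Serializable`, `registry`, `Registrar`, `DECLARE_SERIALIZABLE`, `REGISTER_CLASS`, `createByName` and the `Mesh` class are all assumptions made up for this example.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical common base; virtual functions work with RTTI switched off.
struct Serializable {
    virtual ~Serializable() {}
    virtual const char* type() const = 0;
};

typedef Serializable* (*Factory)();

// A function-local static avoids static-initialization-order surprises.
std::map<std::string, Factory>& registry() {
    static std::map<std::string, Factory> classes;
    return classes;
}

// Constructing one of these at namespace scope registers a class before main().
struct Registrar {
    Registrar(const char* name, Factory factory) {
        assert(name != nullptr && factory != nullptr);
        registry()[name] = factory;
    }
};

// Line 1 per class: goes inside the class body.
#define DECLARE_SERIALIZABLE() \
    const char* type() const override;

// Line 2 per class: goes in exactly one .cpp file, after the class.
#define REGISTER_CLASS(ClassName) \
    const char* ClassName::type() const { return #ClassName; } \
    static Serializable* create_##ClassName() { return new ClassName(); } \
    static Registrar registrar_##ClassName(#ClassName, &create_##ClassName);

// Example class using the two macros.
struct Mesh : Serializable {
    DECLARE_SERIALIZABLE()
    int vertexCount = 0;
};
REGISTER_CLASS(Mesh)

// What a loader would call after reading a type name from the file.
std::unique_ptr<Serializable> createByName(const std::string& name) {
    std::map<std::string, Factory>::const_iterator it = registry().find(name);
    if (it == registry().end()) return nullptr;  // unknown type name
    return std::unique_ptr<Serializable>(it->second());
}
```

With this in place, `createByName("Mesh")` hands back a new `Mesh` whose `type()` returns "Mesh", and an unregistered name yields a null pointer. The cost is that every serializable class has to derive from the common base, which is the same kind of intrusion as CObject.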

  • guyinAMERICA
    so what game are you working on?
  • Well, it’s not really important to the question but, … I’m working on Yet another 3d exporter. :-(

    The data gets exported to a middle format and then another tool reads that data (and other data) and builds a game level. The data exported includes 3d hierarchy info, 3d meshes, materials, textures, animations, attributes and many other things. Basically, if I had working serialization I could say “save(…)” in one tool and “load(…)” in the other and have almost zero work to deal with those parts. As it is, any time I add something new to the exporter I have to write saving routines in the exporter and then loading routines in the level builder.

  • jxn

    just found this link while googling for serialization…

    http://www.codeproject.com/cpp/xmlserialization.asp

    maybe it’ll help you. Good luck!

  • notes on s11n

    Hi!

    Your questions about s11n:

    What’s RTTI? Just kidding: no, i never rely on RTTI.

    Graph support: the library CAN do it, but it has no GENERIC algorithms to do it: each de/ser algo for graphs is unique, IMO. It has been done before, though.

    Despite common belief, the lack of built-in shallow pointer tracking is not the hindrance it would seem to be. The manual goes into great detail on this. :)

    Take care,

    —– stephan

  • If you know of a library please point it out.

    1. it must serialize/deserialize to XML
    2. it must support STL containers
    3. it must support boost::shared_ptr (and containers of shared pointers)
    4. it must not require RTTI
    5. it must handle versioning.
    6. it must be easy to use, not massively cumbersome.
    7. it would be nice if it wasn’t a bazillion lines of code (the boost serialization libraries compile to 7MEG!)

    Boost serialization fulfills all the above requirements. For executables compiled and run without debug code, serialization adds in the range of 32K for simple classes.

    The package includes numerous examples and tests which illustrate and validate the functioning of the library. I’m sure the conclusion that you couldn’t serialize a float was premature.

    Robert Ramey
