Today I was looking at some code on the internets and eventually ended up reading about XML pull parsers. I’m not a huge fan of XML. I feel it is bloated, too easy to abuse, and too easy to get wrong. In a nutshell, it’s overloaded.
Well, every now and then we actually have to deal with XML. In the past I did a little research and ended up using SAX-style parsers in my code. They are easy to implement, but I always felt a little put off by the design: you end up building lots of little state machines when the XML goes recursive.
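To make the state-machine complaint concrete, here is a minimal sketch using Python's standard `xml.sax` module. The document shape (`library`, `book`, `title`) and the `TitleHandler` class are made up for illustration. The point is that the handler has to carry its own stack and buffer between callbacks just to know where it is in the document.

```python
import xml.sax

# Hypothetical sample document: <title> appears at two nesting levels.
DOC = (b"<library><book><title>Dragon</title>"
       b"<chapter><title>Lexing</title></chapter></book></library>")


class TitleHandler(xml.sax.ContentHandler):
    """Collect the text of <title> elements that sit directly under <book>."""

    def __init__(self):
        super().__init__()
        self.stack = []    # hand-rolled state: path of currently open elements
        self.titles = []   # results
        self._buf = None   # text buffer, active only inside a wanted <title>

    def startElement(self, name, attrs):
        self.stack.append(name)
        if self.stack[-2:] == ["book", "title"]:
            self._buf = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so accumulate it.
        if self._buf is not None:
            self._buf.append(content)

    def endElement(self, name):
        if self._buf is not None and self.stack[-2:] == ["book", "title"]:
            self.titles.append("".join(self._buf))
            self._buf = None
        self.stack.pop()


handler = TitleHandler()
xml.sax.parseString(DOC, handler)
print(handler.titles)  # ['Dragon']
```

Every new nesting rule tends to add another flag or another check against the stack, which is the "lots of little state machines" feeling described above.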
Today I got out of my hole and discovered another form of XML parsing called pull parsing, also known as streaming XML parsing. Pull parsers have the usual problems (they can’t validate perfectly), but they have a light memory footprint, approval from the XML gods as ‘the way’, they are the style of parser built into C# (.NET’s XmlReader), and they are fast.
The main reason I like the API is slightly different: I like the design. Many moons ago, when I wrote a compiler for a prototype-based language called Cel, I wrote a recursive descent parser. When I look at the XML-Pull API, it looks almost exactly like the lexer interface I had in that parser. I got my design from the Dragon Book on compilers. It’s a good design that makes dealing with complexity much easier, since it strikes the proper balance between the state and responsibilities of the generator and those of the consumer.
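Here is a rough sketch of that lexer resemblance. It uses Python's standard `xml.etree.ElementTree.iterparse` as a stand-in for a pull parser, so it is not the XML-Pull API itself or C#'s reader, and `parse_element` plus the dict layout are hypothetical names chosen for the example. The consumer asks for the next event when it wants one, the same way a recursive descent parser asks its lexer for the next token.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document with <b> nested at two different depths.
DOC = b"<a><b>hi</b><c><b>deep</b></c></a>"


def parse_element(events, elem):
    """Recursive descent over a pull-style event stream.

    `elem` is the element whose "start" event was just consumed. Events are
    pulled until the matching "end" arrives; each nested "start" recurses.
    """
    node = {"tag": elem.tag, "text": "", "children": []}
    for event, child in events:
        if event == "start":
            # The call stack tracks the nesting, so no explicit state machine.
            node["children"].append(parse_element(events, child))
        elif event == "end":
            # Text is only guaranteed to be complete at the "end" event.
            node["text"] = (child.text or "").strip()
            return node
    raise ValueError("event stream ended before the element was closed")


# iter() makes sure every loop above resumes the same underlying stream.
events = iter(ET.iterparse(io.BytesIO(DOC), events=("start", "end")))
_, root = next(events)  # pull the first token: the root's "start" event
tree = parse_element(events, root)

print(tree["tag"])                                # a
print([c["tag"] for c in tree["children"]])       # ['b', 'c']
print(tree["children"][1]["children"][0]["text"]) # deep
```

The recursion mirrors the shape of the document: the event source only produces events, and the consumer decides when to pull and what each one means in context.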
The next time a project of mine needs XML (probably in C#), I’ll use this.