The Microsoft Binary File Format Validator - Nick


The Microsoft Binary File Format Validator [Jul. 18th, 2011|10:45 pm]
I first got involved in Apache POI back in November 2003 (or so a check of the archives tells me!), which is almost 8 years ago as I write this... Back then, documentation of the Microsoft Binary File Formats was pretty thin on the ground, and there was quite a bit of reverse engineering needed. If you'd told me then that in 2010 I'd be in Seattle at a Microsoft event on the public documentation of the file formats, I wouldn't have believed it! Nonetheless, the documentation is public (and has been for some time), and at the end of last year I did head out to Seattle for the Office Binary File Format Plugfest.

One of the neat things about the plugfest (beyond learning lots, meeting other people developing similar libraries to read the file formats, and reporting docs bugs in person to their authors) was the Binary File Format Validator Tool. One of the few frustrating things was that I couldn't share the tool with anyone else... Good news though, as it's now out of private beta, and you can go learn about it and grab a copy from here! (It's technically Windows only, but it seems to work just fine under Wine.)

Short term, I think one of the biggest uses of the BFF tool for POI will be for those dreaded bug reports of "I've got a file that seems to show a bug, but I can't let you have the file". Hopefully the user will now be able to run the BFF against their input file, and check that it's valid. If it's not, then we know it's probably not our fault that it can't be read, and the user can go and speak to whoever wrote the software that generated it. If the input is fine, the BFF can be run on the POI-generated output. If this flags a problem, then it'll hopefully come with the nifty details and docs references to help us fix it. No more "it doesn't work but I don't know why", and instead "option 5 should only be 0x3 when there are fewer than 32 entries, then it has to be 0x7, but POI has 0x3 hard coded". A link to the spec, a link to the problem code, and maybe even a patch too. Well, we can dream.... :)
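
To make the triage order above concrete, here is a minimal Java sketch. Everything in it is hypothetical: `BffTriage`, `Verdict` and the `passesValidator` callback are names made up for illustration, and neither POI nor the validator ships anything like them. The callback simply stands in for "run the BFF on this file and see whether it complains", since how you actually invoke the tool is left out here.

```java
import java.nio.file.Path;
import java.util.function.Predicate;

/** Hypothetical triage helper; not part of POI or of the validator itself. */
final class BffTriage {

    enum Verdict {
        INPUT_INVALID,   // the source file breaks the spec: ask whoever generated it
        OUTPUT_INVALID,  // the input is fine but POI's output is rejected: probably a POI bug
        BOTH_VALID       // the validator is happy with both files: look elsewhere
    }

    private BffTriage() {}

    /**
     * Checks the input first, and only then the POI-generated output.
     * passesValidator is an assumed stand-in for running the validator
     * on a file; it returns true when no problems are reported.
     */
    static Verdict triage(Path input, Path poiOutput, Predicate<Path> passesValidator) {
        if (!passesValidator.test(input)) {
            return Verdict.INPUT_INVALID;
        }
        if (!passesValidator.test(poiOutput)) {
            return Verdict.OUTPUT_INVALID;
        }
        return Verdict.BOTH_VALID;
    }
}
```

The input is checked first on purpose: if the original file is already invalid, whatever POI wrote out from it tells us very little.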

Longer term, we'll hopefully hook up the BFF to some sort of integration testing, to be run every once in a while, to alert us when we're generating files with issues. That said, the BFF is stricter than Office is, so to get warning-free we'll first need to fix up all the cases where Office lets us get away with something the spec doesn't technically allow. This may take a little while to work through, so volunteers and patches are very welcome. Please come and help us!