The Microsoft Binary File Format Validator

I first got involved in Apache POI back in November 2003 (or so a check of the archives tells me!), which is almost 8 years ago as I write this... Back then, documentation of the Microsoft Binary File Formats was pretty thin on the ground, and there was quite a bit of reverse engineering needed. If you'd told me then that in 2010 I'd be in Seattle at a Microsoft event on the public documentation of the file formats, I wouldn't have believed you! Despite this, the documentation is public (and has been for some time), and at the end of last year I did head out to Seattle for the Office Binary File Format Plugfest.

One of the neat things about the plugfest (beyond learning lots, meeting other people developing similar libraries to read the file formats, and reporting docs bugs in person to their authors) was the Binary File Format Validator Tool. One of the few frustrating things was that I couldn't share the tool with anyone else... Good news though, as it's now out of private beta, and you can go learn about it and grab a copy from here! (It's technically Windows only, but it seems to work just fine under Wine)

Short term, I think one of the biggest uses of the BFF tool for POI will be for those dreaded bug reports of "I've got a file that seems to show a bug, but I can't let you have the file". Hopefully now the user will be able to run the BFF against their input file, and check it's valid. If it's not, then we know it's probably not our fault that it can't be read, and the user can go and speak to whoever wrote the software that generated it. In the case that the input's fine, the BFF can be run on the POI generated output. If this flags a problem, then it'll hopefully come with the nifty details and docs references to help us fix it. No more "it doesn't work but I don't know why", and instead "option 5 should only be 0x3 when there are fewer than 32 entries, then it has to be 0x7, but POI has 0x3 hard coded". Link to the spec, a link to the problem code, and maybe even a patch too. Well, we can dream.... :)
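
As a rough illustration of that triage, here is a minimal Python sketch. The function name and its two inputs are made up for this example, and it assumes you've already run the validator by hand and noted whether each file passed. It doesn't call the validator itself.

```python
from typing import Optional


def triage(input_valid: bool, output_valid: Optional[bool] = None) -> str:
    """Hypothetical helper: decide where to look next from validator results.

    input_valid  - did the user's input file pass the validator?
    output_valid - did the POI generated output pass? (None if not yet checked)
    """
    if not input_valid:
        # The input is already broken, so the reading code is probably not at fault
        return "input invalid: ask whoever generated the file"
    if output_valid is None:
        # Input is fine, so the next step is to check what POI wrote out
        return "input valid: now run the validator on the POI output"
    if not output_valid:
        # The validator's details and spec references go into the bug report
        return "output invalid: likely a POI bug, report the validator details"
    return "both valid: the problem lies elsewhere"


print(triage(False))
print(triage(True, False))
```

The useful part for a bug report is the third branch: the user can pass on what the validator said without having to share the file itself.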

Longer term, we'll hopefully hook up the BFF to some sort of integration testing, to be run every once in a while, to alert us when we're generating files with issues. That said, the BFF is stricter than Office is, so we'll need to fix up all the cases where Office lets us get away with something the spec doesn't technically allow, so we're warning-free. This may take a little while to work through, so volunteers and patches are very welcome, please come help us!


Having had a great time helping to organise a BarCamp in Oxford in June as part of Transfer Summit, another in Atlanta as part of ApacheCon, plus a few at Apache Retreats, it shouldn't be a big surprise to hear that I'm helping with another in December.

What might be a bit different about this next one is that it'll be in Sydney! Yes, by popular demand, and thanks to several excellent and hard working volunteers on the ground, we're going to be having our first Apache event in Australia!

BarCampApacheSydney is to be held on Saturday 11th December, at The Darling Centre at the University of Sydney. In keeping with the Oxford BarCamps, we'll be doing a meal on the Friday night, then a full day's BarCamping on the Saturday.

If you're going to be in (or near!) Sydney around the weekend of Saturday 11th December, you can get more details on the event, plus sign up, on the website. See you then!

Upcoming Conferences

This weekend I was at the Apache Retreat in Hursley, near Winchester. As you would have hoped from a weekend-long conference with Tipis and the ground floor of a former stately home, it was a great event. See the photos for yourself! Several people asked me about what other conferences I was doing this year, so I thought I'd do a quick write up.

First up is the Alfresco Developer Conference in Paris, which is the 20th and 21st of October. It looks like I'm only going to be there for the Thursday now (owing to a Microsoft binary formats event), which is a shame as it's looking like a great two days. I'll be talking about two related Alfresco services, the Content Transformation Service and Metadata Extractor Service. I'll be looking at what these services do, how to make better use of them, and how to extend them. Quite a bit of this will be around the new work for 3.4 which uses Apache Tika to deliver support for a large number of new formats.

For more information on my talk, you can watch the video interview I've done as part of the conference promotion. More information on the event is at

Two weeks later is ApacheCon in Atlanta. ApacheCon runs for the whole week, from Monday 1st to Friday 5th November (which means I can't be in New York for the 2nd Alfresco DevConf, why do these things always have to clash?). It's looking like a great conference, and I'm going to be quite busy there, talking, organising and attending! The Monday and Tuesday see the BarCamp un-conference, which should feature loads of great sessions and discussions on new and existing Apache technologies.

The main sessions run Wednesday - Friday, and here I'm running the "Content Technologies" track. I'm kicking off the track with an overview session, which should give an introduction to a lot of the many and varied content related projects at Apache. Hopefully, it'll also whet the appetite of people for the rest of the track! The track will then continue with talks on various content areas, from the new CMIS standard + Apache Chemistry project which helps you build applications backed by content stores, through PDFBox and POI, to how to convert your application to be content driven. The chances are, your application deals with content somewhere, so come along and learn how Apache projects can help you!

Finally, I'm hoping to help with running an Apache BarCamp in Sydney in December. The Apache committers in Sydney are currently in the final stages of booking the venue, so hopefully this can be confirmed and announced shortly...

Fringe 2010

Note - This post has taken a little longer than planned in the writing....

After a couple of days hiking near Glen Affric, I made it to the fringe for the Friday afternoon. My first few ideas for what to see were thwarted by shows having sold out, but by just gone 4pm I'd started my fringe trip for 2010!


First up was The Leeds Tealights at Underbelly, which was a random pick based on the 4* reviews on the fliers. It was a 4 person sketch show, 3 guys and a girl, which set up the dynamic for a fair few of the skits. Most of the skits were only a few minutes long, so they got through a lot of material during the hour. A couple of them were absolute gems, most were fairly good for most of the time, and only a few fell completely flat... Overall, I'd give it 3*, and it was a fun and lighthearted start to the fringe!

Next up was the stunning Tabu out on Leith Walk. The non-central location allowed them to have a large big top as their venue, set up for standing rather than seating. It was billed as a circus show, but turned out to be so much more. Firstly, there was the live band, providing a musical backdrop to the show, often accompanied on vocals by the performers not doing acrobatics at that moment. Secondly, it wasn't just the (amazing!) acrobatics, the pieces changed and were woven together with each other, and the music, to provide a narrative for the show. It wasn't a static show either, instead the whole audience was moved about between pieces to create the appropriate gaps and spaces for the next set. Finally, the acrobatics themselves - stunning! Very varied, covering most major circus and acrobatic styles, stunningly well executed. It was a wider range of things than Circa (from last year), so not every piece ended up as developed as in that, but with the music and the thematic links between pieces, it was a wonderful performance. 5*


My 2nd day at the fringe kicked off with Shakespeare For Breakfast at C. With free croissants, and being one of the few early starting plays, this is a usual favourite of ours. This year's source of inspiration was King Lear, and their modern, silly take on it worked quite well. However, it's by no means the best production they've managed in all the years we've seen them, and last year's was better... Good start to the day though, 3*

This is Belt Up's third year at the fringe (2 years ago they did The Red Room, and last year The Squat at C Soco). They were back at C Soco again this year, with a series of immersive theatre pieces set in The Room Above. The first of their shows that we saw was The Boy James, which was a very intimate and immersive show. A boy, his older friend/self, the joys of childhood adventures shared, and the loss of growing up. Amazing stuff, an excellent 5*

Next up was another Belt Up show, The Second Star to the Right, carrying on with the JM Barrie theme. It was an almost hypnotic performance at times, with some lovely dance-like parts, as 4 actresses took us through bits of the Peter Pan story. 4*

Roam at Zoo Southside was a piece of modern dance by a company we'd seen in previous years. Unfortunately, this year's show had no real narrative, but that did still leave us with some amazing dancing and great music! The dance was great as ever, 4*

Vanishing Horizon at Zoo was largely back to theatre, though with some almost dance-like physical elements in places. A variety of interwoven stories covered the history of women aviators, family discovery, and the perils of trying to make a radio play... The stories fitted well together, and apart from a few iffy timings in the fictional bits (the historic factual parts seemed fine!), worked well. There was frequent, very impressive use of a large number of suitcases as props, and great physical parts using them to link everything together. Interesting and engaging theatre, 5*

For a lighter note, we opted for Amateur Transplants again this year, and were rewarded as they were on top form! This year's show featured both of them, and focused more on the excellent shorter funny songs, rather than fewer longer works. As ever, it was dark and silly medic humour, but hilariously done! No-one left through being offended either... Back up to their previous 5* standard again.

Finally was Geraldine Quinn (@geraldinequinn), for some musical Australian comedy. The show was about modern pop music, and featured some excellent, cutting songs and dance routines, exposing and ridiculing the faults of the pop industry. 4*


Carrying on with Belt Up shows, we began Sunday with Belt Up's Odyssey. This was another excellent immersive piece, in one of their smaller venues, and powerfully done. However, while the re-setting into a dystopian future was good, it did sometimes make it tricky to entirely follow, so largely one for people who already know the original! 4*

Hunchback of Notre Dame at Pleasance wasn't the only production of the story running, but was one of the good ones! Set in a smallish venue, it was a one man show, a powerful portrayal of a poignant tale, 4*

More Belt Up, this time Belt Up's Metamorphosis, with their take on Kafka's tale. It was interesting staging, but I didn't find it as well translated as last year's "The Trial", so it ended up being that little bit too odd for my tastes... (I realise that Kafka isn't supposed to be normal, but still!) I rate it 3*, but many others in my group gave it 5, make of that what you will!

Over to Pleasance for our first of two boxing shows - Beautiful Burnout. This proved to be an excellent mix of story, monologues, group pieces, physical theatre and some dance-like fights. It did a great job of showing all sides of it, and worked well no matter what your knowledge or feelings on boxing were to start. An excellent 5*

Another fun Australian musical comedy followed, in the form of Sammy J. This was our third time seeing him, and he remained as great as ever. True, you often question his sanity, but you laugh all the same! Songs and stories, probably almost all true, of his more recent past (last year's show did the earlier parts), and very funny. 5*

Putting It Together by Sondheim was a late night almost-musical show. Rather than being a full Sondheim work, it was a collection of his songs around the themes of love and marriage. The songs were done well, but the pick'n'mix feel that went with the lack of any real narrative was the slight snag. 4*


An early start, and for a full length play - History Boys at Greenside (again one of several productions). As one would have hoped from the reviews from previous years, this was a good story well told. The direction largely worked well, and it was a solid 4* production.

Then for something a bit darker - Blackout at Underbelly. This dark, depressing, excellent piece of physical theatre told of one young man's journey to mindless violence. Based on a real story and real interviews, it was dark but great. 4*

Pluck were back again this year, relocated to the Gilded Balloon. Once again the trio impressed us with some crazy slapstick whilst managing a fairly decent rendition of a classical music concert at the same time! It wasn't quite as good as their previous show, but it was still good fun - 4*

Belt Up's Lorca is Dead was I think their best show of the fringe. Belt Up are always surreal to some extent, but the chance to play the Paris set of surrealists, debating and retelling the life and death of Lorca, really lets them come into their own. Strange, but great! 5*

More Belt Up followed, with Belt Up's Atrium. More great immersive stuff, weaving in and out of the different fantasies of a dying writer dictating his memoirs. Some great silly costumes too, but I must remember to be a tiny bit more reserved when sat at the front, lest anyone have to see another one of my 5 second Kafka looks...! Good show though, 5*


Ovid's Metamorphoses was told through the medium of 40s music, with a few great bits with puppets, and some fun physical theatre! 4*

Others by Paper Birds, who for the previous two years did the stunningly powerful and amazing In A Thousand Pieces. This year they'd sent letters, and then questionnaires, to a number of women they felt to be interesting, and different (other) to them. The play again featured three actresses and moderately minimal props, but this time was light-hearted for parts. It was made up of the letters sent and received, the responses not received, the actresses' ideas of what the responses might be, and the discussions that led from them. Parts showed the similarities between apparently different women, parts the unexpected differences, and parts the terrifying cluelessness of the actresses (I think, well, hope, largely hammed up!). In the end, it was a lighter piece than last year, but ended up not quite so powerful and thought-provoking. Still great stuff though, 4*

Hamlet! The Musical - You'd probably expect this to be dreadful, but it actually turns out to be silly and quite good fun! 3*

NewsRevue were on fairly good form once again. A few of the Tory jokes seemed a bit rehashed from the 80s, but the Nick Clegg puppet worked well! Throw in some good caricatures, and several Glee-related songs, and you have a fun hour. 3*

Finally the last show of the Fringe - Shadow Boxing. This was a one man show, about not only being a boxer, but the personal growth and discovery that lay around it, and with some good twists as it went along. Not as good as Beautiful Burnout, but a solid 4* none the less.

And after 25 shows, that's it! During the course of the fringe, I tweeted short reviews, which seemed to go well (based on in-person feedback from many of my friends, and from twitter comments), so I'll probably do that again next year. The above are hopefully longer and more thought out reviews, though as they vary in being written between 1 hour and 2 weeks after the event, they do vary somewhat...!

Playing BBC HD and ITV HD on a DVBWorld DW2104

Having got fed up with the ropey reception on Freeview, due to ongoing problems with my local transmitter (engineering works, fires, too windy for helicopters, the list of excuses is impressive...), I decided to pick up a DVB-S adapter for my mythtv box.

The snag with having a very small, low power mythtv machine is that all the adapters need to be USB, which does limit the choices of linux compatible DVB-S adapters. However, a bit of browsing of the Linux TV wiki and some patience with ebay searches paid off, and I was able to pick up a DVBWorld HD 2104.

With the help of this site, I was able to get the firmware for it, and it was quickly up and running. I followed this guide to get things running, which largely worked for the SD channels.

However, despite the scan of the Astra 28.2E and Eurobird 1 satellites showing me both BBC HD and ITV HD, I was unable to get either working with MythTV or mplayer. Trying in mplayer I saw:
nick@minimyth:/dev/dvb/adapter1$ mplayer dvb://2@"BBC HD"
MPlayer UNKNOWN-4.4.3 (C) 2000-2010 MPlayer Team
Can't open joystick device /dev/input/js0: No such file or directory
Can't init input joystick
mplayer: could not connect to socket
mplayer: No such file or directory
Failed to open LIRC support. You will not be able to use your remote control.

Playing dvb://2@BBC HD.
dvb_tune Freq: 10847000
Cache fill: 18.85% (1581056 bytes)   
TS file format detected.
VIDEO MPEG2(pid=5500) AUDIO MPA(pid=5502) NO SUBS (yet)!  PROGRAM N. 0

But nothing played. Using szap to capture a bit of the stream myself, and trying with that, I got sound but no picture:
mplayer /tmp/HD
MPlayer UNKNOWN-4.4.3 (C) 2000-2010 MPlayer Team
Can't open joystick device /dev/input/js0: No such file or directory
Can't init input joystick
mplayer: could not connect to socket
mplayer: No such file or directory
Failed to open LIRC support. You will not be able to use your remote control.

Playing /tmp/HD.
Cache fill:  0.00% (0 bytes)   
TS file format detected.
VIDEO MPEG2(pid=5500) AUDIO MPA(pid=5502) NO SUBS (yet)!  PROGRAM N. 0
MPEG: FATAL: EOF while searching for sequence header.
Video: Cannot read properties.
Opening audio decoder: [mp3lib] MPEG layer-2, layer-3
AUDIO: 48000 Hz, 2 ch, s16le, 256.0 kbit/16.67% (ratio: 32000->192000)
Selected audio codec: [mp3] afm: mp3lib (mp3lib MPEG layer-2, layer-3)
AO: [alsa] 48000Hz 2ch s16le (2 bytes per sample)
Video: no video
Starting playback...

The key thing to spot here is that mplayer thinks it has MPEG2 video. However, both BBC HD and ITV HD are H264 when broadcast over Freesat! After some googling, it turns out that there's something up with how the DVB-S scan program outputs the channel lines. The easy option is to tell mplayer to use a workaround for this, by passing it an extra option: -demuxer lavf
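
If you want to spot this misdetection without reading through the whole log, a small Python sketch along these lines can pull the codec and PIDs out of the stream summary line shown in the output above. The function names are made up for this example. The "H264" label used for comparison is an assumption about how mplayer would name the codec, so adjust it to whatever your build prints.

```python
import re
from typing import Dict, Optional, Union

# Matches the stream summary line as it appears in the mplayer logs above, e.g.
# "VIDEO MPEG2(pid=5500) AUDIO MPA(pid=5502) NO SUBS (yet)!  PROGRAM N. 0"
TS_LINE = re.compile(r"VIDEO (\w+)\(pid=(\d+)\) AUDIO (\w+)\(pid=(\d+)\)")


def parse_ts_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the detected codecs and PIDs, or None if the line doesn't match."""
    match = TS_LINE.search(line)
    if match is None:
        return None
    return {
        "video_codec": match.group(1),
        "video_pid": int(match.group(2)),
        "audio_codec": match.group(3),
        "audio_pid": int(match.group(4)),
    }


def looks_misdetected(line: str, expected_codec: str = "H264") -> bool:
    """True if a video codec was reported and it isn't the one we expected."""
    info = parse_ts_line(line)
    return info is not None and info["video_codec"] != expected_codec


line = "VIDEO MPEG2(pid=5500) AUDIO MPA(pid=5502) NO SUBS (yet)!  PROGRAM N. 0"
print(parse_ts_line(line))
print(looks_misdetected(line))
```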

Thus, to play BBC HD the easy way on the DVBWorld card (my 2nd DVB adapter), the command is:
mplayer -demuxer lavf dvb://2@"BBC HD"

This largely seems to work fine, though the sound sometimes drifts, which needs a quit and restart to fix.
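
If you end up launching this from a script rather than typing it each time, the command can be built up as an argument list. This is only a sketch: the helper name and its defaults are made up for this example, and the card number 2 simply mirrors my setup. The only mplayer options used are the ones shown above.

```python
from typing import List


def build_mplayer_command(channel: str, card: int = 2, force_lavf: bool = True) -> List[str]:
    """Hypothetical helper: build the mplayer argument list for a DVB channel."""
    cmd = ["mplayer"]
    if force_lavf:
        # The workaround from above: make mplayer use the lavf demuxer
        cmd += ["-demuxer", "lavf"]
    # No shell quoting is needed here, as each list item is passed as one argument
    cmd.append(f"dvb://{card}@{channel}")
    return cmd


print(build_mplayer_command("BBC HD"))
# To actually start playback, pass the list to subprocess.run()
```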

However, it is possible to also hack channels.conf to contain the correct details to "just work(TM)". This seems a bit black magic, but you need to run mplayer with "-identify", pick the PMT_PID (via trial and error...), and add this into the channels.conf video entry with a plus. Thus, my channels.conf entries for the HD freesat channels are:
BBC HD:10847:v:0:22000:5500+259:5502:6940
ITV1 HD:10832:h:0:22000:2362+288:2369:10000
ITV1 HD:10935:v:0:22000:513+289:641:3851

By adding in the correct PMT PIDs, I can then just do mplayer dvb://2@"BBC HD" and it picks the correct video and starts playing! Still has sound drifts though, which I think might be slightly worse than with lavf but I've yet to double-blind test...
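
Editing the entries by hand is easy enough, but for anyone wanting to script it, here's a minimal Python sketch of the same change. It assumes the eight colon-separated fields seen in the entries above, with the video PID as the sixth field. The function name is made up for this example, and you still have to find the right PMT PID yourself.

```python
def add_pmt_pid(entry: str, pmt_pid: int) -> str:
    """Hypothetical helper: append +PMT_PID to the video PID of one entry.

    Assumes the eight colon-separated fields seen in the entries above,
    where the sixth field is the video PID.
    """
    fields = entry.rstrip("\n").split(":")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}: {entry!r}")
    # Drop any PMT PID that was already added, so the change can be re-applied
    video_pid = fields[5].split("+")[0]
    fields[5] = f"{video_pid}+{pmt_pid}"
    return ":".join(fields)


print(add_pmt_pid("BBC HD:10847:v:0:22000:5500:5502:6940", 259))
```

Running it on the plain BBC HD line from the scan gives the same entry as the one listed above.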

Next up, time to make MythTV believe the stream is H264 too!

Mentoring and Incubation at the Apache Software Foundation

I'd like to give my personal view on the Apache Incubator, and how I see my role as a mentor of an incubating project within that. This post explains my views, and while it ought to chime nicely with official policy, there might be the odd error, and so this isn't official foundation policy. But first, a bit of history on how I ended up mentoring a project.

I've been using Apache software for a very long time now, and I'd guess I first played with the webserver back in something like 1996 or 1997, which seems a very long time ago.... I'd say I first properly got involved in Apache with POI when I started with Torchbox. A check of the archives shows I first started contributing patches back in 2003. I stayed around on the POI list after that, contributing more patches and helping out with user questions, and then in 2005 everyone had got fed up of committing my patches without changes, and I was granted committership.

In 2006, I attended my first ApacheCon, and had a great time whilst learning lots. In late 2006, I was elected to the Jakarta PMC, and discovered a whole new world as I learnt about PMCs, the board, how the ASF works and so on. This didn't put me off, and in May 2007 I helped POI leave the Jakarta umbrella, becoming PMC chair. At the 2007 ApacheCon, I'd gone to lots of foundation related talks, and began to learn in detail about "The Apache Way", which is partly how I ended up as the new POI chair. Roll forward to 2009 and I found myself giving talks on "The Apache Way", having in the meantime made member. Late last year I also joined the new Apache Community Development project, helping out with mentoring, outreach and so on.

It's been quite a long journey, from my first involvements in Apache, through my first commits and on to today. It has taken me a long time to learn about "The Apache Way", to really understand what it means, and get to a position where I'm able to try to help others to learn about it. It's really great to be able to stand up in front of an audience, and talk to them about open source, open development, and Apache-style open development, and see that moment when someone gets it, understands part of why it's so powerful. Well, I say stand in front, but quite often it could just be sat in a pub, or even on the grass in a circle, enjoying the Irish sunshine!

Since starting at Alfresco, I've also become involved in the Apache Incubator, and I'm currently a mentor of the Apache Chemistry incubating project. Having explained how I got here, I'd like to look at what I think this role of mentor involves, using Chemistry as an example.

Firstly, what isn't it? Well, I'd say a mentor isn't a committer. Sure, a mentor can also be a committer, and it's great when that's the case. However, I'd see those as two distinct roles, and you may sometimes need to switch hats. It's great if mentors can commit code, since it helps ensure that if there is any pain from the incubation process, one of the mentors quickly feels it too, but I don't think it's in any way required.

What is in the role of mentor then? I'd say the most important, and over-arching role is to teach the podling (incubating project) about The Apache Way. You can't do this in one go though (see above - it took me at least 4 years to get to the point that I was happy enough with my understanding that I could pass it on!), but you need to be trying to pass it on as and when you can. Partly this means watching the mailing lists, and offering insight and advice when things happen. Or don't happen. Possibly especially when they don't happen...

Within the incubator, various things need to happen whilst the project is there, and others before graduation can occur. As a mentor, you need to help everyone understand why they need to do something, as well as helping them do it. For example, the requirement that as much as possible of the project's communications should be on-list, and off-list things should be relayed back. It's no good just saying "no", you need to explain why, explain what gets missed if you don't, explain the benefits of doing it The Apache Way. With Chemistry, the big test for this was with the in-person meetup in Munich back in March. Many of the committers were there, but certainly not all. I explained on-list beforehand why it was important to keep the list updated, then spoke again about it at the meeting. During and after the meeting, everyone who couldn't make it seemed very happy with how it had worked out, and how their views had been included. We also now have a record of those discussions to check back on if an architecture question surfaces again in future. Overall, it seemed to work well!

Three things that often cause issues within the incubator are IP clearance, releases and new committers. With all of these, the podling needs to do some work, and then the incubator PMC needs to sign off on this. Firstly then, the role of the mentor is to help the podling do the thing right, pointing them at documentation as needed, advising them on their process, and reviewing what they've done. When everything is fine, you then need to put on your IPMC hat, and approve it. Next, you need to prod your friends within the IPMC to ensure that a quorum approves the action, so the podling isn't stuck waiting. Finally, you need to ensure that the podling understands why and how to do it, because after graduation they'll need to do the same thing again, but without the oversight, so they need to be able to get it right themselves :)

Where does that leave us for defining the role of the mentor? Overall, you're there to help the project learn the Apache Way, both in what to do and why they should be doing that. You should be helping them when problems come up, and trying to spot problems and head them off before they develop. You should be helping the project to do the steps needed to run and graduate (clearance, releases etc), giving them guidance, advice and voting. You should be there to answer questions. If it all works, then the closer graduation gets, the less you'll need to do, as the closer the project will be to running itself happily in The Apache Way!

One thing I should probably point out at this juncture is about in-person vs remote mentoring. You sign up to mentor the whole podling project (while ComDev do have a formal 1:1 mentoring program, the incubator is about mentoring the project), so much of that mentoring needs to be remote, using mailing lists. However, with Chemistry, I found it very helpful to also have a chance to meet and advise many of the project committers in person. Much as the ASF is a global organisation with procedures and a way of working that handles everyone being disparate and remote, some parts of teaching the Apache Way work best in person. That's partly why I've helped set up the Local Mentors Program (aka take a new contributor out for a drink to help explain the whole thing to them). With that in mind, I'm tempted to recommend that where possible, at least one of the mentors meets at least some of the committers at least once, probably ideally at an Apache BarCamp or conference.

There are many different ways to run a project, be that an open source project or a closed one. I'm a big believer that open development is the right way for many projects, but I'm also fairly well aware of the cases where it may not be the best. This isn't a post about project management and methodologies, though buy me a beer and I'll happily talk at length on the subject! Instead, I want to finish off with my view on why the Apache Incubator exists.

Within the ASF, all projects should be running to The Apache Way. Now, the Apache Way doesn't cover everything, just certain key areas, so each project is free to make their own decisions on how to do many things. It's part of the beauty of the ASF that different projects do try out different things, and share what works well, even if sometimes this leads to a week of discussion on members@.... However, most projects outside of the ASF don't run to the Apache Way. So, for a project to join the ASF, we need to ensure that the licensing is correct, the IP is properly cleared, and that the project runs itself The Apache Way. That's where the Apache Incubator steps in.

So, my view of the incubator is that it's somewhere to do the IP clearance, it's somewhere to sort out licensing and dependencies, but mostly it's a place to learn the Apache Way. Projects come in, they learn, and then they choose to either tweak themselves to fit the Apache Way, or they say "no thanks, that's not for us" and leave to go on their own way. Seems simple enough, doesn't it? :)

Who drives, directs and supports projects within the Apache Software Foundation

Following on from my previous post on CLAs and Release Votes, I thought I'd do another one on the Apache Software Foundation. This time, it's looking at different roles in directing and supporting a project, and how the ASF deals with corporate contributions.

Note - this is a personal view. While I am a member of and PMC chair within the ASF, it's my view, not an official foundation statement. It's based on lots of discussions, talks and mailing list posts, but please do shout if you think I've got something wrong...

When looking at an Apache project, people will often make distinctions between four kinds of people:

  • Users - anyone who just uses the software, without really interacting with the project beyond that

  • Contributors - anyone who helps out with answering questions, contributing simple patches or test cases, often but not always on the project mailing lists

  • Committers - people able to commit changes to SVN, both their own changes, and those coming in from the community from Contributors

  • PMC Members - those people tasked with keeping the project on-track, and directed by the ASF board to look after the project on behalf of everyone

It's often said that there is a pyramid of people within an Apache project. At the bottom are very large numbers of users, above them a smaller number of contributors powering the project, then a smaller number of committers making changes to the code, then a smaller number of PMC members overseeing it. If you can get yourself to a talk on The Apache Way somewhere (ApacheCon, Apache BarCamp etc), you'll hear a lot more on this, but the important thing is that a project's success is driven by this large base.

Two common questions about an Apache project are: who drives and directs the project, and who keeps it going?

The answer to the 2nd is easy, but perhaps not the one you may think. It's the contributors! Without all the contributors out there, answering the questions of others, helping write examples, providing test cases and the odd patch, evangelising, your project won't grow and maintain itself. This is why you see some projects (eg httpd) rewarding those people who contribute documentation and examples, and not just those who contribute code. Almost all new committers come from the pool of contributors. Most users will hit a problem at some point, and it's the work of the contributors in blog posts, mailing list answers, documentation or examples that helps them solve their issue and carry on. Of course, on every project there are committers doing the same thing, but in most cases they started doing all that before as contributors, and it's the work done as contributors that keeps the project going!

Now, what about who drives the project? On most projects, at least some of the people, and possibly almost all of them, will be there in their role as employees of a company. How should committers (and PMC members) handle the need to wear two hats, those of an employee and of a committer to the project?

Within the project, new features are added by whoever does the work. This might seem obvious, but it's worth remembering that there's no magic coding fairy secretly committing while we all sleep - all new features and bug fixes come from someone spending time at a keyboard working on them. In some cases, that's someone working on it in their own time. More often though, it's someone working on company time, usually on a project that is important to their employer.

So, we can see that on many projects, the new features in the project will be those which matter to the companies providing the employees to work on them. But, does that mean that those companies get to set the direction of the project? In some foundations, the answer would be yes, but at Apache, the answer is a little more complex. It's maybe, but only if the community agrees!

How does this work then? Generally, people should announce in advance what they're working on, and seek feedback on it. That could mean opening a JIRA for the new feature, describing it, then later attaching the patch to it. It could be a quick post to the mailing list, followed by a later commit. For a very uncontentious change, it could even be committing the patch and describing it in the commit message, assuming that your project allows commit-then-review for that sort of change. Generally though, for a company-driven change, a committer will announce the new feature they're planning to work on.

Once the new feature has been described, what then? It could be that everyone feels this is a useful addition, and +1's it. That's the ideal case - your company's needs closely match those of everyone else, and everyone's happy! Much more commonly, people will say things like "that isn't what I'd choose to work on first, but it's useful, so go for it" or "I wouldn't use that, and wouldn't code it, but I can see the value for others, so go for it". These would be more of a +0, hinting that while the change is good for the project, it isn't something that others care deeply about (which is perhaps why no-one has done it before...) In all these cases though, the community have agreed that the change that your employer needs fits with their needs too.

In a few cases, you'll get someone saying "that's a stupid idea, you should be working on the thing that I need instead!". That's usually the point that you need someone else to step in (so as not to start a flame war), and explain that Apache is a participatory meritocracy. If there's a feature you want, you have three choices - code it yourself, pay someone to code it for you, or inspire someone to code it for you. No-one's under any obligation to write something for you, everything's done by people who want to do so!

Now, what about the case when you describe a new feature that you want to work on (either because your employer wants you to, or even just because you want to in your own time), and the community isn't keen? That might be because it will have wide ranging changes, or introduce new dependencies, or will work using a new methodology, or half a dozen other things. Well, firstly, the thing to do is to hold fire on your changes. Instead, you need to make your case, probably on the mailing list, and explain why you feel your change is the right one to do. With any luck, people will suggest alternate ways of doing things without the problems, and you can get the new feature you need whilst also having the community support you.

What happens though if after that discussion, you haven't convinced the community? Generally, that's the time to try it anyway, but not on the main codebase! Let everyone know you're trying it out, and do so on a branch, or in an external SCM, depending on the scope of the change. Try to let the community see what you're up to. Then, when you're done, come back and make your case again. Hopefully now things will be easier, because you can point to real code, and say things like "there had been concern that the new configuration system would slow down the core, but if you try it with this benchmark, you'll see it's actually 5% faster". It could be that when people see the new code, they'll agree, you have a quick vote, commit the change, and everyone's happy :)

And if even having seen the code, the community doesn't agree? Well, sorry, but you'll need to maintain a vendor fork at this point. After all, your company needs it, and the community doesn't want it, so it'll be up to you to maintain. Next time, try to make the case to the community better...

In general though, the project moves in the direction of those with the time to code. When those people are working on company time, the project will tend to move in the direction that that company wants it to. However, that's only the case while the community agrees with the direction, and if the company tries to push something the community doesn't want, that all changes.

Apache - powered by the people who do the work, but directed for and by the community!

CLAs and Release Votes within the Apache Software Foundation

There has been a fair bit of discussion of late, both within the Apache Software Foundation and outside of it, about whether some of the processes and procedures are a help or a hindrance. With this in mind, I thought I'd write something about how I see things.

Note - this is a personal view. While I am a member of and PMC chair within the ASF, it's my view, not an official foundation statement. It's based on lots of discussions, talks and mailing list posts, but please do shout if you think I've got something wrong...

There are three main things that people seem to moan about around the ASF. The first I won't really dwell on, that of "why can't I use ${latest_tool_of_the_moment}?". As the hard-worked ASF infrastructure crew will happily tell you, you can use anything that can be supported, maintained, bug fixed and generally looked after 24/7 by volunteers scattered around the globe. If you want something new, be prepared to help look after it!

The other two main things seem to be around CLAs, and release voting. These two offer some very important protections for everyone - users, developers, committers and the ASF. The need for these is documented at, but as there's a lot to read there, I'll try to give my view on why they matter.

Firstly, a question. What are two of the easiest ways to close down an open source project?

The answers, I feel, are to either sue the release manager, or to sue whoever holds the source code.

For the first, think how long your project would last if the release manager was sued for a very large sum of money, and anyone else who might consider signing up was threatened with a similar lawsuit too? With no releases, one key community member out of action through a lawsuit, and everyone else worried, the answer is alas not all that long. Personally, I think it's great that I'm both allowed (and encouraged!) to work on open source at work, whilst also well enough paid that I can afford to spend part of my free time coding, mentoring etc within open source. However, it does also mean I have enough to lose... And having been an Apache release manager in the past (for Apache POI), I'm very glad that the ASF protected me. But how?

If you do a release of some open source code on your own, and someone has it in for your project, then you could be at risk. One of the key reasons for founding the ASF was to provide protection for this. What you want, should something bad like this ever happen, is for some ASF lawyers to turn up in court and say "stop suing this poor contributor, sue us if you're going to sue anyone", and have the judge agree. For that to work, the release needs to have been done by the ASF. That's where the release votes and rules come in.

What you're after, therefore, is for the ASF to stand up and say "that's our release". The ASF wants lots of good open source software to be out there and being used, so wants to support your release, but at the same time, doesn't want to take needless risks. The release requirements are therefore basically that enough people who provide oversight have OK'd the release, and that as far as is possible, all the code in the release is allowed to be there. Therefore, we see the requirement that 3 PMC members have +1'd the vote, the source has appropriate headers, notice files and licenses are clear, the build is open and repeatable, and dependencies are all ok. (The PMC, or project management committee, is the ASF board's eyes and ears on the project). Largely above and beyond that, what each project wants to do in terms of releases is up to them. Things like betas, release candidates and numbering all lie within the project, based on how the community wishes things to work.
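
As a rough illustration of just the voting part of those requirements, here's a minimal Python sketch. The names (Vote, release_vote_passes) are made up for this post and aren't part of any ASF tooling. The extra check that +1s outnumber -1s is my understanding of the usual ASF release voting policy, rather than something spelled out above, so treat it as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    """One vote cast on a release thread (hypothetical structure)."""
    voter: str
    value: int         # +1, 0 or -1
    pmc_member: bool   # only votes from PMC members are counted as binding


def release_vote_passes(votes, required_binding=3):
    """Return True if enough PMC members have +1'd the release.

    Assumption: besides needing at least `required_binding` +1 votes
    from PMC members, the binding +1s must also outnumber the binding -1s.
    """
    binding = [v for v in votes if v.pmc_member]
    plus_ones = sum(1 for v in binding if v.value > 0)
    minus_ones = sum(1 for v in binding if v.value < 0)
    return plus_ones >= required_binding and plus_ones > minus_ones


if __name__ == "__main__":
    thread = [
        Vote("alice", +1, pmc_member=True),
        Vote("bob", +1, pmc_member=True),
        Vote("carol", +1, pmc_member=True),
        Vote("dave", +1, pmc_member=False),  # welcome, but not binding
    ]
    print(release_vote_passes(thread))
```

Votes from the wider community are still very useful as testing and feedback. In this sketch they just don't count towards the three that the ASF needs before it can say "that's our release".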

Now, on to the CLAs, or Contributor License Agreements. There are two of these, the iCLA (for individuals), and the CCLA (for companies). These are actually fairly simple documents, and all relate to the core Apache License v2.0. For an individual, it is basically "I understand what it is to release software under the Apache License, and what rights I waive to do so, and I'm happy and able to do so". For a company, it's basically "I understand what it is to release software under the Apache License, and what rights I waive to do so, and I'm happy for my employees to do so". These provide the foundation, and the software users (the community) with the protection that someone can't turn around a year later and say "I know I said I was contributing under the Apache License, but I've changed my mind, and now I'm going to sue you".

Many other foundations are happy to accept contributions under their licenses (be that the ASL, GPL or whatever) without requiring that those making significant contributions formally state (via a CLA) that they understand the license, and are OK with their contributions being under it. However, within the ASF, we like to be very sure on these things (which is part of what allows our software to be used so widely without lots of legal checks being required first). As such, we ask people to file a CLA before they contribute, just to confirm they're happy with how their contributions will be used.

So, we see two key parts of Apache policy, the release process (checks and votes), and the CLAs, are both there to protect the project and the community. They happen to protect the ASF too, but they're mostly about protecting everyone!

As an aside, if much of this is new to you, I can't recommend enough attending a "The Apache Way" talk somewhere. You'll find them at all ApacheCons, at all Apache BarCamps, and many other things, and they're a great way to learn about things like this, both what the procedures are, and why they exist. It's much of how I learnt these things!

Compiling for your N900 on Debian

I've just got my Nokia N900, and while the phone-phone synchronisation pulled my calendar entries and contacts over from my N95 just fine, it didn't do the sms's.

Since it runs linux, the fact that the official software doesn't support it isn't the end of the world. Handily, someone's written a small sms importer program that reads in the sms's from the S60 pc-suite csv export, and loads them onto the phone.

Only snag is the pre-built version didn't support long sms's, so I needed to patch and re-compile. That means needing an arm cross-compiler, the build environment etc.

Handily, that's fairly easy to set up. The best guide I found was, which got me almost all the way there.

One thing to note is that you need to install the "Nokia Binaries" if you want to do much development, which includes various key system components and dev libraries that aren't open source. has what I found to be a slightly easier guide to follow on that.

Now I'm about ready to try my newly compiled smsimporter. A couple of commands that I've found to be helpful are:
  • sb2 -eR apt-get install [pkg] - installs the given arm native package in the build root

  • maemo-sdk enter devel - enter a shell suitable for compiling for native arm

Hopefully I'll be knocking up a few more little bits of code on the weekend :)

Nearly everything runs linux

My new phone runs Linux, which is great. So, I thought I'd do a quick roundup of all my computing devices, and see how close I am to all Linux:

  • Desktop - Debian Stable

  • New work laptop - Ubuntu Karmic

  • Dell Mini laptop - Ubuntu Karmic (netbook edition)

  • Media PC - Ubuntu Karmic (mythbuntu edition)

  • Nokia N900 - Maemo Linux

  • 2nd wireless access point - tomato (linux)

  • Apple TV - stripped down OSX, so Unix but not Linux

  • ADSL + wireless router - not sure, but not apparently linux

  • Nintendo Wii - not linux

I think that's about it for computing devices, so not bad all in all. When 802.11n ADSL routers that run Linux come down in price a bit more, I'll ditch the current cheap'n'cheerful (but with a few annoying bugs) no-name router I have for one of them, and I'll almost be there!