ETag and Gzip support for JoBo
[Feb. 22nd, 2005|04:20 pm]
At work, we make use of JoBo, a very nice Java spidering toolkit. There haven't been many updates to it of late, but it is a very nice base to work with.
I've recently updated it to support ETags (for conditional fetching, so you can get a 304 response if you have the latest version), and gzip compression. The changes are really small, which shows how good an architecture JoBo has.
I've emailed the changes to the author, but haven't heard back. So, if anyone does want copies of the patches in the meantime, let me know. (It just involves a new callback in HttpDocManager, and some new methods in HttpTool to allow the setting of ETag and gzip-related headers.)
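For anyone curious what the HTTP side of this looks like, here is a minimal sketch using plain java.net.HttpURLConnection rather than JoBo's own classes. It is not the patch itself: the class ConditionalFetch and its methods requestHeaders, decode and fetch are made-up names for illustration. It assumes Java 9 or later (for readAllBytes). The idea is simply to send If-None-Match with the ETag from the last fetch, advertise gzip via Accept-Encoding, treat a 304 as "nothing new", and unwrap the body when the server says it is gzipped.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of JoBo: illustrates ETag + gzip over plain HTTP.
final class ConditionalFetch {

    /** Builds the request headers for a conditional, gzip-capable GET. */
    static Map<String, String> requestHeaders(String cachedEtag) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept-Encoding", "gzip");
        if (cachedEtag != null) {
            // Ask the server to reply 304 if its current ETag still matches ours.
            headers.put("If-None-Match", cachedEtag);
        }
        return headers;
    }

    /** Wraps the body in a GZIPInputStream when the response is gzip-encoded. */
    static InputStream decode(InputStream body, String contentEncoding) throws IOException {
        if (contentEncoding != null && contentEncoding.trim().equalsIgnoreCase("gzip")) {
            return new GZIPInputStream(body);
        }
        return body;
    }

    /**
     * Fetches the URL, returning null if the server answers 304 Not Modified,
     * or the (decompressed) body bytes otherwise.
     */
    static byte[] fetch(URL url, String cachedEtag) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        requestHeaders(cachedEtag).forEach(conn::setRequestProperty);
        try {
            if (conn.getResponseCode() == HttpURLConnection.HTTP_NOT_MODIFIED) {
                return null; // our cached copy is still the latest version
            }
            // A real spider would also store conn.getHeaderField("ETag") for next time.
            try (InputStream in = decode(conn.getInputStream(), conn.getContentEncoding())) {
                return in.readAllBytes();
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling fetch(url, null) does an unconditional fetch; passing the ETag saved from the previous response turns it into a conditional one. Note that HttpURLConnection does not decompress for you once you set Accept-Encoding yourself, which is why decode is there.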
In other programming news, I've been trying to add PowerPoint support to POI. I've got some fairly basic code now, which allows you to find sheets and extract the text (including Unicode text) from them. The code is in Bugzilla, but it ought to go into the tree too. With any luck, I might even get made a committer, and get to add lots more support :)