Spam Filtering from Perl - Nick

Spam Filtering from Perl [Mar. 1st, 2004|09:14 pm]
[music |The Who - Quadrophenia - Get Out And Stay Out]

One of the things we do with uni-lifesaving.org.uk is offer email aliases and lists, to make it much easier for clubs to communicate with each other.

The level of spam passing through these aliases is increasing at an alarming rate, which is both pesky and inevitable. Since the machine hosting the domain isn't one over which I have root control, the usual idea of turning on SpamAssassin filtering in exim is out. Instead, I decided to write a quick perl script to be run from the aliases file to handle talking to SpamAssassin.

I thought I had a simple model. Replace an entry like "foo: bar@wibble.somewhere" with one like "foo: | email-filter.pl bar@wibble.somewhere". The perl program would check it hadn't filtered the message already (e.g. for nested aliases), filter it through SpamAssassin, and either deliver the email or chuck it into an mbox file for an admin to check.
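The pre-checks in that model can be sketched in plain Perl, with no mail modules involved. This is only a rough sketch: the header name `X-Alias-Filtered`, the 256 KB size cap, and all the sub names are assumptions made up for illustration, not anything from the real script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical marker header and size cap; both are assumptions.
my $FLAG_HEADER = 'X-Alias-Filtered';
my $MAX_BYTES   = 256 * 1024;

# Slurp a whole message from a filehandle (STDIN when run from the aliases file).
sub read_message {
    my ($fh) = @_;
    local $/;                      # slurp mode
    my $raw = <$fh>;
    return defined $raw ? $raw : '';
}

# Return the header block: everything before the first blank line.
sub header_block {
    my ($raw) = @_;
    my ($head) = split /\r?\n\r?\n/, $raw, 2;
    return defined $head ? $head : '';
}

# Decide what to do before any spam check:
#   'deliver' - already filtered (nested alias) or too big to scan
#   'scan'    - hand the message to the spam filter
sub precheck {
    my ($raw) = @_;
    return 'deliver' if header_block($raw) =~ /^\Q$FLAG_HEADER\E:/mi;
    return 'deliver' if length($raw) > $MAX_BYTES;
    return 'scan';
}

# Add the marker header so a nested alias skips the second pass.
sub mark_filtered {
    my ($raw) = @_;
    my $flag = "$FLAG_HEADER: yes\n";
    # Keep a leading mbox-style "From " envelope line first, if present.
    return "$1$flag$2" if $raw =~ /\A(From [^\n]*\n)(.*)\z/s;
    return $flag . $raw;
}
```

Run from the aliases file, the script would call `read_message(\*STDIN)`, then `precheck()` on the result, and only pass the message to SpamAssassin when the answer is `'scan'`. The marker is only looked for in the header block, so a body that happens to quote the header does not skip the filter.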

I quickly knocked up the framework, and came to flesh it out with code to really filter and write out email. I then hit a few problems. Mail::SpamAssassin's check() pretty much only accepts a Mail::Audit object (it ought to accept a Mail::SpamAssassin::Message object, but most calls to that cause it to barf with "unimplemented base method" everywhere). Mail::Audit assumes you will be using it, and only it. It ought to accept you passing it the email when you create it, but I can't make that work. Oh, and it assumes you'll then use it for delivery. To cap it off, most of the nitty-gritty modules I'd need to use lack POD documentation, so you're left trying to read obscure bits of OO perl...

So, my cunning plan to grab the email myself on STDIN, then hand it off looks unworkable (I wanted to do this to check the file size, and check for the already filtered flag without doing too much handling). I will almost certainly have to use Mail::Audit to get SpamAssassin to filter. Then I have to prise the content back out of Mail::Audit so I can use it how I want.

Oh, and why can't I seem to find a nice "append this message to my mbox file, don't try to read it or anything like that, just lock and write" module on CPAN? Lots to read mboxes, lots to read and write them with more processing than you want, but nothing to do a simple append, bah...
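For what it's worth, a "just lock and write" append is short enough to sketch by hand with core modules only. The sub name `append_to_mbox` and the fallback sender are assumptions for illustration. It uses flock(), which only protects against other programs that also use flock(); it won't coordinate with anything that relies on dot-lock files, so treat it as a starting point rather than a safe drop-in.

```perl
use strict;
use warnings;
use Fcntl qw(:flock :seek);

# Append one raw message to an mbox file: lock, write, close.
# Nothing already in the file is read or parsed.
# $sender is the envelope sender used in the "From " separator line.
sub append_to_mbox {
    my ($path, $raw, $sender) = @_;
    $sender = 'MAILER-DAEMON' unless defined $sender && length $sender;

    # Drop an envelope line the MTA may already have added.
    $raw =~ s/\AFrom [^\n]*\n//;
    # Quote lines that would look like a message separator.
    $raw =~ s/^(>*From )/>$1/mg;
    $raw .= "\n" unless $raw =~ /\n\z/;

    # scalar gmtime gives a ctime-style date, e.g. "Mon Mar  1 21:14:00 2004".
    my $stamp = scalar gmtime();

    open(my $fh, '>>', $path) or die "Cannot open $path: $!";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!";
    seek($fh, 0, SEEK_END);   # in case the file grew while we waited for the lock
    print {$fh} "From $sender $stamp\n", $raw, "\n"
        or die "Cannot write to $path: $!";
    close($fh) or die "Cannot close $path: $!";   # closing releases the lock
    return 1;
}
```

A call would look like `append_to_mbox('/path/to/suspect.mbox', $raw, $sender)`, where the path is whatever file the admin checks.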

This is one annoying case of all the perl modules fitting too closely together. If you want to use it in the default case, all's well. For anything else, it all comes apart :(

Update: I am aware that Mail::Audit appends to mbox files, but I can't make it play nicely with accepting mail from anywhere other than STDIN, so that's out. I also know about Mail::SpamAssassin::NoMailAudit, but again I couldn't make it play nicely with input from anywhere other than STDIN....