Saturday, September 8, 2007

Wanna help science? Study your greylists' innards!

If somebody, say five years ago, had told me that I would be spending a little time every day studying data about what invalid addresses some unknown miscreants are making up in my domains, I would have thought them to be slightly off their rockers.

Yet here I am, actually maintaining a publicly available list of addresses which do not stand a chance of becoming valid, ever. It all started with a log data anomaly - I noticed an increase in the number of failed delivery messages to non-existent addresses in our domains. I had expected that the bounces to invalid addresses would appear for a short period only, but for one reason or another they look like they are here to stay, with some dips and peaks like the ehtrib.org flood.

The list is apparently working as intended too. These addresses are on my local greytrap list, and I have started seeing addresses I put in there as all uppercase turn up in my logs in all lowercase variants. Fun to watch, sort of.

Anyway, the supply of new bogus addresses proved to be larger than I had expected. So to get a handle on just what is happening, I ended up doing periodic dumps of the live greylist data. This is really easy to do if you're using spamd as your greylister; your basic command is

$ sudo spamdb | grep GREY

and you redirect to a file, pipe to mail, or whatever you like.
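If you save those dumps to browse later, it can help to split each line into its fields first. What follows is only a minimal Python sketch and not part of spamd. It assumes the pipe-separated GREY line layout described in spamdb(8): source IP, the HELO name on newer versions, envelope sender, envelope recipient, then timestamps and counters. The names GreyEntry, parse_grey_line and parse_dump are made up for illustration, so check the field layout against your own spamdb output before trusting it.

```python
from typing import Iterable, List, NamedTuple, Optional


class GreyEntry(NamedTuple):
    """The parts of a GREY entry that matter when looking for patterns."""
    ip: str
    helo: Optional[str]  # None when the dump has no HELO field
    sender: str
    recipient: str


def parse_grey_line(line: str) -> Optional[GreyEntry]:
    """Parse one line of spamdb output; return None unless it is a GREY entry.

    Assumed layouts (verify against spamdb(8) on your system):
      GREY|ip|helo|from|to|first|pass|expire|block|pass   (10 fields)
      GREY|ip|from|to|first|pass|expire|block|pass        (9 fields, older)
    """
    fields = line.rstrip("\n").split("|")
    if fields[0] != "GREY":
        return None
    if len(fields) == 10:
        ip, helo, sender, recipient = fields[1:5]
    elif len(fields) == 9:
        ip, sender, recipient = fields[1:4]
        helo = None
    else:
        # Unknown layout: skip the line rather than guess at its meaning.
        return None
    # Addresses are printed inside angle brackets; drop those.
    return GreyEntry(ip, helo, sender.strip("<>"), recipient.strip("<>"))


def parse_dump(lines: Iterable[str]) -> List[GreyEntry]:
    """Parse a whole dump, silently skipping anything that is not a GREY entry."""
    entries = []
    for line in lines:
        entry = parse_grey_line(line)
        if entry is not None:
            entries.append(entry)
    return entries
```

Calling parse_dump on an open dump file, or on sys.stdin at the end of the pipe, gives you a list you can sort or count by IP address, sender or recipient.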

Now if you're a bit like me, looking for patterns in noise like this makes you feel a little weirder than usual and possibly leads you to think of a Clive Barker novel (specifically the bits about the dead letter file in The Great and Secret Show), and you wonder why this is worth doing at all. After all, there is precious little spam that actually reaches my users, so like I said earlier, for us spamd users it really looks like spam is a solved problem. I guess I'm just a bit fascinated by the pure irrationality of the spammers' behavior.

Still, there may be useful information lurking somewhere in the data I collect here in my tiny corner of the world and browse when time allows.

Typical entries show things like the host 202.152.33.43 trying to send mail with the From: address jcejft@charter.com to dkqvujfn@datadok.no and sdenuuu@datadok.no. Using a few common networking commands we see that there is no reason why charter.com email should come from the IP range belonging to idola.net.id, and as the admin of datadok.no I know these two addresses have never been deliverable. Most likely the admin at charter.com can tell you if that From: address is deliverable, but I keep wondering how much of the spam out there is stuffed into the pipe with both From: and To: addresses bogus. Or in other words, purely useless noise, never to be delivered anywhere.

On a side note, with one or more of the spammer operations trying to sneak through using sender and recipient addresses in the target domain, I assume it is just a matter of time before I see a tuple with both sender and recipient addresses already in my spamtraps list. When that happens, I think I will feel inclined to let my friends have a round of refreshments on my tab.
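Spotting such a tuple does not have to be done by eye. Here is a rough Python sketch, assuming you already have the (sender, recipient) pairs from a greylist dump and your spamtrap addresses as plain strings. The function names are invented for this example. Comparing addresses without regard to case is my own assumption, prompted by the uppercase trap addresses that came back in lowercase, and may not suit every setup.

```python
from typing import Iterable, Iterator, Tuple


def normalize(address: str) -> str:
    """Strip whitespace and angle brackets, then lowercase (an assumption)."""
    return address.strip().strip("<>").lower()


def both_trapped(
    tuples: Iterable[Tuple[str, str]], traps: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Yield the (sender, recipient) pairs where both addresses are spamtraps."""
    trapped = {normalize(t) for t in traps}
    for sender, recipient in tuples:
        if normalize(sender) in trapped and normalize(recipient) in trapped:
            # Yield the pair as it appeared in the dump, not the normalized form.
            yield sender, recipient
```

Anything this yields is a candidate for that round of refreshments.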

It's obvious that there are a handful of spammer operations that have decided to use datadok.no (and to a lesser extent, dataped.no and ehtrib.org) From: addresses on the spam they send, apparently in an attempt to cover their tracks. I will probably never know why they decided to do that, but I wonder why they keep it up and for that matter how many other domains are seeing this, with bounces from strange places, directed at non-existent, fairly obviously generated bogus addresses.

So if you are seeing similar stupidity in your logs and if you are running a sensible greylister such as spamd, I would be interested in hearing from you so we can compare notes.

Out there in meatspace, EuroBSDCon 2007 is coming up. I'll be there with the PF tutorial on Wednesday. This Friday's deadline for an updated manuscript had totally slipped my mind (I blame the book and a few other, less rational, factors), but hopefully the 24 who signed up for the session will find it useful anyhow - there will be new bits and as much interesting stuff as I can manage. I'll be around for the rest of the conference too, but unfortunately I'll have to give the Legoland trip a miss.

Be seeing you in Copenhagen! The book is getting closer to finished, I promise!
