A user forwarded me a particularly annoying bit of spam the other day, and I realised it is going to be quite hard to combat.
The email was sent from a Hotmail account. Clearly the spammers have broken the Hotmail CAPTCHA process (again), and are signing up tens of thousands of accounts or more to send their spam. The main issue is that there’s no easy “source IP” to test against RBLs for blocking or scoring purposes. Hotmail does add an “X-Originating-IP” header, but that’s non-standard, and in the cases I’ve seen, the IPs are not on any known blacklists.
This actually seems quite an effective process for spammers: use newly compromised spambot machines only to send via reputable services like Hotmail, Yahoo, etc. I believe most RBLs are built by systems that only look at the original incoming SMTP connection (either at the SMTP stage, or via some feedback process that later scans back through the Received headers). They generally don’t look at custom headers like "X-Originating-IP". So even if spam checking software does check that header, not much RBL building software will. As long as the spammer keeps those IPs reserved for sending via other "trusted" services, the IPs will probably stay off RBLs for a long time.
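To make the idea of checking that header concrete, here is a minimal Python sketch of how spam checking software might pull the address out of "X-Originating-IP" and turn it into a DNSBL query name (reversed octets followed by the list's zone). The function names and the zone `dnsbl.example.org` are placeholders I've made up for illustration, not any real list or any particular product's behaviour, and it assumes the header holds a single IPv4 address, optionally in square brackets.

```python
import email
import ipaddress
import socket


def originating_ip(raw_message):
    """Return the IPv4 address from X-Originating-IP, or None if absent/invalid."""
    msg = email.message_from_string(raw_message)
    value = msg.get("X-Originating-IP")
    if value is None:
        return None
    try:
        # Assumption: the value looks like "[192.0.2.44]" or "192.0.2.44".
        ip = ipaddress.ip_address(value.strip().strip("[]"))
    except ValueError:
        return None
    return ip if ip.version == 4 else None


def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: octets reversed, then the list's zone."""
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone):
    """True if the query name resolves. Any lookup failure, including a
    temporary DNS error, is treated as 'not listed' in this sketch."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True
```

Of course this only helps if the IP is actually on a list, which is exactly the problem: the lookup works, but the lists are not being fed from this header.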
Given the constant battle Hotmail, Yahoo, Gmail, etc. have stopping mass signups, CAPTCHAs’ days seem numbered. In some cases Google have already started requiring SMS verification for new Gmail accounts. I expect this trend to spread to other services and companies over time, as the CAPTCHA systems employed to stop abuse appear to be less and less effective every day.
- The email contained a bunch of random text. Also not unusual, but it makes any content analysis basically impossible.
- The email contained a link to a public Google Docs page. Again, clearly spammers have broken the Google CAPTCHA process to sign up masses of Google Docs accounts and fill them with their spam landing pages. This means that URIBLs are ineffective against these types of emails, because they can’t go and block Google Docs domains.
The net result was that the emails in question contained very little information to block against. Some composite rules could be created (e.g. from a Hotmail account, with a Google Docs link in it), but they’re clearly far too broad and likely to result in many false positives.
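As a rough illustration of why such a composite rule is too broad, here is a small Python sketch of one. The function name, the regular expression and the sample messages are all made up for this example, and it only looks at simple single-part messages. Note that it cannot tell the spam apart from a perfectly ordinary email between friends.

```python
import email
import re
from email.utils import parseaddr

# Assumption: any http(s) link to docs.google.com counts as a "Google Docs link".
DOCS_LINK = re.compile(r"https?://docs\.google\.com/", re.IGNORECASE)


def composite_match(raw_message):
    """True if the mail claims a hotmail.com sender and links to Google Docs."""
    msg = email.message_from_string(raw_message)
    _, addr = parseaddr(msg.get("From", ""))
    from_hotmail = addr.lower().endswith("@hotmail.com")
    payload = msg.get_payload()
    # Multipart messages return a list of parts; this sketch skips them.
    body = payload if isinstance(payload, str) else ""
    return from_hotmail and bool(DOCS_LINK.search(body))


spam = (
    "From: xk2291 <xk2291@hotmail.com>\n\n"
    "qwe zxc lorem http://docs.google.com/View?id=abc123 asd\n"
)
legit = (
    "From: Alice <alice@hotmail.com>\n\n"
    "Here are the meeting notes: https://docs.google.com/Doc?id=notes\n"
)

# Both messages match, so the rule flags the legitimate one as well.
print(composite_match(spam), composite_match(legit))
```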
At the moment, the main things we can do about this are:
- Report the emails as spam to providers like Spamcop and others. This should both reflect badly on the services that are being abused and encourage improvements, so that they do look at X-Originating-IP headers and the like when building IP RBLs.
- Report the Google Docs pages as abuse. I’d hope Google have good internal systems to handle this, so that if a bunch of pages are reported as abuse, they can track down similar pages and disable them and the associated signups as well.