Re: TRGP, diat the octopus
Dec. 30th, 2003 12:02 pm
wrench ceil vinyl bellow cheater irreproachable shulman bismark auxiliary sanderson melanie euridyce
analyst beatify carpet vestigial mathematician guy chrysler cartilage pentane pm termite uk regretting ironic alluvium glory ingrown advise bellum deacon dashboard deportation intercom taught voluntarism ferret come nanette quart
Sometimes, ya just gotta open spam.
Hey hackers - anyone know why one gets these lists of random, rather erudite words in spam, now?
no subject
Date: 2003-12-30 09:05 am (UTC)
no subject
Date: 2003-12-30 09:11 am (UTC)
The batch of random crap at the start or end of messages started in Usenet spam, but it used to be a block of random characters rather than random words (this defeated the "this identical message has been posted X thousand times" test). The detection software got better, so spammers escalated to lists of words: a list of words looks like useful text (without the common spam keywords) and makes the message look more like real email and less like spam.
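A minimal sketch of the "identical message posted X thousand times" check, assuming the detector simply normalizes each message body, hashes it, and counts repeats. The function names are hypothetical, not taken from any real Usenet tool. Appending a random block to every copy gives each one a different hash, so the counter never trips.

```python
import hashlib
import random
import string
from collections import Counter

def body_fingerprint(body: str) -> str:
    # Hypothetical detector: collapse whitespace, lowercase, then hash the body.
    normalized = " ".join(body.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def count_duplicates(bodies):
    # How many times each exact (normalized) body has been seen.
    return Counter(body_fingerprint(b) for b in bodies)

def salt_with_random_chars(body: str, rng: random.Random, n: int = 20) -> str:
    # The old spammer trick: append a block of random characters to each copy.
    junk = "".join(rng.choice(string.ascii_lowercase) for _ in range(n))
    return body + "\n" + junk

ad = "Buy cheap widgets now!"
rng = random.Random(0)
plain_copies = [ad] * 1000
salted_copies = [salt_with_random_chars(ad, rng) for _ in range(1000)]

print(max(count_duplicates(plain_copies).values()))   # 1000: flagged as a mass posting
print(max(count_duplicates(salted_copies).values()))  # 1 in practice: every copy hashes differently
```

Swapping the random characters for random dictionary words defeats the same counter while also looking less like machine-generated junk.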
no subject
Date: 2003-12-30 09:38 am (UTC)
no subject
Date: 2003-12-30 09:08 am (UTC)
Arms race, anyone?
-Spike
PS. I'm waiting for them to start using this sort of thing.
no subject
Date: 2003-12-30 09:30 am (UTC)
I've been seeing spam with Markov-chain random text for at least 12 months now. :(
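"Markov-chain random text" is generated word by word: each next word is picked at random from the words that followed the current word in some source text, so short runs read naturally even though the whole is nonsense. A minimal first-order sketch; the corpus and function names are made up for illustration.

```python
import random
from collections import defaultdict

def build_chain(text: str):
    # Map each word to the list of words that followed it in the source text.
    words = text.split()
    chain = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        chain[current].append(nxt)
    return chain

def generate(chain, start: str, length: int, rng: random.Random) -> str:
    # Walk the chain, repeatedly picking a random successor of the last word.
    out = [start]
    for _ in range(length - 1):
        successors = chain.get(out[-1])
        if not successors:
            break  # dead end: this word never had a successor in the source
        out.append(rng.choice(successors))
    return " ".join(out)

corpus = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(corpus)
print(generate(chain, "the", 8, random.Random(42)))
```

Because every adjacent word pair in the output also occurs in real text, this kind of filler is harder to spot statistically than a flat list of random words.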
no subject
Date: 2003-12-30 10:12 am (UTC)
no subject
Date: 2003-12-30 10:24 am (UTC)
More than you probably ever wanted to know about spam fighting.
Date: 2003-12-30 09:14 am (UTC)
I'm pretty sure it is an attempt to get around anti-spam software, since almost everything in spam other than the ad itself is there for that reason. Most anti-spam software looks for keywords in the message, but it also looks at the rest of the message to try to prevent "false positives" (mail that gets blocked but isn't spam). Unlike with medical tests, false positives here are worse than false negatives: blocking that important mail from your client is far worse than letting one more spam through, whereas a false positive from the doctor usually just means "you need another round of tests."
So the spammers both break up the keywords (or spell them in l33t-speak) and pack the message with important-sounding words (and names) to try to fool the software. And so the battle escalates to the next level.
It will be interesting to see if the anti-spam law that just passed will be effective at all.
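A toy illustration of both tricks, assuming a naive filter that scores a message by the fraction of its words found on a spam keyword list. The keyword list, threshold, and scoring rule are invented for illustration; real filters are considerably more elaborate.

```python
SPAM_KEYWORDS = {"viagra", "mortgage", "free", "winner"}  # hypothetical keyword list
THRESHOLD = 0.2  # hypothetical: flag the message if >20% of its words are keywords

def keyword_score(message: str) -> float:
    # Fraction of words that exactly match a known spam keyword.
    words = [w.strip(".,!?").lower() for w in message.split()]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in SPAM_KEYWORDS)
    return hits / len(words)

def is_spam(message: str) -> bool:
    return keyword_score(message) > THRESHOLD

ad = "Free viagra for every winner"
leet = "Fr33 v1agra for every w1nner"  # keywords misspelled so they no longer match
padding = ("analyst beatify carpet vestigial mathematician guy chrysler "
           "cartilage pentane termite regretting ironic alluvium glory")
padded = ad + " " + padding  # real keywords diluted by innocent-looking words

print(is_spam(ad))      # True: 3 of 5 words match
print(is_spam(leet))    # False: no exact keyword matches left
print(is_spam(padded))  # False: 3 of 19 words is under the threshold
```

Each evasion pushes the filter toward fuzzier matching and whole-message statistics, which is the escalation the comment describes.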
whee!
Date: 2003-12-30 10:21 am (UTC)
deportation intercom taught voluntarism
vestigial mathematician guy
and
regretting ironic alluvium glory
Re: whee!
Date: 2003-12-30 11:14 am (UTC)
no subject
Date: 2003-12-30 10:46 am (UTC)
no subject
Date: 2003-12-30 07:22 pm (UTC)
This is the style of statistical analysis people have been mentioning.
Most amusingly, the filters best able to catch all of these (without false positives) seem to be the Bayesian filters they're targeted at. I believe the best one currently is spamprobe (which uses multi-word tokens and parses most of the header).
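For anyone wondering how a Bayesian filter works: it learns, from mail you've already sorted, how often each token shows up in spam versus good mail, then combines those per-token odds for a new message. Below is a minimal naive Bayes sketch that, loosely in the spirit of multi-word tokens, counts single words plus adjacent word pairs. It is not spamprobe's actual algorithm; the class name, the add-one smoothing, and the training data are assumptions for illustration, and only tokens present in the message are scored.

```python
import math
from collections import Counter

def tokens(text: str):
    # Single words plus adjacent word pairs ("multi-word tokens").
    words = text.lower().split()
    return words + [a + " " + b for a, b in zip(words, words[1:])]

class NaiveBayesFilter:  # hypothetical name, for illustration only
    def __init__(self):
        self.counts = {"spam": Counter(), "good": Counter()}
        self.docs = {"spam": 0, "good": 0}

    def train(self, text: str, label: str):
        # Count each distinct token once per message.
        self.counts[label].update(set(tokens(text)))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        # Sum log-likelihood ratios of the tokens present, with add-one
        # smoothing so a token never seen in one class doesn't zero it out.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["good"] + 1))
        for tok in set(tokens(text)):
            p_spam = (self.counts["spam"][tok] + 1) / (self.docs["spam"] + 2)
            p_good = (self.counts["good"][tok] + 1) / (self.docs["good"] + 2)
            log_odds += math.log(p_spam / p_good)
        # Convert log-odds to a probability without overflowing math.exp.
        if log_odds >= 0:
            return 1.0 / (1.0 + math.exp(-log_odds))
        e = math.exp(log_odds)
        return e / (1.0 + e)

f = NaiveBayesFilter()
f.train("cheap pills buy now", "spam")
f.train("buy cheap pills online now", "spam")
f.train("meeting notes for tomorrow", "good")
f.train("lunch tomorrow with the team", "good")
print(f.spam_probability("buy cheap pills"))                 # close to 1
print(f.spam_probability("notes for the meeting tomorrow"))  # close to 0
```

Padding a spam with random erudite words mostly adds tokens the filter has never seen, whose smoothed ratio is close to 1, so they shift the score very little. That's one reason Bayesian filters hold up against this trick.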
This stuff has been one of my obsessions lately, so if you have more questions, feel free to ask. I've been running spamassassin and spamprobe together with some damn impressive results: http://www.chaosreigns.com/sa_sp/
Spamprobe has been perfect for a while. Spamassassin misses some spam (I've disabled its own Bayesian abilities, because they're inferior to SP's and I don't believe it's worth maintaining two Bayesian databases), but I like having two filters for extra reliability. My goal is never to have to read my spam folder, only the folders that contain good mail plus the one containing the mail SA and SP disagree on.
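The two-filter setup comes down to a simple routing rule. Here's a sketch, assuming each filter reports a plain spam/not-spam verdict. The folder names are hypothetical, and how the verdicts are obtained from spamassassin and spamprobe isn't shown.

```python
from enum import Enum

class Folder(Enum):
    INBOX = "inbox"        # both filters say good mail: read it
    SPAM = "spam"          # both filters say spam: never needs reading
    DISAGREE = "disagree"  # the filters disagree: worth a manual look

def route(sa_says_spam: bool, sp_says_spam: bool) -> Folder:
    # Only unanimous verdicts are trusted; any disagreement goes to review.
    if sa_says_spam and sp_says_spam:
        return Folder.SPAM
    if not sa_says_spam and not sp_says_spam:
        return Folder.INBOX
    return Folder.DISAGREE

print(route(True, True))    # Folder.SPAM
print(route(False, True))   # Folder.DISAGREE
```

A message reaches the unread spam folder only when both filters agree, so a mistake by one filter alone costs a glance at the disagreement folder instead of a lost mail.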
Somebody really needs to write a tutorial on Bayesian statistics aimed at people writing spam filters.
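A natural starting point for such a tutorial is Bayes' rule applied to a single token $w$, followed by the naive independence assumption that lets the evidence from tokens $w_1, \dots, w_n$ be multiplied:

```latex
\begin{align*}
P(\mathrm{spam} \mid w) &= \frac{P(w \mid \mathrm{spam})\,P(\mathrm{spam})}
  {P(w \mid \mathrm{spam})\,P(\mathrm{spam}) + P(w \mid \mathrm{good})\,P(\mathrm{good})} \\[6pt]
\frac{P(\mathrm{spam} \mid w_1,\dots,w_n)}{P(\mathrm{good} \mid w_1,\dots,w_n)}
  &\approx \frac{P(\mathrm{spam})}{P(\mathrm{good})}
  \prod_{i=1}^{n} \frac{P(w_i \mid \mathrm{spam})}{P(w_i \mid \mathrm{good})}
\end{align*}
```

The $\approx$ marks the independence assumption, which real mail rarely satisfies. Taking the logarithm of the second line turns the product into a sum, which avoids numerical underflow when a message contains many tokens.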
spam spam spam spam
Date: 2003-12-30 09:05 pm (UTC)
luv James.