Why doesn't word salad work?

Spammers have started using “word salad” to get past spam filters. They either pad the message with a lot of random, unspammy dictionary words or load it up with made-up words. Sometimes the padding is a news article or a passage of text from a book.

Using lots of uncommon words might be effective against filters that treat unknown words as non-spammy. POPFile does not do that. It considers an unknown word unlikely to be in any bucket and assigns it a very small value that depends on the size of the bucket. The key point is that unseen words are roughly equally weighted (based on bucket size) across the various buckets, so their effect is neutral.
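The neutral effect of unseen words can be illustrated with a small sketch. This is not POPFile's code: the bucket names, the word counts, the `score` and `classify` helpers, and the exact penalty formula are all assumptions made for the example. The only property it is meant to show is that when every bucket charges the same small value for an unseen word, padding a message with such words shifts all bucket scores by the same amount and leaves the ranking unchanged.

```python
import math

# Hypothetical training data: bucket -> {word: count}. Not real POPFile data.
BUCKETS = {
    "spam": {"pills": 30, "simple": 12, "click": 25},
    "inbox": {"meeting": 20, "report": 15, "simple": 3},
}


def score(words, buckets=BUCKETS):
    """Return a log-score per bucket for a list of words (naive Bayes style)."""
    grand_total = sum(sum(counts.values()) for counts in buckets.values())
    # Assumed penalty for a word a bucket has never seen: a very small
    # probability that is the same for every bucket.
    unseen = math.log(1.0 / (10 * grand_total))
    scores = {}
    for name, counts in buckets.items():
        total = sum(counts.values())
        s = math.log(total / grand_total)  # prior based on bucket size
        for w in words:
            if w in counts:
                s += math.log(counts[w] / total)
            else:
                s += unseen
        scores[name] = s
    return scores


def classify(words, buckets=BUCKETS):
    """Return the name of the highest-scoring bucket."""
    scores = score(words, buckets)
    return max(scores, key=scores.get)


message = ["click", "pills"]
salad = message + ["zephyr", "quagmire", "obelisk"]  # words in no bucket
print(classify(message), classify(salad))  # same bucket for both
```

In this sketch the three salad words add the same penalty to both buckets, so the gap between the spam score and the inbox score is the same with or without them.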

In POPFile, everyone's spam and non-spam words are specific to their own email, so loading messages with word salad is not effective. Spammers cannot find words that are going to be non-spammy for everyone. Often, simple words that seem non-spammy are actually spammy. By coincidence, our example word is “simple.” It was brought up in a discussion of word salad and turned out to have widely varying spamminess. For four out of the seven users who checked, it was a good spam indicator.

User      Status of the word 'simple'
Brian     very spammy, 0.82 probability
James     low probability
Jeremiah  spammy
Jim       spammy
Joseph    far higher probability in school mail, 0.64
Robbie    spammy with 0.81 probability
Troy      didn't appear at all in any bucket

In his presentation at the 2004 MIT Spam Conference, John indicated that a _random_ (essentially worst-case or “brute force”) word-salad attack worked in some small percentage of cases.

The main point is that it may be possible to get a small percentage of messages through to a small percentage of people by using lots of word salad. But then, how are the spammers going to advertise their enlargement pills? It's not very effective spam if it doesn't include a URL, so that will still be there. And don't forget that email headers also contribute heavily to classification in POPFile.

Also See: NewWords

 
faq/wordsalad.txt · Last modified: 2009/09/26 21:19 by xuesheng

The content of this wiki is protected by the GNU Free Documentation License