This shows you the differences between two versions of the page.

faq:commonheaders [2008/02/08 19:49] (current)
Line 1: Line 1:
 +===== Why are common header names marked as belonging to a specific bucket? =====
 +You may have noticed that the names of many email headers (//Received//, //Content-Type//, //Date//, //Return-Path//, //Message-Id//) that appear in almost all email are shown as most likely to be found in one particular bucket.  This is not a problem, even if most of them are colored as spam words.  In a mature corpus they will not be weighted significantly toward any single bucket.  A word being colored for one bucket does not mean it is ignored when classifying for the other buckets.  Combined with a few words that indicate the actual classification more strongly, the correct bucket is still chosen.
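The effect described above can be illustrated with a minimal naive Bayes sketch. This is not the program's actual code: the bucket names, the word counts, and the helper functions ''log_likelihood'', ''lean'', and ''classify'' are all assumptions made up for the example, and equal bucket priors are assumed for simplicity.

```python
import math

# Hypothetical per-bucket word counts; the numbers are invented for illustration.
counts = {
    "spam": {"header:Received": 1000, "header:Date": 500, "viagra": 40, "meeting": 1},
    "inbox": {"header:Received": 900, "header:Date": 520, "viagra": 1, "meeting": 45},
}

def log_likelihood(word, bucket):
    """Log of the word's relative frequency in a bucket, with add-one smoothing."""
    total = sum(counts[bucket].values())
    vocab = len({w for c in counts.values() for w in c})
    return math.log((counts[bucket].get(word, 0) + 1) / (total + vocab))

def lean(word):
    """Log ratio of spam to inbox likelihood; values near 0 mean almost neutral."""
    return log_likelihood(word, "spam") - log_likelihood(word, "inbox")

def classify(words):
    """Pick the bucket with the highest summed log-likelihood (equal priors assumed)."""
    scores = {b: sum(log_likelihood(w, b) for w in words) for b in counts}
    return max(scores, key=scores.get)

# The header word leans slightly toward spam, so it would be "colored" as spam,
# but a single strongly indicative word easily outweighs it.
print(lean("header:Received"), lean("meeting"))
print(classify(["header:Received", "header:Date", "meeting"]))
```

With these made-up counts, ''header:Received'' is slightly more frequent in the spam bucket, yet a message containing it together with ''meeting'' still lands in the inbox bucket.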
 +For some headers, the case of the header name may itself indicate something useful.  For example, //header:Message-ID// and //header:Message-Id//, or //header:MIME-Version// and //header:Mime-Version//, may give you different results.  The //To//, //From//, and //Subject// headers are on the ignore list because they are always present and always in the same form, so they are not useful in classification.
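As a rough sketch of how case-sensitive header pseudo-words and an ignore list could work together, the following uses Python's standard ''email'' module. The function name ''header_tokens'' and the ''IGNORED'' set are hypothetical and only mirror the behavior described above; the real tokenizer may differ.

```python
from email import message_from_string

# Headers skipped entirely, following the ignore list described above.
IGNORED = {"to", "from", "subject"}

def header_tokens(raw_message):
    """Return one 'header:Name' pseudo-word per header, keeping the original case."""
    msg = message_from_string(raw_message)
    # msg.keys() yields header names exactly as they were written in the message.
    return [f"header:{name}" for name in msg.keys() if name.lower() not in IGNORED]

sample = "Message-Id: <1@example.org>\nMime-Version: 1.0\nSubject: hello\n\nbody\n"
print(header_tokens(sample))
```

Because the case is preserved, a message written with ''Message-ID'' produces a different pseudo-word from one written with ''Message-Id'', so the two spellings can accumulate different statistics.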
 +In the case of the //Received// header, the number of //Received// headers is important.  Each server that handles a message before it reaches your email server adds one.  Depending on how your email is set up, a large number of //Received// lines may indicate spam, a newsletter, or a particular email account.  In such a setup, the more //Received// headers a message has, the more likely it is to be spam.
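To see how many //Received// headers a given message carries, a small check like the one below can help. It is only an illustration built on Python's standard ''email'' module; the function name ''received_count'' and the sample message are assumptions, and it does not show how the classifier itself uses the count.

```python
from email import message_from_string

def received_count(raw_message):
    """Count the Received headers in a raw message (name matching ignores case)."""
    msg = message_from_string(raw_message)
    return len(msg.get_all("Received", []))

sample = (
    "Received: from relay2.example by mx.example\n"
    "Received: from relay1.example by relay2.example\n"
    "Received: from sender.example by relay1.example\n"
    "Date: Fri, 08 Feb 2008 19:49:00 +0000\n"
    "\n"
    "body\n"
)
print(received_count(sample))
```

Comparing this count across messages from your different buckets shows whether the number of relays is a useful signal for your particular setup.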
faq/commonheaders.txt · Last modified: 2008/02/08 19:49 (external edit)


The content of this wiki is protected by the GNU Free Documentation License