How Apple Mail’s Junk Filter works
Apple Mail’s built-in Junk Filter is a joy to use and one of the app’s most welcome features. But how does it work?
An article on O’Reilly’s Mac DevCenter website describes the technology behind the filter and how it works. Read it for the science that powers the filter, including vector representation and semantic analysis, or just to learn how to fine-tune the filter for your own needs.
Joe Kissell provides another explanation of the Junk Filter and the technology behind it on the TidBITS website.
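To give a rough feel for what “vector representation” means here, the toy sketch below turns each message into a word-count vector and compares a new message against known junk and known good mail by cosine similarity. This is only an illustration, not Apple’s implementation: the function names and sample messages are made up, and real latent semantic analysis goes further by reducing the word-by-message matrix to a smaller number of dimensions (typically with a singular value decomposition) before comparing.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a message into a sparse bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(messages):
    """Sum the vectors of several messages into one summary vector.

    Cosine similarity ignores vector length, so a sum serves as well
    as an average here.
    """
    total = Counter()
    for message in messages:
        total.update(vectorize(message))
    return total


def looks_like_junk(message, junk_examples, good_examples):
    """Hypothetical rule: junk if the message is closer to the junk centroid."""
    v = vectorize(message)
    return cosine(v, centroid(junk_examples)) > cosine(v, centroid(good_examples))


# Made-up training examples, standing in for mail the user has marked.
junk = ["cheap pills buy now", "buy cheap watches now"]
good = ["meeting notes attached", "lunch on friday"]
```

Calling `looks_like_junk("buy cheap pills", junk, good)` compares the new message with both groups. Marking more messages as junk or not junk simply adds examples to the two lists, which is the general idea behind training a filter.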
Tags: Apple Mail, junk filter, latent semantic analysis, lsa, mail.app, spam

October 11th, 2005 at 9:11 pm
[...] It uses Bayesian filters which detect spam on a word-for-word frequency basis. This can complement Apple Mail’s inbuilt spam filter which uses a slightly different approach, known as “Latent Semantic Analysis”. [...]
October 12th, 2005 at 3:30 pm
[...] “Under the Hood” is a series of entries like the ones on Mail’s Junk filter and Apple Mail: the Early Years in which I share my attempts to understand better what’s going on behind the scenes in Apple Mail. Corrections welcome. [...]
November 17th, 2005 at 6:59 am
[...] SpamSieve is a third-party spam catcher that supplements Apple Mail’s inbuilt filter. It uses Bayesian filtering technology, which differs from Mail.app’s own system (see “How does Apple Mail’s Junk filter work?”). [...]
December 24th, 2005 at 2:49 am
[...] I’ve blogged two excellent descriptions of the way this filter works in Apple Mail before, but today came across a third explanation with the imposing title, “Bayesian Nets, Latent Semantics, Despamming and other speculations”. [...]
July 26th, 2006 at 9:01 am
3 questions:
1. can any of these tools be told to apply its “current configuration” against all mailboxes and subboxes per account, regardless of date, so that even older email can be “weeded/cleaned”? if mail.app can, it isn’t intuitive. it’s not mentioned as a selling point anywhere. when dealing with years and years of mail, this is key.
2. does mail.app continue to learn/train itself from where it last left off? why not? why does apple not warn users that turning it back on will reset it if that is what happens?
3. what do i have to do (what plugin to get) to pull off a multi-parameter search from within mail.app so that i can find all mail: a) from xyz and b) to abc and c) after date nn and d) with keyword “blah” in the subject line? boolean and wildcard à la “google” would be nice too.
* i find spotlight, top to bottom, another overrated, generally useless mac app, so please exclude it from any comparisons/suggestions