'The very first release of Gmail simply used spamassassin on the backend'
Excellent. Confirming what I'd heard from a few other sources, too ;)

This is a well-written history of the anti-spam war so far, from Mike Hearn, writing from the Google/Gmail point of view:

Brief note about my background, to establish credentials: I worked at
Google for about 7.5 years. For about 4.5 of those I worked on the Gmail
abuse team, which is very tightly linked with the spam team (they use the
same software, share the same on-call rotations etc).

Reading this kind of stuff is awesome for me, since it's a nice picture of a fun problem to work on -- the Gmail team took the right ideas about how to fight spam, and scaled them up to the 10s-of-millions DAU mark. Nicely done.

The second half is some interesting musings on end-to-end encrypted communications and how they would deal with spam. Worth a read...
gmail  google  spam  anti-spam  filtering  spamassassin  history 
september 2014 by jm
SpamAssassin 3.4.0 released
Good to see the guys cracking on without me ;)

'2014-02-11: SpamAssassin 3.4.0 has been released adding native support for IPv6, improved DNS Blocklist technology and support for massively-scalable Bayesian filtering using the Redis backend.'
antispam  open-source  spamassassin  apache 
february 2014 by jm
Peter Norvig writes a program to play regex golf with arbitrary lists
In response to XKCD 1313. This is excellent. It's reminiscent of my SpamAssassin SOUGHT-ruleset regexp-discovery algorithm, albeit without the BLAST step intended to maximise pattern length and minimise false positives
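
For a feel of the problem, here is a rough sketch of a greedy set-cover take on regex golf. It is not Norvig's program and not the SOUGHT algorithm; the function names (`candidate_parts`, `regex_golf`) and the 4-character substring limit are assumptions made for illustration.

```python
import re

def candidate_parts(winners, max_len=4):
    """Build candidate regex fragments: anchored whole words plus short substrings."""
    parts = {"^" + re.escape(w) + "$" for w in winners}
    for w in winners:
        for n in range(1, max_len + 1):
            for i in range(len(w) - n + 1):
                parts.add(re.escape(w[i:i + n]))
    return parts

def regex_golf(winners, losers):
    """Greedily pick fragments that match every winner and no loser."""
    winners, losers = set(winners), set(losers)
    # Discard any fragment that matches a loser.
    pool = [p for p in candidate_parts(winners)
            if not any(re.search(p, l) for l in losers)]
    remaining = set(winners)
    chosen = []
    while remaining:
        if not pool:
            raise ValueError("no usable fragments")
        # Prefer the fragment covering most remaining winners, then the shortest.
        best = max(pool, key=lambda p: (
            sum(1 for w in remaining if re.search(p, w)), -len(p), p))
        covered = {w for w in remaining if re.search(best, w)}
        if not covered:
            raise ValueError("cannot separate winners from losers")
        chosen.append(best)
        remaining -= covered
    return "|".join(chosen)
```

Calling `regex_golf({"spam", "scam"}, {"ham", "mail"})` yields a pattern that matches both winners and neither loser. Being greedy, it is not guaranteed to find the shortest possible pattern.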
python  regex  xkcd  blast  rule-discovery  spamassassin  rules  regexps  regular-expressions  algorithms  peter-norvig 
january 2014 by jm
Abusing hash kernels for wildly unprincipled machine learning
what, is this the first time our spam filtering approach of hashing a giant feature space is hitting mainstream machine learning? that can't be right!
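
A minimal sketch of the general "hashing trick", for anyone who hasn't seen it: tokens are hashed straight into a fixed number of buckets, so no vocabulary needs to be stored. This is not SpamAssassin's implementation; the function name `hashed_features`, the choice of MD5 and the bucket count are assumptions picked for illustration.

```python
import hashlib

def hashed_features(tokens, num_buckets=2**18):
    """Map tokens to a sparse, fixed-size count vector by hashing each token."""
    vec = {}
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # Use the first 8 bytes of the digest as the bucket index.
        idx = int.from_bytes(digest[:8], "big") % num_buckets
        vec[idx] = vec.get(idx, 0) + 1
    return vec
```

Distinct tokens can collide in the same bucket; the bet is that with enough buckets the collisions do little harm to the classifier.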
ai  machine-learning  python  data  hashing  features  feature-selection  anti-spam  spamassassin 
april 2013 by jm
_Fast Cache for Your Text: Accelerating Exact Pattern Matching with Feed-Forward Bloom Filters_ [PDF]
intriguing application of a Bloom Filter optimised for modern CPUs (2-level, with a cache-partitioned first level), providing massive speedups vs GNU grep or trie-based approaches like Aho-Corasick -- or possibly re2c, as used in "sa-compile". On the other hand, a perl implementation of Rabin-Karp, which is similar, didn't perform as well. Still, may be worth investigating
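
To illustrate the general idea only: a plain Bloom filter can serve as a cheap pre-filter before exact matching. The sketch below is not the paper's feed-forward, cache-partitioned design; the names `BloomFilter` and `find_patterns`, the SHA-256 hashing and the sizes are all assumptions, and it expects a non-empty list of non-empty patterns.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive several bit positions by salting the hash input with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def find_patterns(text, patterns):
    """Return (offset, pattern) for every exact occurrence of a pattern in text."""
    k = min(len(p) for p in patterns)  # shared prefix length
    bloom = BloomFilter()
    by_prefix = {}
    for p in patterns:
        bloom.add(p[:k])
        by_prefix.setdefault(p[:k], []).append(p)
    hits = []
    for i in range(len(text) - k + 1):
        window = text[i:i + k]
        if window in bloom:  # cheap check, may be a false positive
            for p in by_prefix.get(window, ()):
                if text.startswith(p, i):  # exact verification
                    hits.append((i, p))
    return hits
```

In pure Python the dictionary lookup alone would do the job, so this only shows the shape of the technique; any speedup depends on a native implementation where the filter fits in CPU cache.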
bloom-filters  grep  filtering  spamassassin  sa-compile  text-matching  caches  aho-corasick  from delicious
september 2010 by jm
Exploring the Spam Arms Race to Characterize Spam Evolution
from last week's CEAS conference; research comparing SpamAssassin releases against the evolution of the surrounding spam environment. Nice work, I always wanted to write up something like this (via JD)
spam  anti-spam  ceas  conference  papers  research  spamassassin  adversarial-classification  evolution  arms-race  via:jd  from delicious
july 2010 by jm
ScamNailer - Anti-Phishing Filter
a generated set of SpamAssassin rules containing known-phisher addresses
scams  phishing  spear-phishing  spamassassin  rules  anti-phishing  from delicious
april 2010 by jm
'a Perl script that generates an spreadsheet which loads up SpamAssassin configuration and known spam and ham messages. Once loaded, you can tweak individual SpamAssassin scores in the spreadsheet itself and see their effect on spam/ham classification in real-time. The script also shows you the number of false positives and negatives for a set of scores in real-time.' by Raj Mathur <raju at>
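
The underlying calculation is simple enough to sketch. This is not the script itself, just an illustration of recounting false positives and negatives for a given set of scores; the function name `classify_counts`, the rule names in the usage note and the 5.0 threshold are assumptions.

```python
def classify_counts(messages, scores, threshold=5.0):
    """Count false positives and false negatives for a set of rule scores.

    messages: list of (is_spam, [names of rules that hit]) pairs.
    scores:   dict mapping rule name to score; unknown rules count as 0.
    """
    false_pos = false_neg = 0
    for is_spam, hits in messages:
        total = sum(scores.get(rule, 0.0) for rule in hits)
        flagged = total >= threshold
        if flagged and not is_spam:
            false_pos += 1  # ham wrongly marked as spam
        elif not flagged and is_spam:
            false_neg += 1  # spam that slipped through
    return false_pos, false_neg
```

With a hypothetical corpus `[(True, ["RULE_A", "RULE_B"]), (True, ["RULE_B"]), (False, ["RULE_A"]), (False, [])]`, raising `RULE_A` from 3.0 to 5.0 while `RULE_B` stays at 2.5 moves the result from `(0, 1)` to `(1, 1)`.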
spamtune  spamassassin  rules  scores  optimization  tweaking  openoffice  from delicious
april 2010 by jm
The SAY2K10 bug
LWN follows up on the FH_DATE_PAST_20XX fiasco. 'It would appear that what SpamAssassin needs is some dedicated maintenance talent which is not dependent on evening hours put in by developers committed to other projects.' I wish
spamassassin  say2k10  bugs  maintainance  lwn  commentary  from delicious
january 2010 by jm
RegExr: Online Regular Expression Testing Tool
a very nice interactive editor in Flash, supporting lots of the usual perlish stuff. via Joe
via:jdrumgoole  regexps  regular-expressions  spamassassin  rule-dev  flash  regex  flex  utilities  from delicious
december 2009 by jm
Anti Spear-phishing SpamAssassin ruleset
from Julian "MailScanner" Field (via the SA users list)
spamassassin  anti-spam  rulesets  sa-update  phishing  blocklists 
august 2009 by jm
glTail.rb - realtime logfile visualization
'View real-time data and statistics from any logfile on any server with SSH, in an intuitive and entertaining way', supporting postfix/spamd/clamd logs among loads of others. very cool if a little silly
dataviz  visualization  tail  gltail  opengl  linux  apache  spamd  spamassassin  logs  statistics  sysadmin  analytics  animation  analysis  server  ruby  monitoring  logging  logfiles 
july 2009 by jm

