jm + anti-spam   43

SpamAssassin is back
The SpamAssassin 3.4.2 release was the first from that project in well over three years. At the 2018 Open Source Summit Europe, Giovanni Bechis talked about that release and those that will be coming in the near future. It would seem that, after an extended period of quiet, the SpamAssassin project is back and has rededicated itself to the task of keeping junk out of our inboxes.

This is good to see! Also, newsy thread:
spamassassin  open-source  oss  anti-spam 
4 weeks ago by jm
GCHQ's Spam Problem
'“Spam emails are a large proportion of emails seen in SIGINT [signals intelligence],” reads part of a dense document from the Snowden archive, published by Boing Boing on Tuesday. “GCHQ would like to reduce the impact of spam emails on data storage, processing and analysis.”' (circa 2011). Steganography, anyone? (via Tony Finch)
spam  anti-spam  gchq  funny  boing-boing  sigint  snowden  surveillance 
february 2016 by jm
Exclusive: Snowden intelligence docs reveal UK spooks' malware checklist / Boing Boing
This is an excellent essay from Cory Doctorow on mass surveillance in the post-Snowden era, and the difference between HUMINT and SIGINT. So much good stuff, including this (new to me) cite for "Goodhart's law", on secrecy as it affects adversarial classification:
The problem with this is that once you accept this framing, and note the happy coincidence that your paymasters just happen to have found a way to spy on everyone, the conclusion is obvious: just mine all of the data, from everyone to everyone, and use an algorithm to figure out who’s guilty. The bad guys have a Modus Operandi, as anyone who’s watched a cop show knows. Find the MO, turn it into a data fingerprint, and you can just sort the firehose’s output into “terrorist-ish” and “unterrorist-ish.”

Once you accept this premise, then it’s equally obvious that the whole methodology has to be kept from scrutiny. If you’re depending on three “tells” as indicators of terrorist planning, the terrorists will figure out how to plan their attacks without doing those three things.

This even has a name: Goodhart's law. "When a measure becomes a target, it ceases to be a good measure." Google started out by gauging a web page’s importance by counting the number of links they could find to it. This worked well before they told people what they were doing. Once getting a page ranked by Google became important, unscrupulous people set up dummy sites (“link-farms”) with lots of links pointing at their pages.
adversarial-classification  classification  surveillance  nsa  gchq  cory-doctorow  privacy  snooping  goodharts-law  google  anti-spam  filtering  spying  snowden 
february 2016 by jm
UK's ICO spam regulator even more toothless now
We appealed this decision, but in June 2014 the Upper Tribunal agreed with the First-tier Tribunal, cancelling our monetary penalty notice against Niebel and McNeish, and largely rendering our power to issue fines for breaches of PECR involving spam texts redundant.

This is pretty terrible. The UK appears to have the weakest anti-spam regime in Europe due to the lack of powers given to ICO.
ico  anti-spam  uk  law  regulation  spam  sms 
september 2014 by jm
SI336 - current Irish anti-spam law
"European Communities (Electronic Communications Networks and Services) (Privacy and Electronic Communications) Regulations 2011". Spam is covered under 13.1, "Unsolicited communications", on page 16 of this PDF
spam  anti-spam  law  ireland  eu  ec  sms  email  si336  privacy  regulation 
september 2014 by jm
'The very first release of Gmail simply used spamassassin on the backend'
Excellent. Confirming what I'd heard from a few other sources, too ;)

This is a well-written history of the anti-spam war so far, from Mike Hearn, writing from the Google/Gmail point of view:

Brief note about my background, to establish credentials: I worked at
Google for about 7.5 years. For about 4.5 of those I worked on the Gmail
abuse team, which is very tightly linked with the spam team (they use the
same software, share the same on-call rotations etc).

Reading this kind of stuff is awesome for me, since it's a nice picture of a fun problem to work on -- the Gmail team took the right ideas about how to fight spam, and scaled them up to the 10s-of-millions DAU mark. Nicely done.

The second half is some interesting musings on end-to-end encrypted communications and how they would deal with spam. Worth a read...
gmail  google  spam  anti-spam  filtering  spamassassin  history 
september 2014 by jm
Fighting spam with BotMaker
Some vague details of the antispam system in use at Twitter.
The main challenges in supporting this type of system are evaluating rules with low enough latency that they can run on the write path for Twitter’s main features (i.e., Tweets, Retweets, favorites, follows and messages), supporting computationally intense machine learning based rules, and providing Twitter engineers with the ability to modify and create new rules instantaneously.
spam  realtime  scaling  twitter  anti-spam  botmaker  rules 
august 2014 by jm
Next clothing retailer loses appeal over €100 fine in ‘spam’ case - Crime & Law News from Ireland & Abroad | The Irish Times - Wed, Mar 19, 2014
as TJ McIntyre noted: '€100 fine for a repeat spammer. Data Protection Commissioner calls this "strong protection". With a straight face.'

Next will doubtless fork over the 100 Euros out of the petty cash drawer, then carry on regardless. This isn't a useful fine. What a farce...
cheap  farce  dpc  data-protection  privacy  anti-spam  next  spam  convictions  fines  ireland 
march 2014 by jm
Storm at - London Storm Meetup 2013-06-18
Not just a Storm success story. Interesting slides indicating where a startup *stopped* using Storm as realtime wasn't useful to their customers
storm  realtime  hadoop  cascading  python  cep  anti-spam  events  architecture  distcomp  low-latency  slides  rabbitmq 
october 2013 by jm
Even the NSA is finding it hard to cope with spam
3 new Snowden leaks, covering acquisition of Yahoo address books, buddy lists, and email account activity, and how spammer activity required intervention to avoid losing useful data in the noise
spam  spammers  nsa  snowden  leaks  anti-spam  yahoo  im  mail 
october 2013 by jm
How not to stop spammers
Spam Arrest is a company that sells an anti-spam service. They attempted to sue some spammers and, as has been widely reported, lost badly. This case emphasizes three points that litigious antispammers seem not to grasp:

Under CAN SPAM, a lot of spam is legal.
Judges hate plaintiffs who try to be too clever, and hate sloppy preparation even more.
Never, ever, file a spam suit in Seattle.
anti-spam  spam  law  seattle  us  can-spam  spamarrest  sentient-jets 
september 2013 by jm
Persuading David Simon (Pinboard Blog)
Maciej Ceglowski with a strongly-argued rebuttal of David Simon's post about the NSA's PRISM. This point in particular is key:
The point is, you don't need human investigators to find leads, you can have the algorithms do it [based on the call graph or network of who-calls-who]. They will find people of interest, assemble the watch lists, and flag whomever you like for further tracking. And since the number of actual terrorists is very, very, very small, the output of these algorithms will consist overwhelmingly of false positives.
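To put rough numbers on that last point, here's a back-of-the-envelope sketch of the base-rate problem. Every figure in it (population size, number of real targets, detection rate, false-positive rate) is made up for illustration and is not taken from Maciej's post:

```python
def flagged_breakdown(population, true_targets, tpr, fpr):
    """Split the people an algorithm flags into true and false positives.

    tpr: fraction of real targets that get flagged (true-positive rate)
    fpr: fraction of everyone else that gets flagged (false-positive rate)
    Returns (true_positives, false_positives, precision).
    """
    true_pos = true_targets * tpr
    false_pos = (population - true_targets) * fpr
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision


# Hypothetical inputs: 300M people, 3,000 real targets, and a classifier
# that is "99% accurate" in both directions.
tp, fp, precision = flagged_breakdown(300_000_000, 3_000, tpr=0.99, fpr=0.01)
print(f"true positives:  {tp:,.0f}")
print(f"false positives: {fp:,.0f}")
print(f"share of flags that are real: {precision:.4%}")
```

With these made-up inputs, roughly one flag in a thousand points at a real target, which is the same false-positive arithmetic any spam filter author has to live with.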
false-positives  maciej  privacy  security  nsa  prism  david-simon  accuracy  big-data  filtering  anti-spam 
june 2013 by jm
Spamalot reigns: the spoils of Ireland’s EU kingship | The Irish Times - Thu, Jun 13, 2013
The spam presidency. As European citizens are made the miserable targets of unimpeded “direct marketing”, that may be how Ireland’s stint in the EU presidency seat is recalled for years to come.
Under the guiding hand of Minister for Justice Alan Shatter, the Council of the European Union has submitted proposals for amendments to a proposed new data protection regulation, all of which overwhelmingly favour business and big organisations, not citizens.
The most obviously repugnant and surprising element in the amendments is a watering down of existing protections for EU citizens against the willy-nilly marketing Americans are forced to endure. In the US there are few meaningful restrictions on what businesses can do with people’s personal information when pitching products and services at them.
In the EU, this has always been strictly controlled; information gathered for one purpose cannot be used by a business to sell whatever it wants – unless you have opted in to receive such solicitations. This means you are not constantly bombarded by emails and junk mail, nor do you get non-stop phone calls from telemarketers.
Under the proposed amendments to the draft data protection regulation, direct marketing would become a legal form of data processing. In effect, this would legitimise spam email, junk print mail and marketing calls. This unexpected provision signals just how successful powerful corporate lobbyists have been in convincing ministers that business matters more than privacy or giving citizens reasonable control over their personal information.
Far worse is contained in other amendments, which in effect turn the original draft of the regulation upside down.

Fantastic article from Karlin Lillington in today's Times on the terrible amendments proposed for the EU's data protection law.
eu  law  prism  data-protection  privacy  ireland  ec  marketing  spam  anti-spam  email 
june 2013 by jm
Council of the European Union Releases Draft Compromise Text on the Proposed EU Data Protection Regulation
Oh god. this sounds like an impending privacy and anti-spam disaster. "business-focussed":
Overall, the [Irish EC Presidency’s] draft compromise text can be seen as a more business-focused, pragmatic approach. For example, the Presidency has drafted an additional recital (Recital 3a), clarifying the right to data protection as a qualified right, highlighting the principle of proportionality and importance of other competing fundamental rights, including the freedom to conduct a business.

and some pretty serious relaxation of how consent for use of personal data is measured:

The criterion for valid consent is amended from “explicit” to “unambiguous,” except in the case of processing special categories of data (i.e., sensitive personal data) (Recital 25 and Article 9(2)). This reverts to the current position under the Data Protection Directive and is a concession to the practical difficulty of obtaining explicit consent in all cases.

The criteria for valid consent are further relaxed by the ability to obtain consent in writing, orally or in an electronic manner, and where technically feasible and effective, valid consent can be given using browser settings and other technical solutions. Further, the requirement that the controller bear the burden of proof that valid consent was obtained is limited to a requirement that the controller be able to “demonstrate” that consent was obtained (Recital 32 and Article 7(1)). The need for “informed” consent is also relaxed from the requirement to provide the full information requirements laid out in Article 14 to the minimal requirements that the data subject “at least” be made aware of: (1) the identity of the data controller, and (2) the purpose(s) of the processing of their personal data (Recitals 33 and 48).
anti-spam  privacy  data-protection  spam  ireland  eu  ec  regulation 
june 2013 by jm
EDRI's comments on EU proposals to reform privacy law
Amendments 762, 764 and 765 in particular seem to move portions of the law from "confirmed opt-in required" to "opt-out is ok" -- which sounds like a risk where spam and unsolicited actions on a person's data are concerned
law  privacy  anti-spam  eu  spam  edri 
june 2013 by jm
EU Council deals killer blow to privacy reforms
'In an extraordinary result for corporate lobbying, direct marketing would by default be considered a legitimate data process and would therefore – by default – be lawful.'
eu  politics  data-protection  privacy  anti-spam  spam  eu-council  direct-marketing 
june 2013 by jm
Abusing hash kernels for wildly unprincipled machine learning
what, is this the first time our spam filtering approach of hashing a giant feature space is hitting mainstream machine learning? that can't be right!
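For anyone who hasn't seen it, a minimal sketch of the general "hashing trick": map an unbounded set of token features into a fixed number of buckets by hashing. This is my own illustration, not code from the linked post or from SpamAssassin; the function name, bucket count and choice of CRC32 are all assumptions:

```python
import zlib


def hash_features(tokens, n_buckets=2**18):
    """Map tokens to a sparse {bucket_index: count} vector.

    zlib.crc32 is used instead of the built-in hash() because hash() on
    strings is randomized per process, which would make the features
    irreproducible between training and scoring runs.
    """
    vec = {}
    for tok in tokens:
        idx = zlib.crc32(tok.encode("utf-8")) % n_buckets
        vec[idx] = vec.get(idx, 0) + 1
    return vec


# Hypothetical usage: tokens from a message body.
features = hash_features("buy cheap pills buy now".split())
print(features)
```

Distinct tokens can collide in the same bucket; the usual argument is that with enough buckets the classifier tolerates the noise, and in exchange there is no vocabulary to store or keep in sync.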
ai  machine-learning  python  data  hashing  features  feature-selection  anti-spam  spamassassin 
april 2013 by jm
Clairvoyant Squirrel: Large Scale Malicious Domain Classification
Storm-based service to detect malicious DNS domain usage from streaming pcap data in near-real-time. Uses string features in the DNS domain, along with randomness metrics using Markov analysis, combined with a Random Forest classifier, to achieve 98% precision at 10,000 matches/sec
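The slides don't spell out the Markov-based randomness metric here, so the following is only a guess at the general idea: train a character-bigram model on names assumed to be benign, then score a candidate by its average log-probability per transition. The class name, the tiny training list and the add-one smoothing are my assumptions, not details of Clairvoyant Squirrel:

```python
import math
from collections import Counter


class BigramModel:
    """Character-bigram (first-order Markov) model with add-one smoothing."""

    def __init__(self, training_names, alphabet_size=38):
        # alphabet_size is a rough assumption: a-z, 0-9, '-', end marker.
        self.alphabet_size = alphabet_size
        self.pair_counts = Counter()
        self.first_counts = Counter()
        for name in training_names:
            padded = "^" + name.lower() + "$"  # start and end markers
            for a, b in zip(padded, padded[1:]):
                self.pair_counts[(a, b)] += 1
                self.first_counts[a] += 1

    def avg_log_prob(self, name):
        """Mean log P(next char | current char); lower looks more random."""
        padded = "^" + name.lower() + "$"
        pairs = list(zip(padded, padded[1:]))
        total = 0.0
        for a, b in pairs:
            p = (self.pair_counts[(a, b)] + 1) / (
                self.first_counts[a] + self.alphabet_size
            )
            total += math.log(p)
        return total / len(pairs)


# Hypothetical training set; a real one would need far more names.
model = BigramModel(["google", "facebook", "amazon", "wikipedia", "twitter", "youtube"])
print(model.avg_log_prob("google"))
print(model.avg_log_prob("xqzvkjw"))
```

A score like this would be one input feature among several for the classifier, rather than a verdict on its own.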
storm  distributed  distcomp  random-forest  classifiers  machine-learning  anti-spam  slides 
february 2013 by jm
Practical machine learning tricks from the KDD 2011 best industry paper
Wow, this is a fantastic paper. It's a Google paper on detecting scam/spam ads using machine learning -- but not just that, it's how to build out such a classifier to production scale, and make it operationally resilient, and, indeed, operable.

I've come across a few of these ideas before, and I'm happy to say I might have reinvented a few (particularly around the feature space), but all of them together make extremely good sense. If I wind up working on large-scale classification again, this is the first paper I'll go back to. Great info! (via Toby diPasquale.)
classification  via:codeslinger  training  machine-learning  google  ops  kdd  best-practices  anti-spam  classifiers  ensemble  map-reduce 
july 2012 by jm
IETF expedited publication of RFC6449 before J.D. Falk passed away
I had no idea JD was sick. Very saddened to hear about this, he was a nice guy and a great member of the anti-spam community :(
jd-falk  death  cancer  rfcs  ietf  anti-spam  people 
november 2011 by jm
feedback loop n-gram analyzer
'a simple parser of ARF compliant FBL complaints, which normalizes the email complaints and generates a 6-tuple n-gram version of the message. These n-grams are stored in a Redis database, keyed by the file in which they can be found. An inverse index also exists that allow you to find all messages containing a particular n-gram word.'
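A rough sketch of the two indexes described in that quote, using plain in-memory dicts in place of Redis so that it runs standalone. I'm assuming "6-tuple n-gram" means overlapping runs of six words; the names and the tokenization are mine, not the tool's:

```python
from collections import defaultdict


def word_ngrams(text, n=6):
    """Return the set of overlapping n-word shingles in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


class NgramIndex:
    """Forward index (file -> n-grams) plus inverse index (n-gram -> files)."""

    def __init__(self, n=6):
        self.n = n
        self.by_file = {}                 # stand-in for per-file Redis sets
        self.by_ngram = defaultdict(set)  # stand-in for the inverse index

    def add(self, filename, text):
        grams = word_ngrams(text, self.n)
        self.by_file[filename] = grams
        for gram in grams:
            self.by_ngram[gram].add(filename)

    def files_containing(self, gram):
        return set(self.by_ngram.get(gram, ()))


# Hypothetical complaint bodies.
index = NgramIndex()
index.add("a.eml", "click here to claim your free prize today")
index.add("b.eml", "please click here to claim your free gift")
print(index.files_containing("click here to claim your free"))
```

The inverse lookup is what makes the similarity search cheap: two complaints that share any six-word run show up in the same set.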
anti-spam  spam  fbl  feedback  filtering  n-grams  similarity  hashing  redis  searching 
september 2011 by jm
good wiki tracking spam operations, their current campaigns, who's doing it etc.
wiki  spam  anti-spam  from delicious
february 2011 by jm
Changes at
DNSWL will charge for subscriptions to "heavy" users and anti-spam vendors
dnswl  dns  whitelists  dnsbls  filtering  anti-spam  from delicious
october 2010 by jm
Exploring the Spam Arms Race to Characterize Spam Evolution
from last week's CEAS conference; research comparing SpamAssassin releases against the evolution of the surrounding spam environment. Nice work, I always wanted to write up something like this (via JD)
spam  anti-spam  ceas  conference  papers  research  spamassassin  adversarial-classification  evolution  arms-race  via:jd  from delicious
july 2010 by jm
Tesco fined for sending junk e-mail
first successful conviction under Irish anti-spam laws -- for a whopping, er, 2,000 Euros. at least it only took 2 complaints from 2 customers each (via Brian Nisbet)
dpc  anti-spam  ireland  law  tesco  prosecutions  convictions  via:bnisbet  from delicious
july 2010 by jm
Hadoop and the fight against shape-shifting spam
Yahoo! anti-spam engineers talk about their extensive use of Hadoop and scale
hadoop  yahoo  anti-spam  filtering  from delicious
june 2010 by jm
BBC News - How spam filters dictated Canadian magazine's fate
the Canadian mag "The Beaver" is changing its name due to broken filters' false positives. Bennett Haselton reckons that there's no incentive to fix FPs, which as Henry Stern notes isn't the case
anti-spam  false-positives  beaver  canadia  canada  bbc  from delicious
march 2010 by jm
TLS-encrypted spam
the Rustock botnet is now attempting TLS encryption of spam delivery sessions
tls  rustock  botnets  anti-spam  mailchannels  ssl  encryption  from delicious
march 2010 by jm
RFC 5782 - DNS Blacklists and Whitelists
John Levine gets DNS*Ls standardized, at last. we should really check SpamAssassin to see if it's compliant, I guess ;)
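For reference, a small sketch of the lookup convention the RFC describes: reverse the IPv4 octets (or the IPv6 nibbles) and prepend them to the list's zone, then check whether an A record comes back. The zone name below is a placeholder, not a real list, and the helper names are my own:

```python
import ipaddress
import socket


def dnsxl_query_name(ip, zone):
    """Build the DNS name to query for ip in a DNSBL/DNSWL zone."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        labels = reversed(str(addr).split("."))  # octets, reversed
    else:
        # All 32 hex nibbles of the fully expanded address, reversed.
        labels = reversed(addr.exploded.replace(":", ""))
    return ".".join(labels) + "." + zone


def is_listed(ip, zone):
    """True if the query name resolves to an A record.

    Note: any resolver failure, including a timeout, is treated as
    "not listed" here, which a production filter might not want.
    """
    try:
        socket.gethostbyname(dnsxl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False


# "dnsbl.example.net" is a hypothetical zone used only for illustration.
print(dnsxl_query_name("192.0.2.99", "dnsbl.example.net"))
```

This only covers the basic listed/not-listed check; interpreting the returned 127.0.0.x value or a TXT record is left out.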
dnsbls  anti-spam  dnswl  dnsbl  rfcs  standards  via:fanf  from delicious
february 2010 by jm
Inside View from Ireland: Analysing Electronic Forensics Evidence
fascinating note from Bernie Goldbach: 'MORE THAN 20 YEARS ago, I worked with message traffic and the work told me the importance of verifying source material.'
bernie  spam  anti-spam  authentication  spoofing  security  phishing  from delicious
february 2010 by jm
a custom pastebin for spam messages. cool
spamalyser  spam  anti-spam  paste  pastebin  web  from delicious
february 2010 by jm
_Botnet Judo: Fighting Spam with Itself_
reverse-engineering the output of spam templates. paper isn't published yet, but sounds very interesting, particularly since it overlaps with the SpamAssassin SOUGHT ruleset's methodology, a little, it sounds like. looking forward to reading it
spam  anti-spam  botnets  templates  toread  papers  from delicious
january 2010 by jm
MAAWG notes drop in spam levels
'MAAWG [..] says that spam and malicious emails dropped to 89 percent in the second quarter from 90.4 percent in the first quarter of 2009.'
spam  anti-spam  maawg  press-releases  isps  internet  abuse  from delicious
october 2009 by jm
IAMA person who sends "spam" email for a living
Reddit mass-interview of a spammer. apparently he's working on IPv6 support
reddit  spam  anti-spam  interviews  ipv6  iama  spammers  from delicious
october 2009 by jm
Anti Spear-phishing SpamAssassin ruleset
from Julian "MailScanner" Field (via the SA users list)
spamassassin  anti-spam  rulesets  sa-update  phishing  blocklists 
august 2009 by jm
