jm + classifiers   3

When DNNs go wrong – adversarial examples and what we can learn from them
Excellent paper.
[The] results suggest that classifiers based on modern machine learning techniques, even those that obtain excellent performance on the test set, are not learning the true underlying concepts that determine the correct output label. Instead, these algorithms have built a Potemkin village that works well on naturally occurring data, but is exposed as a fake when one visits points in space that do not have high probability in the data distribution.
ai  deep-learning  dnns  neural-networks  adversarial-classification  classification  classifiers  machine-learning  papers 
23 days ago by jm
Clairvoyant Squirrel: Large Scale Malicious Domain Classification
Storm-based service that detects malicious DNS domain usage from streaming pcap data in near-real-time. It uses string features of the DNS domain, along with randomness metrics derived from Markov analysis, combined with a Random Forest classifier, to achieve 98% precision at 10,000 matches/sec.
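A rough idea of what a Markov-based randomness metric for domain names might look like, as a minimal sketch and not the code from the slides. It assumes a character-bigram model with add-one smoothing, trained on a tiny hypothetical list of benign names; the function names, the alphabet, and the feature set are all illustrative. The resulting feature vector is the kind of input one could hand to a Random Forest, which is not shown here.

```python
import math
from collections import defaultdict

# Assumed alphabet for a DNS label; "^" and "$" mark start and end of a name.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-"


def train_bigram_model(names):
    """Count character-to-character transitions over a list of names."""
    counts = defaultdict(lambda: defaultdict(int))
    for name in names:
        padded = "^" + name.lower() + "$"
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
    return counts


def mean_log_prob(model, name):
    """Average log-probability per transition, with add-one smoothing.

    Names whose character sequences resemble the training data score
    higher (closer to zero); random-looking strings score lower.
    """
    padded = "^" + name.lower() + "$"
    vocab = len(ALPHABET) + 1  # possible next symbols: alphabet plus "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        row = model.get(a, {})
        total += math.log((row.get(b, 0) + 1) / (sum(row.values()) + vocab))
    return total / (len(padded) - 1)


def domain_features(model, name):
    """Hypothetical feature vector: length, digit ratio, Markov score."""
    digits = sum(ch.isdigit() for ch in name)
    return [len(name), digits / len(name), mean_log_prob(model, name)]


# Hypothetical training names, for illustration only.
BENIGN = ["google", "facebook", "amazon", "twitter",
          "reddit", "github", "mozilla", "apache"]
model = train_bigram_model(BENIGN)

if __name__ == "__main__":
    for candidate in ["google", "xqzkvjw"]:
        print(candidate, domain_features(model, candidate))
```

With a realistic training corpus the Markov score separates dictionary-like names from algorithmically generated ones far better than this toy list can; the point is only to show the shape of the feature.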
storm  distributed  distcomp  random-forest  classifiers  machine-learning  anti-spam  slides 
february 2013 by jm
Practical machine learning tricks from the KDD 2011 best industry paper
Wow, this is a fantastic paper. It's a Google paper on detecting scam/spam ads using machine learning -- but not just that, it's how to build out such a classifier to production scale, and make it operationally resilient, and, indeed, operable.

I've come across a few of these ideas before, and I'm happy to say I might have reinvented a few (particularly around the feature space), but all of them together make extremely good sense. If I wind up working on large-scale classification again, this is the first paper I'll go back to. Great info! (via Toby diPasquale.)
classification  via:codeslinger  training  machine-learning  google  ops  kdd  best-practices  anti-spam  classifiers  ensemble  map-reduce 
july 2012 by jm
