jm + data-mining   4

Top 10 data mining algorithms in plain English
This is a phenomenally useful ML/data-mining resource post -- 'the top 10 most influential data mining algorithms as voted on by 3 separate panels in [ICDM '06's] survey paper', but with a nice clear intro and description for each one. Here are the algorithms covered:
1. C4.5
2. k-means
3. Support vector machines
4. Apriori
5. EM
6. PageRank
7. AdaBoost
8. kNN
9. Naive Bayes
10. CART
svm  k-means  c4.5  apriori  em  pagerank  adaboost  knn  naive-bayes  cart  ml  data-mining  machine-learning  papers  algorithms  unsupervised  supervised 
may 2015 by jm
Schneier on Security: Why Data Mining Won't Stop Terror
A good reference URL to cut-and-paste when "scanning internet traffic for terrorist plots" rears its head:
This unrealistically accurate system will generate 1 billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999 percent and you're still chasing 2,750 false alarms per day -- but that will inevitably raise your false negatives, and you're going to miss some of those 10 real plots.
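
The arithmetic behind those figures can be roughly reproduced with a few lines of Python. This is only a sketch: the quote above does not state its inputs, so the 1 trillion events per year and the initial 1-in-100 false-positive rate are assumptions picked to match the quoted numbers; the 10 real plots per year and the 99.9999 percent case come from the quote itself. The function name `false_alarms` is made up for illustration.

```python
def false_alarms(events_per_year, false_positive_one_in, real_plots_per_year):
    """Return (false alarms per year, per day, per real plot).

    false_positive_one_in=100 means 1 innocent event in 100 is wrongly
    flagged, i.e. the system is 99 percent accurate on innocent events.
    """
    per_year = events_per_year // false_positive_one_in
    per_day = per_year / 365
    per_plot = per_year // real_plots_per_year
    return per_year, per_day, per_plot


EVENTS_PER_YEAR = 10**12   # ASSUMPTION: not stated in the quote
REAL_PLOTS_PER_YEAR = 10   # from the quote ("those 10 real plots")

# 1 in 100 is an ASSUMPTION (99 percent); 1 in 1,000,000 is the quote's
# "absurd 99.9999 percent".
for one_in in (100, 1_000_000):
    per_year, per_day, per_plot = false_alarms(
        EVENTS_PER_YEAR, one_in, REAL_PLOTS_PER_YEAR
    )
    print(
        f"1 in {one_in:,} false positives: "
        f"{per_day:,.0f} false alarms per day, "
        f"{per_plot:,} per real plot"
    )
```

Under those assumptions the first case works out to about 27 million false alarms a day and 1 billion per real plot, and the second to roughly 2,740 a day, close to the quote's rounded 2,750.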


Also, Ben Goldacre saying the same thing: http://www.badscience.net/2009/02/datamining-would-be-lovely-if-it-worked/
internet  scanning  filtering  specificity  statistics  data-mining  terrorism  law  nsa  gchq  false-positives  false-negatives 
january 2015 by jm
Former NSA Boss: We Don't Data Mine Our Giant Data Collection, We Just Ask It Questions
'Well, that's - no, we're going to use it. But we're not going to use it in the way that some people fear. You put these records, you store them, you have them. It's kind of like, I've got the haystack now. And now let's try to find the needle. And you find the needle by asking that data a question. I'm sorry to put it that way, but that's fundamentally what happens. All right. You don't troll through the data looking for patterns or anything like that. The data is set aside. And now I go into that data with a question that - a question that is based on articulable(ph), arguable, predicate to a terrorist nexus.'


Yep, that's data mining.
data-mining  questions  haystack  needle  nsa  usa  politics  privacy  data-protection  michael-hayden 
june 2013 by jm
Graylog2
'Free open source self-hosted log management and exception tracking', Loggly-style. Basically, a nifty web data-mining UI on your syslogs (via adulau)
logging  syslog  sysadmin  mongodb  opensource  via:adulau  logs  web  ui  data-mining  from delicious
january 2011 by jm
