jm + false-positives   36

Google Intrusion Detection Problems
'We have lost access to multiple critical data stores because Google has an automated threat detection system that is incapable of handling false positives.'
google  security  cloud  false-positives  intrusion-detection  automation  fail 
august 2016 by jm
MRI software bugs could upend years of research - The Register
In their paper at PNAS, they write: “the most common software packages for fMRI analysis (SPM, FSL, AFNI) can result in false-positive rates of up to 70%. These results question the validity of some 40,000 fMRI studies and may have a large impact on the interpretation of neuroimaging results.”

For example, a bug that's been sitting in a package called 3dClustSim for 15 years, fixed in May 2015, produced bad results (3dClustSim is part of the AFNI suite; the others are SPM and FSL). That's not a gentle nudge that some results might be overstated: it's more like making a bonfire of thousands of scientific papers.

Further: “Our results suggest that the principal cause of the invalid cluster inferences is spatial autocorrelation functions that do not follow the assumed Gaussian shape”.

The researchers used published fMRI results, and along the way they swipe the fMRI community for their “lamentable archiving and data-sharing practices” that prevent most of the discipline's body of work being re-analysed.
fmri  science  mri  statistics  cluster-inference  autocorrelation  data  papers  medicine  false-positives  fps  neuroimaging 
july 2016 by jm
The NSA’s SKYNET program may be killing thousands of innocent people
Death by Random Forest: this project is a horrible misapplication of machine learning. Truly appalling, when a false positive means death:

The NSA evaluates the SKYNET program using a subset of 100,000 randomly selected people (identified by the MSISDN/IMSI pairs of their mobile phones), and a known group of seven terrorists. The NSA then trained the learning algorithm by feeding it six of the terrorists and tasking SKYNET to find the seventh. This data provides the percentages for false positives in the slide above.

"First, there are very few 'known terrorists' to use to train and test the model," Ball said. "If they are using the same records to train the model as they are using to test the model, their assessment of the fit is completely bullshit. The usual practice is to hold some of the data out of the training process so that the test includes records the model has never seen before. Without this step, their classification fit assessment is ridiculously optimistic."

The reason is that the 100,000 citizens were selected at random, while the seven terrorists are from a known cluster. Under the random selection of a tiny subset of less than 0.1 percent of the total population, the density of the social graph of the citizens is massively reduced, while the "terrorist" cluster remains strongly interconnected. Scientifically-sound statistical analysis would have required the NSA to mix the terrorists into the population set before random selection of a subset—but this is not practical due to their tiny number.

This may sound like a mere academic problem, but, Ball said, is in fact highly damaging to the quality of the results, and thus ultimately to the accuracy of the classification and assassination of people as "terrorists." A quality evaluation is especially important in this case, as the random forest method is known to overfit its training sets, producing results that are overly optimistic. The NSA's analysis thus does not provide a good indicator of the quality of the method.
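Ball's point about holding data out can be made concrete with a minimal Python sketch. The `holdout_split` helper and the toy records are hypothetical, purely to illustrate the standard practice; this is not the NSA's evaluation code.

```python
import random


def holdout_split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of `records` and split it into disjoint (train, test) lists.

    Fit the model on `train` only; `test` then holds records the model has
    never seen, so its measured error rate is not flattered by memorisation.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data: 7 labelled positives among 1,000 negatives.
records = [("pos", i) for i in range(7)] + [("neg", i) for i in range(1000)]
train, test = holdout_split(records, test_fraction=0.2, seed=42)
positives_in_test = sum(1 for label, _ in test if label == "pos")
print(len(train), len(test), positives_in_test)
```

With only seven positives, a 20% test set holds about 1.4 of them on average, so any error estimate rests on one or two cases: the "very few known terrorists" problem again.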
terrorism  surveillance  nsa  security  ai  machine-learning  random-forests  horror  false-positives  classification  statistics 
february 2016 by jm
Your Relative's DNA Could Turn You Into A Suspect
Familial DNA searching has massive false positives, but is being used to tag suspects:
The bewildered Usry soon learned that he was a suspect in the 1996 murder of an Idaho Falls teenager named Angie Dodge. Though a man had been convicted of that crime after giving an iffy confession, his DNA didn’t match what was found at the crime scene. Detectives had focused on Usry after running a familial DNA search, a technique that allows investigators to identify suspects who don’t have DNA in a law enforcement database but whose close relatives have had their genetic profiles cataloged. In Usry’s case the crime scene DNA bore numerous similarities to that of Usry’s father, who years earlier had donated a DNA sample to a genealogy project through his Mormon church in Mississippi. That project’s database was later purchased by Ancestry, which made it publicly searchable—a decision that didn’t take into account the possibility that cops might someday use it to hunt for genetic leads.

Usry, whose story was first reported in The New Orleans Advocate, was finally cleared after a nerve-racking 33-day wait — the DNA extracted from his cheek cells didn’t match that of Dodge’s killer, whom detectives still seek. But the fact that he fell under suspicion in the first place is the latest sign that it’s time to set ground rules for familial DNA searching, before misuse of the imperfect technology starts ruining lives.
dna  familial-dna  false-positives  law  crime  idaho  murder  mormon  genealogy  ancestry.com  databases  biometrics  privacy  genes 
october 2015 by jm
FBI admits flaws in hair analysis over decades
Wow, this is staggering.
The Justice Department and FBI have formally acknowledged that nearly every examiner in an elite FBI forensic unit gave flawed testimony in almost all trials in which they offered evidence against criminal defendants over more than a two-decade period before 2000. [....]

The review confirmed that FBI experts systematically testified to the near-certainty of “matches” of crime-scene hairs to defendants, backing their claims by citing incomplete or misleading statistics drawn from their case work. In reality, there is no accepted research on how often hair from different people may appear the same. Since 2000, the lab has used visual hair comparison to rule out someone as a possible source of hair or in combination with more accurate DNA testing. Warnings about the problem have been mounting. In 2002, the FBI reported that its own DNA testing found that examiners reported false hair matches more than 11 percent of the time.
fbi  false-positives  hair  dna  biometrics  trials  justice  experts  crime  forensics  inaccuracy  csi 
april 2015 by jm
"Cuckoo Filter: Practically Better Than Bloom"
'We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set membership tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters outperform previous data structures that extend Bloom filters to support deletions substantially in both time and space.'
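For the curious, here is a toy Python sketch of the partial-key cuckoo hashing idea behind the filter: each item stores a small fingerprint in one of two buckets, and the second bucket index is the first XORed with a hash of the fingerprint, so entries can be relocated without the original key. The class name, 8-bit fingerprints, 4-slot buckets and SHA-256-derived hash are all assumptions for illustration, not the paper's reference implementation.

```python
import hashlib
import random


def _hash64(data: bytes) -> int:
    # 64-bit hash taken from SHA-256; a real filter would use a faster hash.
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    """Toy cuckoo filter: 8-bit fingerprints, 4-slot buckets, partial-key
    cuckoo hashing. All parameters here are illustrative assumptions."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500, seed=1):
        # A power-of-two bucket count keeps the XOR-derived index in range
        # and makes _alt_index its own inverse.
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _fingerprint(self, item: str) -> int:
        return _hash64(b"fp:" + item.encode()) % 255 + 1  # value in 1..255

    def _alt_index(self, index: int, fp: int) -> int:
        # i2 = i1 XOR hash(fingerprint): computable from either bucket plus the
        # fingerprint alone, so entries can be moved without the original item.
        return (index ^ _hash64(bytes([fp]))) % self.num_buckets

    def _candidates(self, item: str):
        fp = self._fingerprint(item)
        i1 = _hash64(item.encode()) % self.num_buckets
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets full: evict a random fingerprint, move it to its
        # alternate bucket, and repeat up to max_kicks times.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            j = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][j] = self.buckets[i][j], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Table effectively full. This toy drops the last evicted fingerprint;
        # real implementations keep it in a small stash to avoid false negatives.
        return False

    def contains(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        # Only safe for items that were actually inserted.
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


f = CuckooFilter()
items = [f"item-{n}" for n in range(2000)]  # about 49% of the 4,096 slots
inserted = [f.insert(x) for x in items]
false_hits = sum(f.contains(f"other-{n}") for n in range(10_000))
print(all(inserted), false_hits)
```

Deletion works by removing a matching fingerprint from either candidate bucket, which is exactly what a plain Bloom filter cannot do.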
algorithms  paper  bloom-filters  cuckoo-filters  cuckoo-hashing  data-structures  false-positives  big-data  probabilistic  hashing  set-membership  approximation 
march 2015 by jm
How to Catch a Terrorist - The New Yorker
This is spot on --
By flooding the system with false positives, big-data approaches to counterterrorism might actually make it harder to identify real terrorists before they act. Two years before the Boston Marathon bombing, Tamerlan Tsarnaev, the older of the two brothers alleged to have committed the attack, was assessed by the city’s Joint Terrorism Task Force. They determined that he was not a threat. This was one of about a thousand assessments that the Boston J.T.T.F. conducted that year, a number that had nearly doubled in the previous two years, according to the Boston F.B.I. As of 2013, the Justice Department has trained nearly three hundred thousand law-enforcement officers in how to file “suspicious-activity reports.” In 2010, a central database held about three thousand of these reports; by 2012 it had grown to almost twenty-eight thousand. “The bigger haystack makes it harder to find the needle,” Sensenbrenner told me. Thomas Drake, a former N.S.A. executive and whistle-blower who has become one of the agency’s most vocal critics, told me, “If you target everything, there’s no target.”
terrorism  false-positives  filtering  detection  jttf  nsa  fbi  surveillance  gchq 
january 2015 by jm
Schneier on Security: Why Data Mining Won't Stop Terror
A good reference URL to cut-and-paste when "scanning internet traffic for terrorist plots" rears its head:
This unrealistically accurate system will generate 1 billion false alarms for every real terrorist plot it uncovers. Every day of every year, the police will have to investigate 27 million potential plots in order to find the one real terrorist plot per month. Raise that false-positive accuracy to an absurd 99.9999 percent and you're still chasing 2,750 false alarms per day -- but that will inevitably raise your false negatives, and you're going to miss some of those 10 real plots.


Also, Ben Goldacre saying the same thing: http://www.badscience.net/2009/02/datamining-would-be-lovely-if-it-worked/
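The arithmetic behind Schneier's figures is easy to check. This Python sketch assumes the inputs the quoted numbers imply (roughly a trillion screened events per year, 10 real plots per year, and a 1% false-positive rate for the "99% accurate" case); the function name is hypothetical.

```python
def false_alarm_load(events_per_year, false_positive_rate, real_plots_per_year):
    """Expected false alarms from screening every event (true hits ignored)."""
    per_year = events_per_year * false_positive_rate
    return {
        "per_year": per_year,
        "per_real_plot": per_year / real_plots_per_year,
        "per_day": per_year / 365,
    }


# Assumed inputs, back-derived from the figures quoted above:
# ~1 trillion events per year, 10 real plots per year.
base = false_alarm_load(10**12, 0.01, 10)        # 99% "accurate"
strict = false_alarm_load(10**12, 0.000001, 10)  # 99.9999% "accurate"
for name, load in (("99%", base), ("99.9999%", strict)):
    print(f"{name}: {load['per_real_plot']:,.0f} false alarms per real plot, "
          f"{load['per_day']:,.0f} per day")
```

At a 1% false-positive rate this gives a billion false alarms per real plot and about 27 million per day; at 99.9999% it is still roughly 2,740 per day, close to the essay's 2,750.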
internet  scanning  filtering  specificity  statistics  data-mining  terrorism  law  nsa  gchq  false-positives  false-negatives 
january 2015 by jm
Applying cardiac alarm management techniques to your on-call
An ops-focused take on a recent story about alarm fatigue, and how a Boston hospital dealt with it. When I was at Amazon, many of the teams in our division had a target to reduce false positive pages, with a definite monetary value attached to it, since many teams had "time off in lieu" payments for out-of-hours pages to the on-call staff. As a result, reducing false-positive pages was a reasonably high priority, and we dealt with the problem very proactively, with a well-developed sense of how to do so. It's interesting to see that the outside world is only just starting to look into ameliorating it. (Another benefit of a TOIL policy ;)
ops  monitoring  sysadmin  alerts  alarms  nagios  alarm-fatigue  false-positives  pages 
september 2014 by jm
3 Rules of thumb for Bloom Filters
I often need to do rough back-of-the-envelope reasoning about things, and I find that doing a bit of work to develop an intuition for how a new technique performs is usually worthwhile. So, here are three broad rules of thumb to remember when discussing Bloom filters down the pub:

1 - One byte per item in the input set gives about a 2% false positive rate.

2 - The optimal number of hash functions is about 0.7 times the number of bits per item.

3 - The number of hashes dominates performance.

But see also http://stackoverflow.com/a/9554448 , http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf (thanks Tony Finch!)
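The first two rules fall out of the standard false-positive approximation for a Bloom filter with m bits, n items and k hash functions, p = (1 - e^(-kn/m))^k. A small Python sketch to check them (the function names are illustrative, not from any library):

```python
import math


def bloom_fp_rate(bits_per_item: float, num_hashes: int) -> float:
    """Standard approximation p = (1 - e^(-k*n/m))^k, where m/n = bits_per_item."""
    return (1.0 - math.exp(-num_hashes / bits_per_item)) ** num_hashes


def optimal_hashes(bits_per_item: float) -> int:
    """k = (m/n) * ln 2, i.e. roughly 0.7 hash functions per bit per item."""
    return max(1, round(bits_per_item * math.log(2)))


# Rules 1 and 2: one byte (8 bits) per item.
k = optimal_hashes(8)
print(f"k = {k}, false positive rate = {bloom_fp_rate(8, k):.2%}")
```

With 8 bits per item this picks k = 6 and a false positive rate just over 2%, in line with rule 1.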
bloom-filters  algorithm  probabilistic  rules  reasoning  via:norman-maurer  false-positives  hashing  coding 
august 2014 by jm
Punished for Being Poor: Big Data in the Justice System
This is awful. Totally the wrong tool for the job -- a false positive rate which is minuscule for something like spam filtering could translate to a really horrible outcome for a human life.
Currently, over 20 states use data-crunching risk-assessment programs for sentencing decisions, usually consisting of proprietary software whose exact methods are unknown, to determine which individuals are most likely to re-offend. The Senate and House are also considering similar tools for federal sentencing. These data programs look at a variety of factors, many of them relatively static, like criminal and employment history, age, gender, education, finances, family background, and residence. Indiana, for example, uses the LSI-R, the legality of which was upheld by the state’s supreme court in 2010. Other states use a model called COMPAS, which uses many of the same variables as LSI-R and even includes high school grades. Others are currently considering the practice as a way to reduce the number of inmates and ensure public safety. (Many more states use or endorse similar assessments when sentencing sex offenders, and the programs have been used in parole hearings for years.) Even the American Law Institute has embraced the practice, adding it to the Model Penal Code, attesting to the tool’s legitimacy.



(via stroan)
via:stroan  statistics  false-positives  big-data  law  law-enforcement  penal-code  risk  sentencing 
august 2014 by jm
NSA: Linux Journal is an "extremist forum" and its readers get flagged for extra surveillance
DasErste.de has published the relevant XKEYSCORE source code, and if you look closely at the rule definitions, you will see linuxjournal.com/content/linux* listed alongside Tails and Tor. According to an article on DasErste.de, the NSA considers Linux Journal an "extremist forum". This means that merely looking for any Linux content on Linux Journal, not just content about anonymizing software or encryption, is considered suspicious and means your Internet traffic may be stored indefinitely.


This is, sadly, entirely predictable -- that's what happens when you optimize the system for over-sampling, with poor oversight.
false-positives  linuxjournal  linux  terrorism  tor  tails  nsa  surveillance  snooping  xkeyscore  selectors  oversight 
july 2014 by jm
Microsoft Security Essentials reporting false positives on the Bitcoin blockchain
Earlier today, a virus signature from the virus "DOS/STONED" was uploaded into the Bitcoin blockchain, which allows small snippets of text to accompany user transactions with bitcoin.  Since this is only the virus signature and not the virus itself, there apparently is no danger to users in any way.  However, MSE recognizes the signature for the virus and continuously reports it as a threat, and every time it deletes the file, the bitcoin client will simply re-download the missing blockchain.


What a heinous prank! Hilarity ensues (via gwire)
via:gwire  av  antivirus  false-positives  fp  blockchain  microsoft  bitcoin  pranks  viruses 
may 2014 by jm
How the NSA Plans to Infect 'Millions' of Computers with Malware - The Intercept
The implants being deployed were once reserved for a few hundred hard-to-reach targets, whose communications could not be monitored through traditional wiretaps. But the documents analyzed by The Intercept show how the NSA has aggressively accelerated its hacking initiatives in the past decade by computerizing some processes previously handled by humans. The automated system – codenamed TURBINE – is designed to “allow the current implant network to scale to large size (millions of implants) by creating a system that does automated control implants by groups instead of individually.” In a top-secret presentation, dated August 2009, the NSA describes a pre-programmed part of the covert infrastructure called the “Expert System,” which is designed to operate “like the brain.”


Great. Automated malware deployment to millions of random victims. See also the "I hunt sysadmins" section further down...
malware  gchq  nsa  oversight  infection  expert-systems  turbine  false-positives  the-intercept  surveillance 
march 2014 by jm
"A reason to hang him": how mass surveillance, secret courts, confirmation bias and the FBI can ruin your life - Boing Boing
This is bananas. Confirmation bias running amok.
Brandon Mayfield was a US Army veteran and an attorney in Portland, OR. After the 2004 Madrid train bombing, his fingerprint was partially matched to one belonging to one of the suspected bombers, but the match was a poor one. But by this point, the FBI was already convinced they had their man, so they rationalized away the non-matching elements of the print, and set in motion a train of events that led to Mayfield being jailed without charge; his home and office burgled by the FBI; his client-attorney privilege violated; his life upended.
confirmation-bias  bias  law  brandon-mayfield  terrorism  fingerprints  false-positives  fbi  scary 
february 2014 by jm
Death by Metadata
The side-effects of algorithmic false-positives get worse and worse.
What’s more, he adds, the NSA often locates drone targets by analyzing the activity of a SIM card, rather than the actual content of the calls. Based on his experience, he has come to believe that the drone program amounts to little more than death by unreliable metadata. “People get hung up that there’s a targeted list of people,” he says. “It’s really like we’re targeting a cell phone. We’re not going after people – we’re going after their phones, in the hopes that the person on the other end of that missile is the bad guy.”
false-positives  glenn-greenwald  drones  nsa  death-by-metadata  us-politics  terrorism  sim-cards  phones  mobile-phones 
february 2014 by jm
Sky parental controls break many JQuery-using websites
An 11-hour outage caused by a false positive in Sky's anti-phishing filter; all sites using the code.jquery.com CDN for jQuery would have seen errors.
Sky still appears to be blocking code.jquery.com and all files served via the site, and more worryingly is that if you try to report the incorrect category, once signing in on the Sky website you [get] an error page. We suspect the site was blocked due to being linked to by a properly malicious website, i.e. code.jquery.com and some javascript files were being used on a dodgy website and every domain mentioned was subsequently added to a block list.


(via Tony Finch)
via:fanf  sky  filtering  internet  uk  anti-phishing  phish  jquery  javascript  http  web  fps  false-positives 
january 2014 by jm
UK porn filter blocks game update that contained 'sex' in URL
Staggeringly inept. The UK national porn filter blocks based on a regexp match of the URL against /.*sex.*/i -- the good old "Scunthorpe problem". Better, it returns a 404 response. This is also a good demonstration of how web filtering has unintended side effects, breaking third-party software updates with its false positives.
The update to online strategy game League of Legends was disrupted by the internet filter because the software attempted to access files that accidentally include the word “sex” in the middle of their file names. The block resulted in the update failing with “file not found” errors, which are usually created by missing files or broken updates on the part of the developers.
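To see why substring matching is so blunt, here is a Python sketch contrasting the pattern described above with a whole-word match. The URLs are made-up examples, and the word-boundary version is an illustrative alternative, not what the filter used; note that it still blocks legitimate pages which genuinely contain the word.

```python
import re

# Substring match, as described above: anything containing "sex" is blocked.
naive = re.compile(r".*sex.*", re.IGNORECASE)
# Illustrative alternative: only match "sex" as a whole word.
whole_word = re.compile(r"\bsex\b", re.IGNORECASE)

urls = [
    "http://example.com/maps/Essex.png",             # hypothetical, innocent
    "http://example.com/patches/unisex_skins.dat",   # hypothetical, innocent
    "http://example.com/health/sex-education.html",  # hypothetical, legitimate
]
for url in urls:
    print(bool(naive.search(url)), bool(whole_word.search(url)), url)
```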
uk  porn  filtering  guardian  regular-expressions  false-positives  scunthorpe  http  web  league-of-legends  sex 
january 2014 by jm
Nominet now filtering .uk domain registrations for 'sex-crime content'
Amazing. Massive nanny-stateism of the 'something must be done' variety, with a 100% false-alarm hit rate, and it's now policy.
'Nominet have made a decision, based on a report by Lord Macdonald QC, that recommends that they check any domain registration that signals sex crime content or is in itself a sex crime. This is screening of domains within 48 hours of registration, and de-registration. The report says that such domains should be reported to the police.' [....]

'The report itself states [...] that in 2013 Nominet checked domains for key words used by the IWF, and as a result reported tens of thousands of domains to IWF for checking, all of which were false positives. Not one was, in fact, related to child sex abuse.'
filtering  nominet  false-positives  nanny-state  uk  sex-crimes  false-alarms  domains  iwf 
january 2014 by jm
MP Claire Perry tells UK that worrying about filter overblocking is a "load of cock"
The bottom line appears to be "think of the children" -- in other words, any degree of overblocking is acceptable as long as children cannot access porn:

The debate and letter confuse legal, illegal and potentially harmful content, all of which require very different tactics to deal with. Without a greater commitment to evidence and rational debate, poor policy outcomes will be the likely result. There's a pattern, much the same as the Digital Economy Act, or the Snooper's Charter. Start with moral panic; dismiss evidence; legislate; and finally, watch the policy unravel, either delivering unintended harms, even to children in this case, or simply failing altogether.


See https://www.openrightsgroup.org/blog/2013/talktalk-wordpress for a well-written exploration of a case of overblocking and its fallout. Talk Talk, one UK ISP, had filters which incorrectly dealt with IWF data and blocked WordPress.com's admin interface, resulting in all blogs there becoming unusable for their owners for over a week, with seemingly nobody able to diagnose and fix the problem competently.
filtering  overblocking  uk  politics  think-of-the-children  porn  cam  claire-perry  open-rights-group  false-positives  talk-talk  networking  internet  wordpress 
december 2013 by jm
_An Improved Construction For Counting Bloom Filters_
'A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally'
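For comparison, a plain counting Bloom filter (the baseline the paper improves on, not the d-left construction itself) just swaps each bit for a counter so that deletions become possible. A minimal Python sketch with hypothetical class and parameter names; real CBFs typically use small fixed-width counters that can overflow, which Python's unbounded ints hide.

```python
import hashlib


class CountingBloomFilter:
    """Standard counting Bloom filter: counters instead of bits, so items
    can be removed as well as added."""

    def __init__(self, num_counters: int, num_hashes: int):
        self.counters = [0] * num_counters
        self.num_hashes = num_hashes

    def _positions(self, item: str):
        # k salted SHA-256 hashes; slow but simple and deterministic.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % len(self.counters)

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item: str) -> None:
        # Removing something that was never added (e.g. a false positive)
        # would decrement other items' counters and cause false negatives.
        positions = list(self._positions(item))
        if not all(self.counters[pos] > 0 for pos in positions):
            raise KeyError(item)
        for pos in positions:
            self.counters[pos] -= 1

    def __contains__(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))


cbf = CountingBloomFilter(num_counters=1000, num_hashes=4)
for name in ("alice", "bob"):
    cbf.add(name)
cbf.remove("alice")
print("bob" in cbf)  # True: removing "alice" never clears bob's counters
```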
bloom-filter  data-structures  algorithms  counting  cbf  storage  false-positives  d-left-hashing  hashing 
september 2013 by jm
Massive Overblocking Hits Hundreds Of UK Sites | Techdirt
Customers of UK ISPs Virgin Media and Be Broadband found they were unable to access hundreds of sites, including the Radio Times and Zooniverse, due to a secret website-blocking court order from the Premier League. PC Pro believe that 3 other ISPs' customers were also affected.

According to customers' reverse-engineering, it looks like the court order incorrectly demanded the blocking of "http-redirection-a.dnsmadeeasy.com", an HTTP redirector operated by the DNS operator DNSMadeEasy.
The fact that the court could issue an order which didn’t see this coming and that the ISPs would act on it without checking that what they were doing was sensible is, in my opinion, extremely worrying.
overblocking  censorship  org  uk  sky  be-broadband  virgin-media  dnsmadeeasy  filtering  premier-league  false-positives  isps 
august 2013 by jm
Persuading David Simon (Pinboard Blog)
Maciej Ceglowski with a strongly-argued rebuttal of David Simon's post about the NSA's PRISM. This point in particular is key:
The point is, you don't need human investigators to find leads, you can have the algorithms do it [based on the call graph or network of who-calls-who]. They will find people of interest, assemble the watch lists, and flag whomever you like for further tracking. And since the number of actual terrorists is very, very, very small, the output of these algorithms will consist overwhelmingly of false positives.
false-positives  maciej  privacy  security  nsa  prism  david-simon  accuracy  big-data  filtering  anti-spam 
june 2013 by jm
Introducing Kale « Code as Craft
Etsy have implemented a tool to perform auto-correlation of service metrics, and detection of deviation from historic norms:
at Etsy, we really love to make graphs. We graph everything! Anywhere we can slap a StatsD call, we do. As a result, we’ve found ourselves with over a quarter million distinct metrics. That’s far too many graphs for a team of 150 engineers to watch all day long! And even if you group metrics into dashboards, that’s still an awful lot of dashboards if you want complete coverage. Of course, if a graph isn’t being watched, it might misbehave and no one would know about it. And even if someone caught it, lots of other graphs might be misbehaving in similar ways, and chances are low that folks would make the connection.

We’d like to introduce you to the Kale stack, which is our attempt to fix both of these problems. It consists of two parts: Skyline and Oculus. We first use Skyline to detect anomalous metrics. Then, we search for that metric in Oculus, to see if any other metrics look similar. At that point, we can make an informed diagnosis and hopefully fix the problem.


It'll be interesting to see if they can get this working well. I've found it can be tricky to get working with low false positives, without massive volume to "smooth out" spikes caused by normal activity. Amazon had one particularly successful version driving severity-1 order drop alarms, but it used massive event volumes and still had periodic false positives. Skyline looks like it will alarm on a single anomalous data point, and in the comments Abe notes "our algorithms err on the side of noise and so alerting would be very noisy."
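One common way to tame single-point alarms is to require several anomalous points in a row before paging. Here is a hypothetical Python sketch using a plain three-sigma test over a trailing window; it is not Skyline's actual code or algorithm set.

```python
from statistics import mean, stdev


def is_three_sigma(history, value):
    """True if `value` lies more than 3 sample standard deviations from the mean."""
    return abs(value - mean(history)) > 3 * stdev(history)


def first_alarm(series, window=30, consecutive=3):
    """Index at which `consecutive` anomalous points in a row have been seen, else None."""
    run = 0
    for i in range(window, len(series)):
        if is_three_sigma(series[i - window:i], series[i]):
            run += 1
            if run >= consecutive:
                return i
        else:
            run = 0
    return None


# Hypothetical metrics: a one-off spike versus a sustained level shift.
spike = [10.0] * 40 + [50.0] + [10.0] * 10
shift = [10.0] * 40 + [20.0] * 10
print(first_alarm(spike, consecutive=1), first_alarm(spike), first_alarm(shift))
```

A lone spike no longer pages, while a sustained level shift still does, two points later than a single-point rule would; that delay is the price of fewer false positives.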
etsy  monitoring  service-metrics  alarming  deviation  correlation  data  search  graphs  oculus  skyline  kale  false-positives 
june 2013 by jm
Interpol filter scope creep: ASIC ordering unilateral website blocks
Bloody hell. This is stupidity of the highest order, and a canonical example of "filter creep" by a government -- secret state censorship of 1200 websites due to a single investment scam site.

The Federal Government has confirmed its financial regulator has started requiring Australian Internet service providers to block websites suspected of providing fraudulent financial opportunities, in a move which appears to also open the door for other government agencies to unilaterally block sites they deem questionable in their own portfolios.

The instrument through which the ISPs are blocking the Interpol list of sites is Section 313 of the Telecommunications Act. Under the Act, the Australian Federal Police is allowed to issue notices to telcos asking for reasonable assistance in upholding the law. [...] Tonight Senator Conroy’s office revealed that the incident that resulted in Melbourne Free University and more than a thousand other sites being blocked originated from a different source — financial regulator the Australian Securities and Investment Commission.

On 22 March this year, ASIC issued a media release warning consumers about the activities of a cold-calling investment scam using the name ‘Global Capital Wealth’, which ASIC said was operating several fraudulent websites — www.globalcapitalwealth.com and www.globalcapitalaustralia.com. In its release on that date, ASIC stated: “ASIC has already blocked access to these websites.”
scams  australia  filtering  filter-creep  false-positives  isps  asic  fraud  secrecy 
may 2013 by jm
paperplanes. Monitoring for Humans
A good contemplation of the state of ops monitoring, post-#monitorama. At one point, he contemplates the concept of automated anomaly detection:
This leads to another interesting question: if I need to create activity to measure it, and if my monitoring system requires me to generate this activity to be able to put a graph and an alert on it, isn't my monitoring system wrong? Are all the monitoring systems wrong? [...]

We spend an eternity looking at graphs, right after an alert was triggered because a certain threshold was crossed. Does that alert even mean anything, is it important right now? It's where a human operator still has to decide if it's worth the trouble or if they should just ignore the alert. As much as I enjoy staring at graphs, I'd much rather do something more important than that.

I'd love for my monitoring system to be able to tell me that something out of the ordinary is currently happening. It has all the information at hand to make that decision at least with a reasonable probability.


I like the concept of Holt-Winters-style forecasting and confidence bands etc., but in my experience anomalies often aren't sufficiently bad news -- i.e. when an anomalous event occurs, it may not indicate an outage. Anomaly detection is hard to turn into a reliable alarm. Having said that, I have seen it done (and indeed our team has done it!) where there is sufficiently massive volume to smooth out the "normal" anomalies and leave real signs of impact.

Still, this is something that Baron Schwartz (ex-Percona) has been talking about too, so there are some pretty smart people thinking about it and it has a bright future.
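For reference, here is a rough Python sketch of forecast-plus-band anomaly flagging: a non-seasonal Holt (level and trend) forecast with a band of a few smoothed absolute errors either side, loosely in the style of the Holt-Winters confidence bands RRDtool offers. Seasonality is omitted, and the function name and all parameter values are arbitrary assumptions.

```python
def holt_band_anomalies(series, alpha=0.5, beta=0.1, gamma=0.1,
                        width=3.0, min_dev=1.0):
    """Return indices whose value falls outside a band around a Holt forecast.

    Non-seasonal simplification of Holt-Winters: the forecast is level + trend,
    and the band half-width is `width` times an exponentially smoothed absolute
    forecast error, floored at `min_dev` so a flat history still has a band.
    """
    level, trend, dev = float(series[0]), 0.0, min_dev
    flagged = []
    for t in range(1, len(series)):
        y = series[t]
        forecast = level + trend
        error = abs(y - forecast)
        if error > width * max(dev, min_dev):
            flagged.append(t)
        # Update the smoothed deviation, then Holt's level and trend.
        dev = gamma * error + (1 - gamma) * dev
        new_level = alpha * y + (1 - alpha) * forecast
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return flagged


# Hypothetical metric: flat at 100 with one spike at index 50.
series = [100.0] * 50 + [130.0] + [100.0] * 10
flagged = holt_band_anomalies(series)
print(flagged)
```

The spike is flagged, and with these parameters so is the first point after it, because the forecast overshoots on the way back down: a small demonstration of why raw anomaly alerts get noisy.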
monitoring  networks  holt-winters  forecasting  confidence-bands  anomaly-detection  ops  monitorama  baron-schwartz  false-positives 
march 2013 by jm
NASA's Mars Rover Crashed Into a DMCA Takedown
An hour or so after Curiosity’s 1.31 a.m. EST landing in Gale Crater, I noticed that the space agency’s main YouTube channel had posted a 13-minute excerpt of the stream. Its title was in an uncharacteristic but completely justified all caps: “NASA LANDS CAR-SIZE ROVER BESIDE MARTIAN MOUNTAIN.”

When I returned to the page ten minutes later, [...] the video was gone, replaced with an alien message: “This video contains content from Scripps Local News, who has blocked it on copyright grounds. Sorry about that.” That is to say, a NASA-made public domain video posted on NASA’s official YouTube channel, documenting the landing of a $2.5 billion Mars rover mission paid for with public taxpayer money, was blocked by YouTube because of a copyright claim by a private news service.
dmca  google  fail  nasa  copyright  false-positives  scripps  youtube  video  mars 
august 2012 by jm
Dutch grepping Facebook for welfare fraud
'The [Dutch] councils are working with a specialist Amsterdam research firm, using the type of computer software previously deployed only in counterterrorism, monitoring [LinkedIn, Facebook and Twitter] traffic for keywords and cross-referencing any suspicious information with digital lists of social welfare recipients.

Among the giveaway terms, apparently, are “holiday” and “new car”. If the automated software finds a match between one of these terms and a person claiming social welfare payments, the information is passed on to investigators to gather real-life evidence.' With a 30% false positive rate, apparently -- let's hope those investigations aren't too intrusive!
grep  dutch  holland  via:tjmcintyre  privacy  facebook  twitter  linkedin  welfare  dole  fraud  false-positives  searching 
september 2011 by jm
Online censorship now bordering on the ridiculous in Turkey - Reporters Without Borders
'access to websites containing words on the list would in theory be suspended and it would be impossible to create new ones containing them. However, it is not clear how and to what extent the directive will be implemented in practice. The TIB could decide to suppress or block pages for just one blacklisted word. ... The list, which borders on the ridiculous, includes words such as “etek” (skirt), “baldiz” (sister-in-law) and “hayvan” (animals). It poses serious problems for access to online information. If words such as “free” and “pic” are censored, countless references to freedom and everyday photos will be eliminated from the Turkish Internet.' Incredible (via Danny)
via:mala  repression  internet  turkey  censorship  filtering  false-positives 
april 2011 by jm
Virgin and NTL filtering fail
'Virgin and NTL [in the UK] blocked [del.icio.us] for years' due to a false positive -- joshua
del.icio.us  false-positives  filtering  uk  isps  virgin  ntl  fail  via:hackernews  from delicious
april 2011 by jm
Lucene Utilities and Bloom Filters - Greplin:tech
'Storing 50,000 2.5KB items in a traditional hash set requires over 125MB, but if you're willing to accept a 1-in-10,000 false positive rate on lookups, [this] bloom filter requires under 500KB' - interesting variation on the basic concept.  Java, Apache-licensed
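Those numbers are consistent with the usual sizing formulas, m = -n ln p / (ln 2)^2 bits and k = (m/n) ln 2 hashes. A quick Python check, assuming an optimally-sized standard Bloom filter (Greplin's implementation may lay things out differently; the function name is hypothetical):

```python
import math


def bloom_size(n_items: int, fp_rate: float):
    """Optimal Bloom filter size: m = -n ln p / (ln 2)^2 bits, k = (m/n) ln 2 hashes."""
    m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
    k = max(1, round(m / n_items * math.log(2)))
    return m, k


bits, hashes = bloom_size(50_000, 1e-4)
print(f"{bits:,} bits = {bits / 8 / 1024:.0f} KiB, {hashes} hash functions")
```

For 50,000 items at 1 in 10,000 this comes to about 958,500 bits (roughly 117 KiB) with 13 hash functions, comfortably under the quoted 500KB.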
search  bloom-filters  greplin  open-source  apache  false-positives  from delicious
april 2011 by jm
U.S. Government Shuts Down 84,000 Websites, ‘By Mistake’ | TorrentFreak
DHS/ICE domain seizures suffer a serious false positive problem, resulting in the seizure and shutting down of 84,000 subdomains of a free DNS provider, replacing them with a banner accusing the site of trafficking in child porn. whoops!
dhs  ice  censorship  internet  domains  dns  seizure  false-positives  child-porn  from delicious
february 2011 by jm
Tony Finch - Some notes on Bloom filters
more good Bloom Filter tips. he says: 'I take a slightly different tack, starting with a target population in mind which determines the size of the filter. Also there's a minor error regarding performance in the corte.si post. You only need to calculate two hash functions, and use a linear combination of them to index the Bloom filter. This simplifies the coding a lot, and if hash calculation dominates filter indexing, it's also a lot faster.'
bloom-filters  tips  coding  via:fanf  false-positives  from delicious
november 2010 by jm
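Finch's tip is to compute just two hashes and use a linear combination of them to generate all k indexes, g_i(x) = (h1(x) + i * h2(x)) mod m. This is often called double hashing. Below is a minimal sketch that sizes the filter from a target population, as he suggests. It assumes both base hashes are carved out of a single SHA-256 digest. That choice belongs to this sketch only: any two independent hash functions would do. The class name and parameters are hypothetical.

```python
import hashlib
import math

class BloomFilter:
    """Minimal Bloom filter deriving k indexes from two base hashes
    via g_i(x) = (h1(x) + i * h2(x)) mod m."""

    def __init__(self, n_items: int, fp_rate: float):
        # Size the filter from the expected population and target FP rate.
        self.m = math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _indexes(self, item: str):
        # Assumption of this sketch: split one SHA-256 digest into two 64-bit hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # avoid h2 == 0
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct.
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))

bf = BloomFilter(1000, 0.01)
bf.add("example.com")
print("example.com" in bf)
```

Only one digest is computed per lookup, however large k gets. When hashing dominates the cost of indexing the bit array, this is where the speedup Finch mentions comes from.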
BBC News - How spam filters dictated Canadian magazine's fate
the Canadian mag "The Beaver" is changing its name because broken filters keep flagging it as a false positive. Bennett Haselton reckons there's no incentive to fix FPs; as Henry Stern notes, that isn't the case
anti-spam  false-positives  beaver  canadia  canada  bbc  from delicious
march 2010 by jm
O2 Ireland blocking sites listed in the UK IWF list
supposedly should only list child porn sites, but sounds like it's got frequent false positives on file upload/download services nowadays
fps  o2  blocking  ireland  contract  false-positives  iwf  uk  law  from delicious
october 2009 by jm

related tags

accuracy  ai  alarm-fatigue  alarming  alarms  alerts  algorithm  algorithms  ancestry.com  anomaly-detection  anti-phishing  anti-spam  antivirus  apache  approximation  asic  australia  autocorrelation  automation  av  baron-schwartz  bbc  be-broadband  beaver  bias  big-data  biometrics  bitcoin  blockchain  blocking  bloom-filter  bloom-filters  brandon-mayfield  cam  canada  canadia  cbf  censorship  child-porn  claire-perry  classification  cloud  cluster-inference  coding  confidence-bands  confirmation-bias  contract  copyright  correlation  counting  crime  csi  cuckoo-filters  cuckoo-hashing  d-left-hashing  data  data-mining  data-structures  databases  david-simon  death-by-metadata  del.icio.us  detection  deviation  dhs  dmca  dna  dns  dnsmadeeasy  dole  domains  drones  dutch  etsy  expert-systems  experts  facebook  fail  false-alarms  false-negatives  false-positives  familial-dna  fbi  filter-creep  filtering  fingerprints  fmri  forecasting  forensics  fp  fps  fraud  gchq  genealogy  genes  glenn-greenwald  google  graphs  grep  greplin  guardian  hair  hashing  holland  holt-winters  horror  http  ice  idaho  inaccuracy  infection  internet  intrusion-detection  ireland  isps  iwf  javascript  jquery  jttf  justice  kale  law  law-enforcement  league-of-legends  linkedin  linux  linuxjournal  machine-learning  maciej  malware  mars  medicine  microsoft  mobile-phones  monitorama  monitoring  mormon  mri  murder  nagios  nanny-state  nasa  networking  networks  neuroimaging  nominet  nsa  ntl  o2  oculus  open-rights-group  open-source  ops  org  overblocking  oversight  pages  paper  papers  penal-code  phish  phones  politics  porn  pranks  premier-league  prism  privacy  probabilistic  random-forests  reasoning  regular-expressions  repression  risk  rules  scams  scanning  scary  science  scripps  scunthorpe  search  searching  secrecy  security  seizure  selectors  sentencing  service-metrics  set-membership  sex  sex-crimes  sim-cards  
sky  skyline  snooping  specificity  statistics  storage  surveillance  sysadmin  tails  talk-talk  terrorism  the-intercept  think-of-the-children  tips  tor  trials  turbine  turkey  twitter  uk  us-politics  via:fanf  via:gwire  via:hackernews  via:jzawodny  via:mala  via:norman-maurer  via:stroan  via:tjmcintyre  video  virgin  virgin-media  viruses  web  welfare  wordpress  xkeyscore  youtube 
