jm + via:fanf   81

Undercover operation 'Close Pass' reduced cyclist injuries by 20% in a year

An initiative to protect cyclists from dangerous overtaking has been praised, after reducing the number of cyclists killed or seriously injured on the roads by 20% over the last year.
Operation 'Close Pass' was devised by West Midlands Police as a low cost way of preventing accidents caused by motorists who are driving too close for comfort.


(Via Tony Finch)
cycling  via:fanf  safety  overtaking  roads  bikes 
7 days ago by jm
Hyperscan
a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly-used libpcre library, yet functions as a standalone library with its own API written in C. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions, as well as matching of regular expressions across streams of data. Hyperscan is typically used in a DPI library stack.

Hyperscan began in 2008, and evolved as a commercial closed-source product from 2009 to 2015. It was first developed at Sensory Networks Incorporated, and later acquired and released as open source software by Intel in October 2015.

Hyperscan is under a 3-clause BSD license. We welcome outside contributors.


This is really impressive -- state of the art in parallel regexp matching has improved quite a lot since I was last looking at it.

(via Tony Finch)
via:fanf  regexps  regular-expressions  text  matching  pattern-matching  intel  open-source  bsd  c  dpi  scanning  sensory-networks 
6 weeks ago by jm
terrible review for Solidity as a programming environment in HN
"Solidity/EVM is by far the worst programming environment I have ever encountered. It would be impossible to write even toy programs correctly in this language, yet it is literally called "Solidity" and used to program a financial system that manages hundreds of millions of dollars."


Via Tony Finch
blockchain  ethereum  programming  coding  via:fanf  funny  fail  floating-point  money  json  languages  bugs  reliability 
9 weeks ago by jm
Chris's Wiki :: blog/sysadmin/UnderstandingIODNSIssue
On the ns-a1.io security screwup for the .io CCTLD:
Using data from glue records instead of looking things up yourself is common but not mandatory, and there are various reasons why a resolver would not do so. Some recursive DNS servers will deliberately try to check glue record information as a security measure; for example, Unbound has the harden-referral-path option (via Tony Finch). Since the original article reported seeing real .io DNS queries being directed to Bryant's DNS server, we know that a decent number of clients were not using the root zone glue records. Probably a lot more clients were still using the glue records, though.


(via Tony Finch)
via:fanf  dns  security  dot-io  cctlds  glue-records  delegation 
10 weeks ago by jm
WHAT WENT WRONG IN BRITISH AIRWAYS DATACENTER IN MAY 2017?
A SPOF UPS. There was a similar AZ-wide outage in one of the Amazon DUB datacenters with a similar root cause, if I recall correctly -- supposedly redundant dual UPS systems were in fact interdependent, in that case, and power supply switchover wasn't clean enough to avoid affecting the servers.
Minutes later power was restored in what one source described as “uncontrolled fashion.” Instead of gradual restore, all power was restored at once resulting in a power surge. BA CEO Cruz told BBC Radio this power surge caused network hardware to fail. Also server hardware was damaged because of the power surge.

It seems as if the UPS was the single point of failure for power feed of the IT equipment in Boadicea House. The Times is reporting that the same UPS was powering both Heathrow based datacenters. Which could be a double single point of failure if true (I doubt it is).

The broken network stopped the exchange of messages between different BA systems and applications. Without messaging, there is no exchange of information between various applications. BA is using Progress Software’s Sonic [enterprise service bus].


(via Tony Finch)
postmortems  ba  airlines  outages  fail  via:fanf  datacenters  ups  power  progress  esb  j2ee 
may 2017 by jm
Spammergate: The Fall of an Empire
Featuring this interesting reactive-block evasion tactic:
In that screenshot, a RCM co-conspirator describes a technique in which the spammer seeks to open as many connections as possible between themselves and a Gmail server. This is done by purposefully configuring your own machine to send response packets extremely slowly, and in a fragmented manner, while constantly requesting more connections.
Then, when the Gmail server is almost ready to give up and drop all connections, the spammer suddenly sends as many emails as possible through the pile of connection tunnels. The receiving side is then overwhelmed with data and will quickly block the sender, but not before processing a large load of emails.


(via Tony Finch)
via:fanf  spam  antispam  gmail  blocklists  packets  tcp  networking 
march 2017 by jm
Falsehoods Programmers Believe About CSVs
Much of my professional work for the last 10+ years has revolved around handling, importing and exporting CSV files. CSV files are frustratingly misunderstood, abused, and most of all underspecified. While RFC4180 exists, it is far from definitive and goes largely ignored.

Partially as a companion piece to my recent post about how CSV is an encoding nightmare, and partially an expression of frustration, I've decided to make a list of falsehoods programmers believe about CSVs. I recommend my previous post for more in-depth coverage of the pains of CSV encodings and how the default tooling (Excel) will ruin your day.
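
Two of the usual falsehoods ("one line is one record", "just split on commas") are easy to show. A minimal sketch using Python's standard csv module; the sample data is made up for illustration, and it says nothing about encodings or Excel:

```python
import csv
import io

# Hypothetical sample: a header row plus one record whose fields contain
# a comma, an escaped double quote, and an embedded newline.
raw = 'name,comment\r\n"Smith, J.","said ""hi""\nthen left"\r\n'

# The naive approach: treat every line as a record and split on commas.
naive_rows = [line.split(",") for line in raw.splitlines()]

# The csv module applies RFC 4180-style quoting rules instead.
# newline="" stops StringIO from translating the line endings itself.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))    # the naive parse sees three "records"
print(len(parsed_rows))   # the real parse sees a header and one record
print(parsed_rows[1])
```

The naive version breaks the quoted name in two and treats the second half of the comment as a separate record.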


(via Tony Finch)
via:fanf  csv  excel  programming  coding  apis  data  encoding  transfer  falsehoods  fail  rfc4180 
january 2017 by jm
Jeff Erickson's Algorithms, Etc.
This page contains lecture notes and other course materials for various algorithms classes I have taught at the University of Illinois, Urbana-Champaign. The notes are numbered in the order I cover the material in a typical undergraduate class, with notes on more advanced material (indicated by the symbol ♥) interspersed appropriately. [...] In addition to the algorithms notes I have been maintaining since 1999, this page also contains new notes on "Models of Computation", which cover a small subset of the material normally taught in undergraduate courses in formal languages and automata. I wrote these notes for a new junior-level course on "Algorithms and Models of Computation" that Lenny Pitt and I developed, which is now required for all undergraduate computer science and computer engineering majors at UIUC.


Via Tony Finch
via:fanf  book  cs  algorithms  jeff-erickson  uiuc 
november 2016 by jm
[Cryptography] Bridge hand record generator cracked
'How to cheat at Bridge by breaking the tournament card-dealing random number generator', via Tony Finch
crypto  security  rngs  prngs  random  bridge  cards  via:fanf 
september 2016 by jm
Suspension Losses Confirmed
high bike tire pressures are not faster, counterintuitively. I never knew! (via Tony Finch)
cycling  research  bicycles  via:fanf 
june 2016 by jm
Dan Luu reviews the Site Reliability Engineering book
voluminous! still looks great, looking forward to reading our copy (via Tony Finch)
via:fanf  books  reading  devops  ops  google  sre  dan-luu 
april 2016 by jm
RFC 7754 - Technical Considerations for Internet Service Blocking and Filtering
The Internet is structured to be an open communications medium. This
openness is one of the key underpinnings of Internet innovation, but
it can also allow communications that may be viewed as undesirable by
certain parties. Thus, as the Internet has grown, so have mechanisms
to limit the extent and impact of abusive or objectionable
communications. Recently, there has been an increasing emphasis on
"blocking" and "filtering", the active prevention of such
communications. This document examines several technical approaches
to Internet blocking and filtering in terms of their alignment with
the overall Internet architecture. When it is possible to do so, the
approach to blocking and filtering that is most coherent with the
Internet architecture is to inform endpoints about potentially
undesirable services, so that the communicants can avoid engaging in
abusive or objectionable communications. We observe that certain
filtering and blocking approaches can cause unintended consequences
to third parties, and we discuss the limits of efficacy of various
approaches.


(via Tony Finch)
via:fanf  blocking  censorship  filtering  internet  rfcs  rfc  isps 
march 2016 by jm
Excellent post from Matthew Green on the Juniper backdoor
For the past several years, it appears that Juniper NetScreen devices have incorporated a potentially backdoored random number generator, based on the NSA's Dual_EC_DRBG algorithm. At some point in 2012, the NetScreen code was further subverted by some unknown party, so that the very same backdoor could be used to eavesdrop on NetScreen connections. While this alteration was not authorized by Juniper, it's important to note that the attacker made no major code changes to the encryption mechanism -- they only changed parameters. This means that the systems were potentially vulnerable to other parties, even beforehand. Worse, the nature of this vulnerability is particularly insidious and generally messed up.

[....] The end result was a period in which someone -- maybe a foreign government -- was able to decrypt Juniper traffic in the U.S. and around the world. And all because Juniper had already paved the road.

One of the most serious concerns we raise during [anti-law-enforcement-backdoor] meetings is the possibility that encryption backdoors could be subverted. Specifically, that a back door intended for law enforcement could somehow become a backdoor for people who we don't trust to read our messages. Normally when we talk about this, we're concerned about failures in storage of things like escrow keys. What this Juniper vulnerability illustrates is that the danger is much broader and more serious than that. The problem with cryptographic backdoors is not that they're the only way that an attacker can break into our cryptographic systems. It's merely that they're one of the best. They take care of the hard work, the laying of plumbing and electrical wiring, so attackers can simply walk in and change the drapes.


(via Tony Finch)
via:fanf  crypto  backdoors  politics  juniper  dual-ec-drbg  netscreen  vpn 
december 2015 by jm
Hyperscan
a high-performance multiple regex matching library. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.


Via Tony Finch
via:fanf  regexps  regex  dpi  hyperscan  dfa  nfa  hybrid-automata  text-matching  matching  text  strings  streams 
october 2015 by jm
Gene patents probably dead worldwide following Australian court decision
The court based its reasoning on the fact that, although an isolated gene such as BRCA1 was "a product of human action, it was the existence of the information stored in the relevant sequences that was an essential element of the invention as claimed." Since the information stored in the DNA as a sequence of nucleotides was a product of nature, it did not require human action to bring it into existence, and therefore could not be patented.


Via Tony Finch.
via:fanf  australia  genetics  law  ipr  medicine  ip  patents 
october 2015 by jm
qp tries: smaller and faster than crit-bit tries
interesting new data structure from Tony Finch. "Some simple benchmarks say qp tries have about 1/3 less memory overhead and are about 10% faster than crit-bit tries."
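
The "popcount" and "bitmaps" tags point at the core trick: a branch node keeps a bitmap of which children exist and stores only those children, packed, so a child's slot is the number of set bits below its own bit. A rough sketch of that indexing, assuming 4-bit (nibble) branching; the class and method names are made up here and this is not the C layout of the real implementation:

```python
def popcount(x: int) -> int:
    """Number of set bits in x."""
    return bin(x).count("1")


class SparseBranch:
    """Branch node with up to 16 children stored in a packed list."""

    def __init__(self):
        self.bitmap = 0      # bit i is set when a child exists for nibble i
        self.children = []   # only the children that exist, in nibble order

    def _slot(self, nibble: int):
        bit = 1 << nibble
        # Index into the packed list = count of present children below this one.
        return bit, popcount(self.bitmap & (bit - 1))

    def get(self, nibble: int):
        bit, index = self._slot(nibble)
        return self.children[index] if self.bitmap & bit else None

    def put(self, nibble: int, child) -> None:
        bit, index = self._slot(nibble)
        if self.bitmap & bit:
            self.children[index] = child
        else:
            self.children.insert(index, child)
            self.bitmap |= bit


node = SparseBranch()
node.put(9, "x")
node.put(2, "y")
node.put(15, "z")
print(node.children, node.get(9), node.get(3))
```

Three children take three slots instead of sixteen, which is the sort of thing that cuts memory overhead.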
crit-bit  popcount  bits  bitmaps  tries  data-structures  via:fanf  qp-tries  crit-bit-tries  hacks  memory 
october 2015 by jm
Codeface
a good collection of coding fonts (via Tony Finch)
via:fanf  fonts  coding  ui 
june 2015 by jm
AV vendors still relying on MD5 to identify malware
oh dear. I can see how this happened -- in many cases they may not still have samples to derive new sums from :(
md5  hashing  antivirus  malware  security  via:fanf  bugs 
june 2015 by jm
Five different ways to handle leap seconds with NTP
Without switching to chronyd, ntpd -x sounds not too suboptimal:
With ntpd, the kernel backward step is used by default. With ntpd versions before 4.2.6, or 4.2.6 and later patched for this bug, the -x option (added to /etc/sysconfig/ntpd) can be used to disable the kernel leap second correction and ignore the leap second as far as the local clock is concerned. The one-second error gained after the leap second will be measured and corrected later by slewing in normal operation using NTP servers which already corrected their local clocks.


It's all pretty messy though :(
ntpd  ntp  chronyd  clocks  time  synchronization  via:fanf  linux  leap-seconds 
june 2015 by jm
'Uncertain<T>: A First-Order Type for Uncertain Data' [paper, PDF]
'Emerging applications increasingly use estimates such as sensor
data (GPS), probabilistic models, machine learning, big
data, and human data. Unfortunately, representing this uncertain
data with discrete types (floats, integers, and booleans)
encourages developers to pretend it is not probabilistic, which
causes three types of uncertainty bugs. (1) Using estimates
as facts ignores random error in estimates. (2) Computation
compounds that error. (3) Boolean questions on probabilistic
data induce false positives and negatives.
This paper introduces Uncertain<T>, a new programming
language abstraction for uncertain data. We implement a
Bayesian network semantics for computation and conditionals
that improves program correctness. The runtime uses sampling
and hypothesis tests to evaluate computation and conditionals
lazily and efficiently. We illustrate with sensor and
machine learning applications that Uncertain<T> improves
expressiveness and accuracy.'
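
A toy sketch of the idea, to make bug (3) concrete. Assumptions: the `Uncertain` class, `gaussian` helper and method names below are invented for illustration, and a fixed-size sample stands in for the hypothesis tests the paper's runtime uses:

```python
import random


class Uncertain:
    """A value represented by a sampling function rather than a single number."""

    def __init__(self, sampler):
        self.sampler = sampler  # callable taking a random.Random, returning one sample

    def __gt__(self, threshold):
        # A comparison yields an uncertain boolean, not a plain True/False.
        return Uncertain(lambda rng: self.sampler(rng) > threshold)

    def probability(self, rng, samples=2000):
        """Estimate how often this uncertain boolean is true (plain sampling)."""
        return sum(bool(self.sampler(rng)) for _ in range(samples)) / samples

    def is_probably(self, rng, confidence=0.5):
        """Only answer yes when the evidence exceeds the requested confidence."""
        return self.probability(rng) > confidence


def gaussian(mean, stddev):
    return Uncertain(lambda rng: rng.gauss(mean, stddev))


rng = random.Random(42)
speed = gaussian(5.0, 2.0)   # hypothetical GPS speed estimate with its error
too_fast = speed > 4         # an uncertain boolean
p = too_fast.probability(rng)
print(round(p, 2), too_fast.is_probably(rng, 0.9))
```

Treating the estimate as a fact would say "5 > 4, so yes"; asking for 90% confidence says the evidence is not there.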

(via Tony Finch)
via:fanf  uncertainty  estimation  types  strong-typing  coding  probability  statistics  machine-learning  sampling 
december 2014 by jm
Fixing tethering on Android KitKat
Google made a change in Android 4.4 which allows operators to know when users are using tethering and conveniently block tethered devices from accessing internet. This can be fixed permanently using the following procedure.


Well this is stupid. (via Tony Finch)
via:fanf  tethering  android  mobile 
december 2014 by jm
OSTree
"git for operating system binaries".

OSTree is a tool for managing bootable, immutable, versioned filesystem trees. It is not a package system; nor is it a tool for managing full disk images. Instead, it sits between those levels, offering a blend of the advantages (and disadvantages) of both.

You can use any build system you like to place content into it on a build server, then export an OSTree repository via static HTTP. On each client system, "ostree admin upgrade" can incrementally replicate that content, creating a new root for the next reboot. This provides fully atomic upgrades. Any changes made to /etc are propagated forwards, and all local state in /var is shared.

A key goal of the project is to complement existing package systems like RPM and Debian packages, and help further their evolution. In particular for example, RPM-OSTree (linked below) has as a goal a hybrid tree/package model, where you replicate a base tree via OSTree, and then add packages on top.
os  gnome  git  linux  immutable  deployment  packaging  via:fanf 
december 2014 by jm
Mandos
'a system for allowing servers with encrypted root file systems to reboot unattended and/or remotely.' (via Tony Finch)
via:fanf  mandos  encryption  security  server  ops  sysadmin  linux 
october 2014 by jm
Mail-in-a-Box
'turns a fresh cloud computer into a working mail server. You get contact synchronization, spam filtering, and so on. On your phone, you can use apps like K-9 Mail and CardDAV-Sync free beta to sync your email and contacts between your phone and your box.'

(via Tony Finch)
via:fanf  mail  diy  hosting  webmail  ops 
september 2014 by jm
Revisiting How We Put Together Linux Systems
Building a running OS out of layered btrfs filesystems. This sounds awesome.
Instantiating a new system or OS container (which is exactly the same in this scheme) just consists of creating a new appropriately named root sub-volume. Completely naturally you can share one vendor OS copy in one specific version with a multitude of container instances.

Everything is double-buffered (or actually, n-fold-buffered), because usr, runtime, framework, app sub-volumes can exist in multiple versions. Of course, by default the execution logic should always pick the newest release of each sub-volume, but it is up to the user to keep multiple versions around, and possibly execute older versions, if he desires to do so. In fact, like on ChromeOS this could even be handled automatically: if a system fails to boot with a newer snapshot, the boot loader can automatically revert back to an older version of the OS.


(via Tony Finch)
via:fanf  linux  docker  btrfs  filesystems  unionfs  copy-on-write  os  hacking  unix 
september 2014 by jm
Facebook's drop-in replacement for std::vector
Fixes some low-hanging fruit, performance-wise.

'Simply replacing std::vector with folly::fbvector (after having included the folly/FBVector.h header file) will improve the performance of your C++ code using vectors with common coding patterns. The improvements are always non-negative, almost always measurable, frequently significant, sometimes dramatic, and occasionally spectacular.'

(via Tony Finch)
c++  facebook  performance  algorithms  vectors  via:fanf  optimization 
september 2014 by jm
The poisoned NUL byte, 2014 edition
A successful exploit of Fedora glibc via a single NUL overflow (via Tony Finch)
via:fanf  buffer-overflows  security  nul  byte  exploits  google  project-zero 
august 2014 by jm
Tor exit node operator prosecuted in Austria
'The operator of an exit node is guilty of complicity, because he enabled others to transmit content of an illegal nature through the service.'

Via Tony Finch.
austria  tor  security  law  liability  internet  tunnelling  eu  via:fanf 
july 2014 by jm
Google's Pegasus
a power-management subsystem for warehouse-scale computing farms. "It adjusts the power-performance settings of servers so that the overall workload barely meets its latency constraints for user queries."
pegasus  power-management  power  via:fanf  google  latency  scaling 
june 2014 by jm
Whiteboard Picture Cleaner

This [shell one-liner] will take a picture of a whiteboard and use parts of the ImageMagick library with sane defaults to clean it up tremendously: convert "$1" -morphology Convolve DoG:15,100,0 -negate -normalize -blur 0x1 -channel RBG -level 60%,91%,0.1 "$2"


Some kind soul has put up a quickie web UI here: http://api.o2b.ru/whiteboardcleaner
graphics  tools  whiteboard  imagemagick  text  images  cleanup  gimp  photoshop  via:fanf 
june 2014 by jm
An analysis of Facebook photo caching
excellent analysis of caching behaviour at scale, from the FB engineering blog (via Tony Finch)
via:fanf  caching  facebook  architecture  photos  images  cache  fifo  lru  scalability 
may 2014 by jm
Akamai's "Secure Heap" patch wasn't good enough
'Having the private keys inaccessible is a good defense in depth move.
For this patch to work you have to make sure all sensitive values are stored in
the secure area, not just check that the area looks inaccessible. You can't do
that by keeping the private key in the same process. A review by a security
engineer would have prevented a false sense of security. A version where the
private key and the calculations are in a separate process would be more
secure. If you decide to write that version, I'll gladly see if I can break
that too.'

Akamai's response: https://blogs.akamai.com/2014/04/heartbleed-update-v3.html -- to their credit, they recognise that they need to take further action.

(via Tony Finch)
via:fanf  cryptography  openssl  heartbleed  akamai  security  ssl  tls 
april 2014 by jm
Stalled SCP and Hanging TCP Connections
a Cisco fail.
It looks like there’s a firewall in the middle that’s doing additional TCP sequence randomisation which was a good thing, but has been fixed in all current operating systems. Unfortunately, it seems that firewall doesn’t understand TCP SACK, which when coupled with a small amount of packet loss and a stateful host firewall that blocks invalid packets results in TCP connections that stall randomly. A little digging revealed that firewall to be the Cisco Firewall Services Module on our Canterbury network border.


(via Tony Finch)
via:fanf  cisco  networking  firewalls  scp  tcp  hangs  sack  tcpdump 
april 2014 by jm
A looming breakthrough in indistinguishability obfuscation
'The team’s obfuscator works by transforming a computer program into what Sahai calls a “multilinear jigsaw puzzle.” Each piece of the program gets obfuscated by mixing in random elements that are carefully chosen so that if you run the garbled program in the intended way, the randomness cancels out and the pieces fit together to compute the correct output. But if you try to do anything else with the program, the randomness makes each individual puzzle piece look meaningless. This obfuscation scheme is unbreakable, the team showed, provided that a certain newfangled problem about lattices is as hard to solve as the team thinks it is. Time will tell if this assumption is warranted, but the scheme has already resisted several attempts to crack it, and Sahai, Barak and Garg, together with Yael Tauman Kalai of Microsoft Research New England and Omer Paneth of Boston University, have proved that the most natural types of attacks on the system are guaranteed to fail. And the hard lattice problem, though new, is closely related to a family of hard problems that have stood up to testing and are used in practical encryption schemes.'

(via Tony Finch)
obfuscation  cryptography  via:fanf  security  hard-lattice-problem  crypto  science 
february 2014 by jm
Sky parental controls break many JQuery-using websites
An 11 hour outage caused by a false positive in Sky's anti-phishing filter; all sites using the code.jquery.com CDN for JQuery would have seen errors.
Sky still appears to be blocking code.jquery.com and all files served via the site, and more worryingly, if you try to report the incorrect category, once signed in on the Sky website you get an error page. We suspect the site was blocked due to being linked to by a properly malicious website, i.e. code.jquery.com and some javascript files were being used on a dodgy website and every domain mentioned was subsequently added to a block list.


(via Tony Finch)
via:fanf  sky  filtering  internet  uk  anti-phishing  phish  jquery  javascript  http  web  fps  false-positives 
january 2014 by jm
Backblaze Blog » What Hard Drive Should I Buy?
Because Backblaze has a history of openness, many readers expected more details in my previous posts. They asked what drive models work best and which last the longest. Given our experience with over 25,000 drives, they asked which ones are good enough that we would buy them again. In this post, I’ll answer those questions.
backblaze  backup  hardware  hdds  storage  disks  ops  via:fanf 
january 2014 by jm
The trouble with timestamps
Timestamps, as implemented in Riak, Cassandra, et al, are fundamentally unsafe ordering constructs. In order to guarantee consistency you, the user, must ensure locally monotonic and, to some extent, globally monotonic clocks. This is a hard problem, and NTP does not solve it for you. When wall clocks are not properly coupled to the operations in the system, causal constraints can be violated. To ensure safety properties hold all the time, rather than probabilistically, you need logical clocks.
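
For reference, the simplest kind of logical clock is a Lamport clock: a counter that ticks on every local event and jumps past any timestamp it receives. A minimal sketch (class and method names are my own; this is not how Riak or Cassandra do it):

```python
class LamportClock:
    """Counter-based logical clock; ordering never depends on wall time."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, remote_time: int) -> int:
        """Message receipt: move past both our own and the sender's time."""
        self.time = max(self.time, remote_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
a.tick()
a.tick()
sent = a.tick()           # node a stamps an outgoing write
received = b.receive(sent)
print(sent, received)     # the receipt is always ordered after the send
```

However far b's wall clock has drifted, the receipt is stamped later than the send, so the causal constraint holds. Lamport clocks only give an ordering consistent with causality; detecting concurrent writes needs something like the vector clocks in the tags.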
clocks  time  distributed  databases  distcomp  ntp  via:fanf  aphyr  vector-clocks  last-write-wins  lww  cassandra  riak 
october 2013 by jm
Good SSL for your website is absurdly difficult in practice
Yet again, security software fails on packaging and UI. via Tony Finch
security  ssl  tls  packaging  via:fanf 
september 2013 by jm
seeing into the UV spectrum after Cataract Surgery with Crystalens
I've been very happy so far with the Crystalens implant for Cataract Surgery [...] one unexpected/interesting aspect is I see a violet glow that others do not - perhaps I'm more sensitive to the low end of the visible light spectrum.


(via Tony Finch)
via:fanf  science  perception  augmentation  uv  light  sight  cool  cataracts  surgery  lens  eyes 
june 2013 by jm
The Patent Protection Racket
Joel On Software weighs in (via Tony Finch):
The fastest growing industry in the US right now, even during this time of slow economic growth, is probably the patent troll protection racket industry.
joel-on-software  patents  swpats  shakedown  extortion  us-politics  patent-trolls  via:fanf 
april 2013 by jm
bloomd
a high-performance C server which is used to expose bloom filters and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.
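
For anyone who hasn't met them, a bloom filter is a bit array plus a few hash functions: adding a key sets its bits, and a lookup says "maybe present" only if all of them are set. A small sketch of the data structure itself, with made-up sizes and a SHA-256 based hashing scheme chosen for convenience; it does not reflect bloomd's internals or its wire protocol:

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


seen = BloomFilter()
seen.add("foo")
print("foo" in seen)
```

Keys that were added are always reported present; keys that were not added are usually, but not always, reported absent.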
(via Tony Finch)
via:fanf  memcached  bloomd  open-source  bloom-filters 
march 2013 by jm
Confusion reigns over three “hijacked” ccTLDs
This kind of silliness is only likely to increase as the number of TLDs increases (and they become more trivial).
What seems to be happening here is that [two companies involved] have had some kind of dispute, and that as a result the registrants and the reputation of three countries’ ccTLDs have been harmed. Very amateurish.
tlds  domains  via:fanf  amateur-hour  dns  cctlds  registrars  adamsnames 
march 2013 by jm
It’s the Sugar, Folks
A study published in the Feb. 27 issue of the journal PLoS One links increased consumption of sugar with increased rates of diabetes by examining the data on sugar availability and the rate of diabetes in 175 countries over the past decade. And after accounting for many other factors, the researchers found that increased sugar in a population’s food supply was linked to higher diabetes rates independent of rates of obesity. In other words, according to this study, obesity doesn’t cause diabetes: sugar does.

The study demonstrates this with the same level of confidence that linked cigarettes and lung cancer in the 1960s. As Rob Lustig, one of the study’s authors and a pediatric endocrinologist at the University of California, San Francisco, said to me, “You could not enact a real-world study that would be more conclusive than this one.”
nytimes  health  food  via:fanf  sugar  eating  diabetes  papers  medicine 
february 2013 by jm
Exponentially decaying lists
'log scale for lists; Decaying lists allow you to manage a large range of values. A decaying list grows logarithmically with the number of items. It follows that some items are dropped when others are inserted.' (via Tony Finch)
via:fanf  clojure  algorithms  decay  backoff  half-life  data-structures 
february 2013 by jm
IPMI: Freight Train To Hell
'Intel's Intelligent Platform Management Interface (IPMI), which is implemented and added onto by all server vendors, grants system administrators a means to manage their hardware in an Out of Band (OOB) or Lights Out Management (LOM) fashion. However there are a series of design, utilization, and vendor issues that cause complex, pervasive, and serious security infrastructure problems.

The BMC is an embedded computer on the motherboard that implements IPMI; it enjoys an asymmetrical relationship with its host, with the BMC able to gain full control of memory and I/O, while the server is both blind and impotent against the BMC. Compromised servers have full access to the private IPMI network.

The BMC uses reusable passwords that are infrequently changed, widely shared among servers, and stored in clear text in its storage. The passwords may be disclosed with an attack on the server, over the network against the BMC, or with a physical attack against the motherboard (including after the server has been decommissioned.)

IT's reliance on IPMI to reduce costs, the near-complete lack of research, 3rd party products, or vendor documentation on IPMI and the BMC security, and the permanent nature of the BMC on the motherboard make it currently very difficult to defend, fix or remediate against these issues.'

(via Tony Finch)
via:fanf  security  ipmi  power-management  hardware  intel  passwords  bios 
february 2013 by jm
Fast Packed String Matching for Short Patterns [paper, PDF]
'Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like NLP, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. [...]
In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns.' Reminds me of http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm , but taking advantage of SIMD extensions, which should make things nice and speedy, at the cost of tying it to specific hardware platforms. (via Tony Finch)
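
For comparison, a plain scalar Rabin-Karp looks something like the sketch below: hash the pattern, slide a rolling hash over the text, and only compare strings where the hashes agree. This is the textbook algorithm, not the paper's SSE method; the base and modulus are arbitrary choices of mine:

```python
def rabin_karp(text: str, pattern: str, base: int = 256, mod: int = 1_000_003):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    high = pow(base, m - 1, mod)  # weight of the character leaving the window
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    matches = []
    for i in range(n - m + 1):
        # Hashes can collide, so confirm with a real comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


print(rabin_karp("abracadabra", "abra"))
```

The appeal of the packed/SIMD approach is doing many of these per-position comparisons in a single instruction rather than one rolling-hash step at a time.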
rabin-karp  algorithms  strings  string-matching  papers  via:fanf 
january 2013 by jm
Efficient In-Memory Indexing with Generalized Prefix Trees [PDF]
'Efficient data structures for in-memory indexing gain in importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. In consequence, there are high performance demands for in-memory data structures. Such index structures are used -- with minor changes -- as primary or secondary indices in almost every DBMS. Typically, tree-based or hash-based structures are used, while structures based on prefix-trees (tries) are neglected in this context. For tree-based and hash-based structures, the major disadvantages are inherently caused by the need for reorganization and key comparisons. In contrast, the major disadvantage of trie-based structures in terms of high memory consumption (created and accessed nodes) could be improved. In this paper, we argue for reconsidering prefix trees as in-memory index structures and we present the generalized trie, which is a prefix tree with variable prefix length for indexing arbitrary data types of fixed or variable length. The variable prefix length enables the adjustment of the trie height and its memory consumption. Further, we introduce concepts for reducing the number of created and accessed trie levels. This trie is order-preserving and has deterministic trie paths for keys, and hence, it does not require any dynamic reorganization or key comparisons. Finally, the generalized trie yields improvements compared to existing in-memory index structures, especially for skewed data. In conclusion, the generalized trie is applicable as general-purpose in-memory index structure in many different OLTP or hybrid (OLTP and OLAP) data management systems that require balanced read/write performance.' (via Tony Finch)
via:fanf  prefix-trees  tries  data-structures 
january 2013 by jm
The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases [PDF]
'Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART’s performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and prefix lookup.' (via Tony Finch)
via:fanf  data-structures  trees  indexing  cache-aware  tries 
january 2013 by jm
HAT-trie: A Cache-conscious Trie-based Data Structure for Strings [PDF]
'Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other high-performance data structures. We show strong improvements in both time and space; in most cases approaching that of the cache-conscious hash table. Our HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.' (via Tony Finch)
via:fanf  data-structures  tries  cache-aware  trees 
january 2013 by jm
Patent trolls want $1,000 for using scanners
We are truly living in the future -- a dystopian future, but one nonetheless. A patent troll manages to obtain "gobbledigook" patents on using a scanner to scan to PDF, then attempts to shake down a bunch of small companies before eventually running into resistance, at which point it "forks" into a bunch of algorithmically-named shell companies, spammer-style, sending the same demands. Those demands in turn contain this beauty of Stockholm-syndrome-inducing prose:

'You should know also that we have had a positive response from the business community to our licensing program. As you can imagine, most businesses, upon being informed that they are infringing someone’s patent rights, are interested in operating lawfully and taking a license promptly. Many companies have responded to this licensing program in such a manner. Their doing so has allowed us to determine that a fair price for a license negotiated in good faith and without the need for court action is a payment of $900 per employee. We trust that your organization will agree to conform your behavior to respect our patent rights by negotiating a license rather than continuing to accept the benefits of our patented technology without a license. Assuming this is the case, we are prepared to make this pricing available to you.'


And here's an interesting bottom line:

The best strategy for target companies? It may be to ignore the letters, at least for now. “Ignorance, surprisingly, works,” noted Prof. Chien in an e-mail exchange with Ars.

Her study of startups targeted by patent trolls found that when confronted with a patent demand, 22 percent ignored it entirely. Compare that with the 35 percent that decided to fight back and 18 percent that folded. Ignoring the demand was the cheapest option ($3,000 on average) versus fighting in court, which was the most expensive ($870,000 on average).

Another tactic that clearly has an effect: speaking out, even when done anonymously. It hardly seems a coincidence that the Project Paperless patents were handed off to a web of generic-sounding LLCs, with demand letters signed only by “The Licensing Team,” shortly after the “Stop Project Paperless” website went up. It suggests those behind such low-level licensing campaigns aren’t proud of their behavior. And rightly so.
patents  via:fanf  networks  printing  printers  scanning  patent-trolls  project-paperless  adzpro  gosnel  faslan 
january 2013 by jm
29c3 HashDOS presentation slides (PDF)
Summary: MurmurHash still vulnerable, likewise CityHash and Python's hash -- use SipHash
via:fanf  cityhash  siphash  hash  dos  security  hashdos  murmurhash 
january 2013 by jm
Authentication is machine learning
This may be the most insightful writing about authentication in years:
From my brief time at Google, my internship at Yahoo!, and conversations with other companies doing web authentication at scale, I’ve observed that as authentication systems develop they gradually merge with other abuse-fighting systems dealing with various forms of spam (email, account creation, link, etc.) and phishing. Authentication eventually loses its binary nature and becomes a fuzzy classification problem.

This is not a new observation. It’s generally accepted for banking authentication and some researchers like Dinei Florêncio and Cormac Herley have made it for web passwords. Still, much of the security research community thinks of password authentication in a binary way [..]. Spam and phishing provide insightful examples: technical solutions (like Hashcash, DKIM signing, or EV certificates), have generally failed but in practice machine learning has greatly reduced these problems. The theory has largely held up that with enough data we can train reasonably effective classifiers to solve seemingly intractable problems.


(via Tony Finch.)
passwords  authentication  big-data  machine-learning  google  abuse  antispam  dkim  via:fanf 
december 2012 by jm
Knots on Mars! (and a few thoughts on NASA's knots)
amazing post from the International Guild of Knot Tyers Forum:

While a few of the folks here are no doubt aware, it might surprise most people to learn that knots tied in cords and thin ribbons have probably traveled on every interplanetary mission ever flown. If human civilization ends tomorrow, interplanetary landers, orbiters, and deep space probes will preserve evidence of both the oldest and newest of human technologies for millions of years.

Knots are still used in this high-tech arena because cable lacing has long been the preferred cable management technique in aerospace applications. That it remains so to this day is a testament to the effectiveness of properly chosen knots tied by skilled craftspeople. It also no doubt has a bit to do with the conservative nature of aerospace design and engineering practices. Proven technologies are rarely cast aside unless they no longer fulfill requirements or there is something substantially better available.

While the knots used for cable lacing in general can be quite varied -- in some cases even a bit idiosyncratic -- NASA has in-house standards for the knots and methods used on their spacecraft. These are specified in NASA Technical Standard NASA-STD-8739.4 -- Crimping, Interconnecting Cables, Harnesses, and Wiring. As far as I've been able to identify in the rover images below, all of the lacings shown are one of two of the several patterns specified in the standard.

The above illustration shows the so-called "Spot Tie". It is a clove hitch topped by two half-knots in the form of a reef (square) knot. In addition to its pure binding role, it is also used to affix cable bundles to tie-down point.


Some amazing scholarship on knot technology in this post -- lots to learn! (via Tony Finch, iirc)
via:fanf  mars  nasa  science  knots  tying  rope  cables  cabling  geek  aerospace  standards 
september 2012 by jm
A Closer Look: Email-Based Malware Attacks
'The average detection rate for these samples was 24.47 percent, while the median detection rate was just 19 percent.' That is *atrocious*. (via Tony Finch)
via:fanf  fail  malware  filtering  av  smtp  email  viruses 
june 2012 by jm
Analyzing Flame's MD5 Collision Attack [slides, PDF]
really detailed slide deck by Alex Sotirov, Co-Founder and Chief Scientist, Trail of Bits, Inc. (via Tony Finch) Plenty of security fail by MS, and also: PKI is clearly too hard
via:fanf  flame  security  malware  md5  collisions  hashing  pki  tls  ssl  microsoft 
june 2012 by jm
UK Channel 4 News Demo – Contactless Payment Cards – viaForensics
'During an interview with the Channel 4 correspondent we were able to touch his wallet with an Android phone while he was distracted and capture his credit card details.' ... 'viaForensics found that there are many cards in circulation, including recently issued cards, which are giving up the full card number, expiry, surname and initials.' Barclays security fail hits the headlines (via Tony Finch)
via:fanf  channel-4  news  barclays-bank  uk  banking  nfc  wireless  android  via-forensics  contactless-cards 
may 2012 by jm
A one-line software patent – and a fix
Just another sad story of how software patenting made a standard useless. "I had once hoped that JBIG-KIT would help with the exchange of scanned documents on the Internet, facilitate online inter-library loans, and make paper archives more accessible to users all over the world. However, the impact was minimal: no web browser dared to directly support a standardized file format covered by 23 patents, the last of which expired today. About 25 years ago, large IT research organizations discovered standards as a gold mine, a vehicle to force users to buy patent licenses, not because the technology is any good, but because it is required for compatibility. This is achieved by writing the standards very carefully such that there is no way to come up with a compatible implementation that does not require a patent license, an art that has been greatly perfected since."
via:fanf  patents  jbig1  swpats  scanning  standards  rand  frand  licensing 
april 2012 by jm
Fake Unicode Consortium
featuring such codepoints as "I USED TO BE A LATIN CAPITAL LETTER K LIKE YOU THEN I TOOK AN ARROW IN THE KNEE", "BACK TO THE FUTURE", "ENTERING HYPERSPACE", "LATIN CAPITAL LETTER Q TAKING A NAP", and "LOVE HOTEL". no wait, that one's real (via Tony Finch, with comments by Michael Everson!)
unicode  humor  codepoints  i18n  fonts  skyrim  hyperspace  funny  via:fanf 
march 2012 by jm
Microsoft's Azure Feb 29th, 2012 outage postmortem
'The leap day bug is that the GA calculated the valid-to date by simply taking the current date and adding one to its year. That meant that any GA that tried to create a transfer certificate on leap day set a valid-to date of February 29, 2013, an invalid date that caused the certificate creation to fail.' This caused cascading failures throughout the fleet. Ouch -- should have been spotted during code review
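For the record, the bug class is easy to reproduce. The sketch below is Python, not whatever the Azure guest agent is actually written in, and the function names are invented for illustration. It shows the naive "add one to the year" computation failing on Feb 29, plus one possible fix that clamps to Feb 28 (a policy choice; rolling over to Mar 1 would be just as defensible).

```python
from datetime import date


def add_years_naive(d: date, years: int) -> date:
    # Mirrors the bug described in the postmortem: only the year changes,
    # so Feb 29 in a non-leap target year is rejected as an invalid date.
    return d.replace(year=d.year + years)


def add_years_safe(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # Feb 29 has no counterpart in the target year; clamp to Feb 28.
        return d.replace(year=d.year + years, day=28)
```

Calling `add_years_naive(date(2012, 2, 29), 1)` raises `ValueError`, while `add_years_safe` returns 28 February 2013.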
azure  dev  dates  leap-years  via:fanf  microsoft  outages  post-mortem  analysis  failure 
march 2012 by jm
EFF Wins Protection for Time Zone Database
'The Electronic Frontier Foundation (EFF) is pleased to announce that a copyright lawsuit threatening an important database of time zone information has been dismissed. The astrology software company that filed the lawsuit, Astrolabe, has also apologized and agreed to a 'covenant not to sue' going forward, which will help protect the database from future baseless legal actions and disruptions.

Software engineers around the world depend on the time zone database to make sure that time-stamps for email and other files work correctly no matter where you are. However, last September, Astrolabe filed a lawsuit against Arthur David Olson and Paul Eggert – the researchers who coordinated the database's development for decades – because the database includes information from an atlas in which Astrolabe claimed to own copyright. But facts – like what time the sun rises – are not copyrightable. EFF, along with co-counsel Adam Kessel and Olivia Nguyen at the Boston office of Fish & Richardson P.C, promptly signed on to defend Olson and Eggert and protect this essential tool. In January, EFF advised Astrolabe that Olson and Eggert would move for sanctions if Astrolabe did not withdraw its complaint. Today's dismissal followed.'
copyright  eff  timezones  via:fanf  time  unix  olson 
february 2012 by jm
_Intellectual property rights and innovation: Evidence from the human genome_ (PDF)
'Do intellectual property (IP) rights on existing technologies hinder subsequent
innovation? Using newly-collected data on the sequencing of the human genome by
the public Human Genome Project and the private firm Celera, this paper estimates
the impact of Celera's gene-level IP on subsequent scientific research and product
development. Genes initially sequenced by Celera were held with IP for up to two
years, but moved into the public domain once re-sequenced by the public effort.
Across a range of empirical specifications, I find evidence that Celera's IP led to
reductions in subsequent scientific research and product development on the order of
20 to 30 percent. Taken together, these results suggest that Celera's short-term IP
had persistent negative effects on subsequent innovation relative to a counterfactual
of Celera genes having always been in the public domain.' (via Tony Finch)
via:fanf  genetics  ip  copyright  open-source  celera  patents  papers  pdf 
february 2012 by jm
The Captain of the Costa Concordia is Totally Screwed [OP/ED]
'For the most senior officer on board, the one who had been entrusted with the care and safety of this magnificent ship, his job was far from over. In fact the Captain had just added a new job title to his resume, that of ON SCENE COMMANDER. But apparently he didn’t realize it because he took off in a lifeboat, leaving this giant steaming pile to be picked up by the Italian police and Coast Guard who are continuing to search for survivors, and prevent looters from gaining access. The Captain didn’t just take off in a lifeboat, he left the entire scene completely.' oh dear. (via Tony Finch)
via:fanf  disaster  ineptitude  maritime  boats  tourism  giglio  sea  sinking  liners  safety 
january 2012 by jm
BufferBloat: What's Wrong with the Internet? - ACM Queue
'A discussion with Vint Cerf, Van Jacobson, Nick Weaver, and Jim Gettys' -- the big guns! Great discussion (via Tony Finch)
via:fanf  bufferbloat  networking  buffers  buffering  performance  load  tcp  ip 
december 2011 by jm
Bayes' theorem ruled inadmissible in UK law courts
Bayes' theorem, and 'similar statistical analysis', ruled inadmissible in UK law courts (via Tony Finch)
uk  law  guardian  via:fanf  bayes  maths  statistics  legal 
october 2011 by jm
LRB · James Meek · In the Sorting Office
'The postwoman is paid a pittance to deliver corporate mail. She hasn’t done her job well, yet so few people have complained about missed deliveries that she hasn’t been found out. Across the world, postal services are being altered like this: optimised to deliver the maximum amount of unwanted mail at the minimum cost to businesses. In the internet age private citizens are sending less mail than they used to, but that’s only part of the story of postal decline. The price of driving down the cost of bulk mailing for a handful of big organisations is being paid for by the replacement of decently paid postmen with casual labour and the erosion of daily deliveries.' (via Tony Finch)
via:fanf  post  mail  postal-service  holland  dutch  postmen  work  jobs  business  politics  lrb 
april 2011 by jm
ImperialViolet - Revocation doesn't work
OCSP doesn't work -- the browser vendors have failed to implement it safely
security  ssl  https  tls  ocsp  revocation  crl  via:fanf  from delicious
march 2011 by jm
good Hacker News thread on djb's "redo"
YA make-replacement build system. the thread is better than the linked article, btw
hacker-news  via:fanf  make  build  djb  redo  compilation  building  coding  open-source  from delicious
january 2011 by jm
One of the ICE domain seizures was a legit mp3 blog, posting legal promo mp3s
At least one of the sites seized by DHS was an mp3 blog which posted authorised, promotional mp3s, sent from record label VPs and artists -- i.e. none of the supposedly "infringing" files were actually infringing. (via Tony Finch)
mp3  music  piracy  law  ice  dhs  filesharing  copyright  copyfight  techdirt  via:fanf  seizure  mp3blogs  from delicious
december 2010 by jm
Tony Finch - Some notes on Bloom filters
more good Bloom Filter tips. he says: 'I take a slightly different tack, starting with a target population in mind which determines the size of the filter. Also there's a minor error regarding performance in the corte.si post. You only need to calculate two hash functions, and use a linear combination of them to index the Bloom filter. This simplifies the coding a lot, and if hash calculation dominates filter indexing, it's also a lot faster.'
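The two-hash trick from that quote can be sketched roughly as follows. This is my own illustration, not Tony's code: the class name, the use of SHA-256 as the source of both base hashes, and the filter parameters are all assumptions. The point is just that the i-th index is computed as h1 + i*h2 rather than by running k separate hash functions.

```python
import hashlib


class BloomFilter:
    """Bloom filter deriving all k indexes from two base hashes (a sketch)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _indexes(self, item: bytes):
        # One SHA-256 call supplies both base hashes (an illustrative choice).
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd, so never zero
        for i in range(self.num_hashes):
            # The linear combination h1 + i*h2 stands in for the i-th hash.
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes) -> bool:
        # May return a false positive, never a false negative.
        return all(
            self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item)
        )
```

Usage: `bf = BloomFilter(num_bits=1024, num_hashes=4)`, then `bf.add(b"alpha")` and `b"alpha" in bf`. Per the quote, you would pick `num_bits` from the target population first.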
bloom-filters  tips  coding  via:fanf  false-positives  from delicious
november 2010 by jm
Claimed HDCP master key leak could be fatal to DRM scheme
ouch - master key for HDCP (the copy protection used on HDMI links) now available, if true (via Tony Finch)
via:fanf  hdmi  hdcp  video  drm  from delicious
september 2010 by jm
Why Our Civilization's Video Art and Culture is Threatened by the MPEG-LA
incredible. Almost every single modern camera capable of recording video now requires that you obtain a license from MPEG-LA to use recorded footage for commercial purposes. These clauses are currently not enforced, but could be. Horrifying (via Tony Finch)
via:fanf  patents  mpeg2  codec  compression  consumer-rights  copyright  legal  law  mpeg  h264  mpegla  codecs  from delicious
may 2010 by jm
Where Tcl and Tk Went Wrong
from David Welton. what, the lack of support for GNOME UI standards was *deliberate*? bad choice if so
gnome  david-welton  languages  via:fanf  scripting  gui  tk  tcl  from delicious
march 2010 by jm
FastMail and sessions
a clever HTTP session-management trick (via Tony Finch)
via:fanf  web  http  sessions  cookies  fastmail  from delicious
march 2010 by jm
RFC 5782 - DNS Blacklists and Whitelists
John Levine gets DNS*Ls standardized, at last. we should really check SpamAssassin to see if it's compliant, I guess ;)
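As a starting point for that kind of check, this is roughly how a DNSxL query name is built under the RFC: reverse the IPv4 octets (or the IPv6 nibbles) and append the list's zone. A minimal Python sketch, with no network lookups; the function name and the zone `dnsbl.example.net` are hypothetical.

```python
import ipaddress


def dnsxl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up for `ip` in the DNSxL at `zone`."""
    addr = ipaddress.ip_address(ip)
    # reverse_pointer yields e.g. "2.0.0.127.in-addr.arpa" for IPv4, or the
    # reversed nibbles followed by ".ip6.arpa" for IPv6; drop those two
    # trailing labels and append the list's zone instead.
    reversed_labels = addr.reverse_pointer.rsplit(".", 2)[0]
    return f"{reversed_labels}.{zone}"
```

The RFC's test address 127.0.0.2 (which an IPv4 list must always include) becomes `2.0.0.127.dnsbl.example.net`; 127.0.0.1 must never be listed.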
dnsbls  anti-spam  dnswl  dnsbl  rfcs  standards  via:fanf  from delicious
february 2010 by jm