guava-retrying
Apache-licensed open source Java lib to implement retrying behaviour cleanly.
a general purpose method for retrying arbitrary Java code with specific stop, retry, and exception handling capabilities that are enhanced by Guava's predicate matching. It also includes an exponential backoff WaitStrategy that might be useful for situations where more well-behaved service polling is preferred.
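To make the retry-plus-exponential-backoff idea concrete, here's a minimal sketch in Python rather than Java. It is not guava-retrying's API: the `retry` function and all of its parameter names are made up for illustration, and the doubling wait schedule is just one common choice.

```python
import time


def retry(func, max_attempts=5, base_delay=0.1, max_delay=5.0,
          retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Waits base_delay * 2**attempt (capped at max_delay) between attempts.
    Every name here is hypothetical; this is not guava-retrying's API.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # stop condition: give up after the final attempt
            sleep(min(max_delay, base_delay * (2 ** attempt)))


# Usage: a flaky call that fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient failure")
    return "ok"


waits = []
result = retry(flaky, sleep=waits.append)  # record the waits instead of sleeping
```

Passing `sleep` in as a parameter keeps the sketch testable without actually waiting.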
retries  retrying  resiliency  fault-tolerance  java  open-source  guava 
february 2013
Cycling in Dublin City: the numbers
7.6% of the Dublin commuter population "mainly cycle". some interesting stats here
statistics  dublin  ireland  cycling  commuting  travel 
february 2013
the talking cat of azawad
Utilizing an iPhone/Android App known as “Talking Tom Cat”, the tool has been transformed into a new media mouthpiece, addressing very specific particulars of the conflict that are glossed over by international media: alliances between MNLA and Ansar Dine, critiques of hypocrisy of the MUJAO factions, and ousting of corrupt politicians.
apps  wtf  politics  talking-tom-cat  bizarre  tuareg  africa  via:neilmajor 
february 2013
HyperLogLog++: Google’s Take On Engineering HLL
Google and AggregateKnowledge's improvements to the HyperLogLog cardinality estimation algorithm
hyperloglog  cardinality  estimation  streaming  stream-processing  cep 
february 2013
Russia's anti-child-porn internet blocklist allegedly being used for general censorship
Allegedly being used to censor political and anti-corruption journalism, and to block a Russian Wikipedia-like site for hosting an article about suicide
censorship  feature-creep  russia  politics  blocklists 
february 2013
'Medians and Beyond: New Aggregation Techniques for Sensor Networks' [paper, PDF]
'We introduce Quantile Digest or q-digest, a novel data structure which provides provable guarantees on approximation error and maximum resource consumption. In more concrete terms, if the values returned by the sensors are integers in the range [1;n], then using q-digest we can answer quantile queries using message size m within an error of O(log(n)/m). We also outline how we can use q-digest to answer other queries such as range queries, most frequent items and histograms. Another notable property of q-digest is that in addition to the theoretical worst case bound error, the structure carries with itself an estimate of error for this particular query.'
q-digest  algorithms  streams  approximation  histograms  median  percentiles  quantiles 
february 2013
clearspring / stream-lib
ASL-licensed open source library of stream-processing/approximation algorithms: count-min sketch, space-saving top-k, cardinality estimation, LogLog, HyperLogLog, MurmurHash, lookup3 hash, Bloom filters, q-digest, stochastic top-k
algorithms  coding  streams  cep  stream-processing  approximation  probabilistic  space-saving  top-k  cardinality  estimation  bloom-filters  q-digest  loglog  hyperloglog  murmurhash  lookup3 
february 2013
'Efficient Computation of Frequent and Top-k Elements in Data Streams' [paper, PDF]
The Space-Saving algorithm to compute top-k in a stream. I've been asking a variation of this problem as an interview question for a while now, pretty cool to find such a neat solution. Pity neither myself nor anyone I've interviewed has come up with it ;)
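For the curious, a minimal sketch of the Space-Saving idea in Python. This is my own simplified version, not the paper's code: it finds the minimum counter with a linear scan, where the paper describes a Stream-Summary structure to make that step cheap. The function and variable names are illustrative.

```python
def space_saving(stream, k):
    """Approximate the most frequent items using at most k counters."""
    counts = {}   # item -> estimated count (never an underestimate)
    errors = {}   # item -> maximum possible overestimate
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < k:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count plus one, and records it as its possible error.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    top = sorted(counts.items(), key=lambda kv: -kv[1])
    return top, errors


# Usage: two heavy hitters followed by a tail of rare items.
stream = ["a"] * 50 + ["b"] * 30 + ["c", "d", "e", "f"] * 5
top, errors = space_saving(stream, k=3)
```

Each reported count is an upper bound on the true count, and count minus error is a lower bound, so the heavy hitters survive while the tail items churn through the remaining counter.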
space-saving  approximation  streams  stream-processing  cep  papers  pdf  algorithms 
february 2013
Real-time Analytics in Scala [slides, PDF]
some good approximation/streaming algorithms and tips on Scala implementation
streams  algorithms  approximation  coding  scala  slides 
february 2013
How did I do the Starwars Traceroute?
It is accomplished using many vrfs on 2 Cisco 1841s. For those less technical, VRFs are essentially private routing tables similar to a VPN. When a packet destined to 216.81.59.173 (AKA obiwan.scrye.net) hits my main gateway, I forward it onto the first VRF on the "ASIDE" router on 206.214.254.1. That router then has a specific route for 216.81.59.173 to 206.214.254.6, which resides on a different VRF on the "BSIDE" router. It then has a similar set up which points it at 206.214.254.9 which lives in another VPN on "ASIDE" router. All packets are returned using a default route pointing at the global routing table. This was by design so the packets TTL expiration did not have to return fully through the VRF Maze. I am a consultant to Epik Networks who let me use the Reverse DNS for an unused /24, and I used PowerDNS to update all of the entries through mysql. This took about 30 minutes to figure out how to do it, and about 90 minutes to implement.
vrfs  routing  networking  hacks  star-wars  traceroute  rdns  ip 
february 2013
Bit9's whitelisting keys stolen
Black hats steal code-signing keys from software whitelisting anti-malware firm. Pretty audacious
malware  security  whitelisting  av 
february 2013
osx - Remap "Home" and "End" to beginning and end of line
in summary: ~/Library/KeyBindings/DefaultKeyBinding.dict. Thanks, Apple, this is stupid
mac  keyboard  bindings  it-just-works  compatibility  ui  rebinding 
february 2013
Goonwaffe Stories: A Guide For Newbies [PDF]
impressively high-quality newbie's guide from the Goonswarm Federation -- as themittani.com describes it, 'frankly a work of art: a 1950's Pulp Scifi magazine full of internet spaceships and sociopathy.'
eve-online  space  goonswarm  gaming  mmo  pdf  pulp  science-fiction 
february 2013
Splout
'Splout is a scalable, open-source, easy-to-manage SQL big data view. Splout is to Hadoop + SQL what Voldemort or Elephant DB are to Hadoop + Key/Value. Splout serves a read-only, partitioned SQL view which is generated and indexed by Hadoop.'

Some FAQs: 'What's the difference between Splout SQL and Dremel-like solutions such as BigQuery, Impala or Apache Drill? Splout SQL is not a "fast analytics" Dremel-like engine. It is more thought to be used for serving datasets under web / mobile high-throughput, many lookups, low-latency applications. Splout SQL is more like a NoSQL database in the sense that it has been thought for answering queries under sub-second latencies. It has been thought for performing queries that impact a very small subset of the data, not queries that analyze the whole dataset at once.'
splout  sql  big-data  hadoop  read-only  scaling  queries  analytics 
february 2013
High Scalability - Analyzing billions of credit card transactions and serving low-latency insights in the cloud
Hadoop, a batch-generated read-only Voldemort cluster, and an intriguing optimal-storage histogram bucketing algorithm:
The optimal histogram is computed using a random-restart hill climbing approximated algorithm.
The algorithm has been shown very fast and accurate: we achieved 99% accuracy compared to an exact dynamic algorithm, with a speed increase of one factor. [...] The amount of information to serve in Voldemort for one year of BBVA's credit card transactions on Spain is 270 GB. The whole processing flow would run in 11 hours on a cluster of 24 "m1.large" instances. The whole infrastructure, including the EC2 instances needed to serve the resulting data would cost approximately $3500/month.
scalability  scaling  voldemort  hadoop  batch  algorithms  histograms  statistics  bucketing  percentiles 
february 2013
Evasi0n Jailbreak's Userland Component
Good writeup of the exploit techniques used in the new iOS jailbreak.
Evasi0n is interesting because it escalates privileges and has full access to the system partition all without any memory corruption.  It does this by exploiting the /var/db/timezone vulnerability to gain access to the root user’s launchd socket.  It then abuses launchd to load MobileFileIntegrity with an inserted codeless library, which is overriding MISValidateSignature to always return 0.
jailbreak  ios  iphone  ipad  exploits  evasi0n  via:nelson 
february 2013
Event Bars - Craft Beer
craft beer kegs for hire in Dublin, Sligo, Limerick and Galway. Needs more Metalman, of course ;)
beer  ireland  craft-beer  keg-hire  events  parties 
february 2013
Jetty-9 goes fast with Mechanical Sympathy
This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: "False sharing" on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build and look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% fewer Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.
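A sketch of what a branchless hex-digit conversion can look like, in Python for brevity. This is one well-known bit-twiddling formulation; I'm assuming it is in the same spirit as Jetty's, not quoting their code. The function name is illustrative, and there is no input validation: non-hex characters give garbage.

```python
def hex_digit_value(c):
    """Map the ASCII code of a hex digit (0-9, A-F, a-f) to 0..15, no branches.

    c & 0x1F  -> 0x10..0x19 for '0'-'9', 1..6 for 'A'-'F' and 'a'-'f'
    c >> 6    -> 0 for digits, 1 for letters, so letters get +0x19 (25)
    - 0x10    -> shifts digits down to 0..9 and letters to 10..15
    """
    return (c & 0x1F) + ((c >> 6) * 0x19) - 0x10


# Usage: decode two hex characters into one byte value.
byte_value = (hex_digit_value(ord("7")) << 4) | hex_digit_value(ord("f"))
```

In Python this buys nothing, of course; the point in Java or C is to avoid mispredicted branches in a hot parsing loop.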
jetty  java  mechanical-sympathy  optimization  coding  tries 
february 2013
Programming Language Checklist
'You appear to be advocating a new:
[ ] functional [ ] imperative [ ] object-oriented [ ] procedural [ ] stack-based
[ ] "multi-paradigm" [ ] lazy [ ] eager [ ] statically-typed [ ] dynamically-typed
[ ] pure [ ] impure [ ] non-hygienic [ ] visual [ ] beginner-friendly
[ ] non-programmer-friendly [ ] completely incomprehensible
programming language. Your language will not work. Here is why it will not work.'
humor  programming  funny  coding  languages 
february 2013
"Security Engineering" now online in full
Ross Anderson says: 'I’m delighted to announce that my book Security Engineering – A Guide to Building Dependable Distributed Systems is now available free online in its entirety. You may download any or all of the chapters from the book’s web page.'
security  books  reference  coding  software  encryption  ross-anderson 
february 2013
Clairvoyant Squirrel: Large Scale Malicious Domain Classification
Storm-based service to detect malicious DNS domain usage from streaming pcap data in near-real-time. Uses string features in the DNS domain, along with randomness metrics using Markov analysis, combined with a Random Forest classifier, to achieve 98% precision at 10,000 matches/sec
storm  distributed  distcomp  random-forest  classifiers  machine-learning  anti-spam  slides 
february 2013
C++ B-Tree
a new C++ template library from Google which implements an in-memory B-Tree container type, suitable for use as a drop-in replacement for std::map, set, multimap and multiset. Lower memory use, and reportedly faster due to better cache-friendliness
c++  google  data-structures  containers  b-trees  stl  map  set  open-source 
february 2013
A Continuous Packaging Pipeline
presentation describing some nice automation tools for packaging vendor code for deployment
deployment  fosdem  presentations  slides  debian  deb  fpm  apt-get 
february 2013
java - Given that HashMaps in jdk1.6 and above cause problems with multi-threading, how should I fix my code - Stack Overflow
Massive Java concurrency fail in recent 1.6 and 1.7 JDK releases -- the java.util.HashMap type now spin-locks on an AtomicLong in its constructor.

Here's the response from the author: 'I'll acknowledge right up front that the initialization of hashSeed is a bottleneck but it is not one we expected to be a problem since it only happens once per Hash Map instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?'

Oh dear. Assumptions of "typical" like this are not how you design a fundamental data structure. fail. For now there is a hacky reflection-based workaround, but this is lame and needs to be fixed as soon as possible. (Via cscotta)
java  hashmap  concurrency  bugs  fail  security  hashing  jdk  via:cscotta 
february 2013
IPMI: Freight Train To Hell
'Intel's Intelligent Platform Management Interface (IPMI), which is implemented and added onto by all server vendors, grant system administrators with a means to manage their hardware in an Out of Band (OOB) or Lights Out Management (LOM) fashion. However there are a series of design, utilization, and vendor issues that cause complex, pervasive, and serious security infrastructure problems.

The BMC is an embedded computer on the motherboard that implements IPMI; it enjoys an asymmetrical relationship with its host, with the BMC able to gain full control of memory and I/O, while the server is both blind and impotent against the BMC. Compromised servers have full access to the private IPMI network

The BMC uses reusable passwords that are infrequently changed, widely shared among servers, and stored in clear text in its storage. The passwords may be disclosed with an attack on the server, over the network against the BMC, or with a physical attack against the motherboard (including after the server has been decommissioned.)

IT's reliance on IPMI to reduce costs, the near-complete lack of research, 3rd party products, or vendor documentation on IPMI and the BMC security, and the permanent nature of the BMC on the motherboard make it currently very difficult to defend, fix or remediate against these issues.'

(via Tony Finch)
via:fanf  security  ipmi  power-management  hardware  intel  passwords  bios 
february 2013
The colour of London's commute
Nice visualisation. 'What the map shows is the mix of transport to work of residents living in each part of London*, using ONS data at Middle Super Output Area (MSOA) level. Each MSOA is given an RGB colour determined by the modal share, with red colours representing travel by car, taxi or motorbike, blue travel by public transport and green cycling or walking. The result is a fairly simple pattern, with motor vehicles predominating on London's fringes, public transport in the inner suburbs and cycling and walking in the very centre. Those tendrils of blue reaching out presumably represent major public transport links.'
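The colouring scheme is simple enough to sketch. This assumes a straight linear mapping from each mode's share to a 0-255 channel value, which is my guess at the approach and not something the map's author spells out; the function name is made up.

```python
def modal_share_to_rgb(car, public, active):
    """Turn commuter counts per mode into an (R, G, B) colour.

    red = car/taxi/motorbike, green = cycling/walking, blue = public transport.
    Assumes each channel is simply the mode's share of the total, scaled to 255.
    """
    total = car + public + active
    if total == 0:
        return (0, 0, 0)
    red = round(255 * car / total)
    green = round(255 * active / total)
    blue = round(255 * public / total)
    return (red, green, blue)


# Usage: an area where three quarters commute by public transport.
inner_suburb = modal_share_to_rgb(car=0, public=3, active=1)
```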
data  visualisation  dataviz  london  mapping  via:ldoody 
january 2013
Dublin Free WiFi Icons
some lovely pixel art to advertise the free wifi areas, by Craig Robinson. I see a girl in pyjamas, a Dub hurler, a viking, Molly Malone, Phil Lynott, Oscar Wilde, a Moore St market trader, a busker, and the Spire...
pixel-art  dublin  ireland  art  craig-robinson  icons 
january 2013
Where are the free WiFi spots in Dublin City Centre?
hooray, free wifi! beautiful Invader-style pixel-art mosaics to highlight them, too. nice one Joe
wifi  free  dublin  ireland  city  public 
january 2013
DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
thumbs-up for DNSMadeEasy's Global Traffic Director anycast-based geographically-segmented DNS service, in particular
dns  architecture  scalability  search  duckduckgo  geoip  anycast 
january 2013
Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com
Using new Intel Core i7 instructions to speed up string manipulation. Fascinating stuff. SSE ftw
sse  optimization  simd  assembly  intel  i7  intel-core  strstr  strings  string-matching  strchr  strlen  coding 
january 2013
How Newegg crushed the “shopping cart” patent and saved online retail
Very cool account of Newegg's battle against a ludicrous patent-troll shakedown. Great quote from their Chief Legal Officer, Lee Cheng:
Patent trolling is based upon deficiencies in a critical, but underdeveloped, area of the law. The faster we drive these cases to verdict, and through appeal, and also get legislative reform on track, the faster our economy will be competitive in this critical area. We're competing with other economies that are not burdened with this type of litigation. China doesn't have this, South Korea doesn't have this, Europe doesn't have this. [...]

It's actually surprising how quickly people forget what Lemelson did. [referring to Jerome Lemelson, an infamous patent troll who used so-called "submarine patents" to make billions in licensing fees.] This activity is very similar. Trolls right now "submarine" as well. They use timing, like he used timing. Then they pop up and say "Hello, surprise! Give us your money or we will shut you down!" Screw them. Seriously, screw them. You can quote me on that.
patent-trolls  east-texas  newegg  shopping-cart  swpat  software-patents  patents  ecommerce  soverain 
january 2013
joho / 7XX-rfc
At Railscamp X it became clear there is a gap in the current HTTP specification. There are many ways for a developer to screw up their implementation, but no code to share the nature of the error with the end user. We humbly suggest the following status codes are included in the HTTP spec in the 7XX range.


Includes such useful status codes as "724 - This line should be unreachable".
http  standards  humour  funny  jokes 
january 2013
Fox DMCA Takedowns Order Google to Remove Fox DMCA Takedowns
Chilling Effects is set up to stop the ‘chilling effects’ of Internet censorship. Google sees this as a good thing and sends takedown requests it receives to be added to the database. Fox sends takedown requests to Google for pages which the company says contain links to material it holds the copyright to. Those pages include those on Chilling Effects which show which links Fox wants taken down. Google delists the Chilling Effects pages from its search engine, thus completing the circle and defeating the very reason Chilling Effects was set up for in the first place.
chilling-effects  copyright  internet  legal  dmca  google  law 
january 2013
Ironfan
'an expressive toolset for constructing scalable, resilient [service] architectures. It works in the cloud, in the data center, and on your laptop, and it makes your system diagram visible and inevitable. Inevitable systems coordinate automatically to interconnect, removing the hassle of manual configuration of connection points (and the associated danger of human error).' Looks like a pretty neat cluster deployment tool; driven from a single configuration file, using Chef, integrating closely with AWS and providing many useful additional features
chef  deployment  clusters  knife  services  aws  ec2  ops  ironfan  demo 
january 2013
Antigua Government Set to Launch “Pirate” Website To Punish United States
oh the lulz.
The Government of Antigua is planning to launch a website selling movies, music and software, without paying U.S. copyright holders. The Caribbean island is taking the unprecedented step because the United States refuses to lift a trade “blockade” preventing the island from offering Internet gambling services, despite several WTO decisions in Antigua’s favor. The country now hopes to recoup some of the lost income through a WTO approved “warez” site.
us-politics  antigua  piracy  filesharing  pirate  gambling  wto  ip  blockades 
january 2013
Basho | Alert Logic Relies on Riak to Support Rapid Growth
'The new [Riak-based] analytics infrastructure performs statistical and correlation processing on all data [...] approximately 5 TB/day. All of this data is processed in real-time as it streams in. [...] Alert Logic’s analytics infrastructure, powered by Riak, achieves performance results of up to 35k operations/second across each node in the cluster – performance that eclipses the existing MySQL deployment by a large margin on single node performance. In real business terms, the initial deployment of the combination of Riak and the analytic infrastructure has allowed Alert Logic to process in real-time 7,500 reports, which previously took 12 hours of dedicated processing every night.'

Twitter discussion here: https://twitter.com/fisherpk/status/294984960849367040 , which notes 'heavily cached SAN storage, 12 core blades and 90% get to put ops', and '3 riak nodes, 12-cores, 30k get heavy riak ops/sec. 8 nodes driving ops to that cluster'. Apparently the use of SAN storage on all nodes is historic, but certainly seems to have produced good iops numbers as an (expensive) side-effect...
iops  riak  basho  ops  systems  alert-logic  storage  nosql  databases 
january 2013
All polar bears descended from one Irish grizzly
'THE ARCTIC'S DWINDLING POPULATION of polar bears all descend from a single mamma brown bear which lived 20,000 to 50,000 years ago in present-day Ireland, new research suggests. DNA samples from the great white carnivores - taken from across their entire range in Russia, Canada, Greenland, Norway and Alaska - revealed that every individual's lineage could be traced back to this Irish forebear.' More than the average bear, I guess
animals  biology  science  dna  history  ireland  bears  polar-bears  grizzly-bears  via:ben 
january 2013
Namazu-e: Earthquake catfish prints
'In November 1855, the Great Ansei Earthquake struck the city of Edo (now Tokyo), claiming 7,000 lives and inflicting widespread damage. Within days, a new type of color woodblock print known as namazu-e (lit. "catfish pictures") became popular among the residents of the shaken city. These prints featured depictions of mythical giant catfish (namazu) who, according to popular legend, caused earthquakes by thrashing about in their underground lairs. In addition to providing humor and social commentary, many prints claimed to offer protection from future earthquakes.'
japan  art  namazu-e  ukiyo-e  catfish  earthquakes  myth 
january 2013
50 Watts
Incredible blog of book covers and illustrations, much from the 1970s
illustration  art  prints  1970s  graphics 
january 2013
Network graph viz of Irish politicians and organisations on Twitter
generated by the Clique Research Cluster at UCD and DERI. 'a visualization of the unified graph representation for the users in the data, produced using Gephi and sigma.js. Users are coloured according to their community (i.e. political affiliation). The size of each node is proportional to its in-degree (i.e. number of incoming links).' sigma.js provides a really user-friendly UI to the graphs, although -- as with most current graph visualisations -- it'd be particularly nice if it was possible to 'tease out' and focus on interesting nodes, and get a pasteable URL of the result, in context. Still, the most usable graph viz I've seen in a while...
graphs  dataviz  ucd  research  ireland  twitter  networks  community  sigma.js  javascript  canvas  gephi 
january 2013
Big Data Lambda Architecture
An article by Nathan "Storm" Marz describing the system architecture he's been talking about for a while; Hadoop-driven batch view, Storm-driven "speed view", and a merging API
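A toy sketch of the "merging API" part, assuming the simplest possible case: both views hold additive counts keyed the same way, so a query just sums them. The view contents and the `merged_view` name are invented for illustration; real views would live in a serving database rather than dicts.

```python
from collections import Counter


def merged_view(batch_view, speed_view):
    """Combine precomputed batch counts with recent real-time increments
    that the batch layer has not yet absorbed."""
    result = Counter(batch_view)
    result.update(speed_view)  # Counter.update adds counts together
    return dict(result)


# Usage: pageview counts from last night's batch run plus today's stream.
batch_view = {"page_a": 1000, "page_b": 250}
speed_view = {"page_a": 7, "page_c": 3}
merged = merged_view(batch_view, speed_view)
```

When the next batch run completes, the speed view entries it covers can be thrown away, which is what keeps the real-time side small.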
storm  systems  architecture  lambda-architecture  design  Hadoop 
january 2013
Ivan Beshoff, Last Survivor Of Mutiny on the Potemkin, founded Beshoffs
wow. there's a factoid! the "Beshoffs" chain of chippers in Dublin was founded by this historic figure, who died in 1987
factoids  beshoffs  chips  dublin  history  small-world  battleship-potemkin  russia 
january 2013
'The Unified Logging Infrastructure for Data Analytics at Twitter' [PDF]
A picture of how Twitter standardized their internal service event logging formats to allow batch analysis and analytics. They surface service metrics to dashboards from Pig jobs on a daily basis, which frankly doesn't sound too great...
twitter  analytics  event-logging  events  logging  metrics 
january 2013
runit
'a UNIX init scheme with service supervision' - philosophically similar to daemontools, widely packaged, LSB init.d-script-compliant, BSD-licensed
daemon  supervision  services  unix  lsb  server  ops 
january 2013
Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm in Storm
Storm demo with a reasonably complex topology.
'how to implement a distributed, real-time trending topics algorithm in Storm. It uses the latest features available in Storm 0.8 (namely tick tuples) and should be a good starting point for anyone trying to implement such an algorithm for their own application. The new code is now available in the official storm-starter repository, so feel free to take a deeper look.'
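The core of a rolling count is a sliding window of per-interval counters. Here's a single-process sketch in Python, assuming something external calls `tick()` at a fixed interval (the role tick tuples play in the Storm version). The class and method names are mine, not those in storm-starter.

```python
from collections import Counter, deque


class RollingCounter:
    """Counts per key over the last num_slots tick intervals."""

    def __init__(self, num_slots):
        self.slots = deque((Counter() for _ in range(num_slots)),
                           maxlen=num_slots)

    def incr(self, key):
        self.slots[-1][key] += 1  # always count into the newest slot

    def tick(self):
        # Appending to a full deque with maxlen drops the oldest slot.
        self.slots.append(Counter())

    def totals(self):
        total = Counter()
        for slot in self.slots:
            total.update(slot)
        return total


# Usage: a three-slot window.
rc = RollingCounter(3)
rc.incr("storm")
rc.incr("storm")
rc.tick()
rc.incr("hadoop")
rc.tick()
rc.incr("storm")
trending = rc.totals().most_common(1)
```

The distributed version shards keys across workers and merges the per-worker rankings, but each worker's state looks roughly like this.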
storm  distcomp  distributed  tick-tuples  demo 
january 2013
fail0verflow ::
Excellent demo of how use of a block cipher with a known secret key makes an insecure MAC. "In short, CBC-MAC is a Message Authentication Code, not a strong hash function. While MACs can be built out of hash functions (e.g. HMAC), and hash functions can be built out of block ciphers like AES, not all MACs are also hash functions. CBC-MAC in particular is completely unsuitable for use as a hash function, because it only allows two parties with knowledge of a particular secret key to securely transmit messages between each other. Anyone with knowledge of that key can forge the messages in a way that keeps the MAC (“hash value”) the same. All you have to do is run the forged message through CBC-MAC as usual, then use the AES decryption operation on the original hash value to find the last intermediate state. XORing this state with the CBC-MAC for the forged message yields a new block of data which, when appended to the forged message, will cause it to have the original hash value. Because the input is taken backwards, you can either modify the first block of the file, or just run the hash function backwards until you reach the block that you want to modify. You can make a forged file pass the hash check as long as you can modify an arbitrary aligned 16-byte block in it."
crypto  hashing  security  cbc  mac  sha1  aes 
january 2013
Scala 2.8 Collections API -- Performance Characteristics
wow. Every library vending a set of collection types should have a page like this
collections  scala  performance  reference  complexity  big-o  coding 
january 2013
Irish EU Council Presidency proposes destruction of right to privacy | EDRI
'For example, based on the current situation in Ireland, the idea is that all companies can do whatever they want with personal data, without fear of sanction. Sanctions, such as fines, “should be optional or at least conditional upon a prior warning or reprimand”. In other words, do what you want, the worst that can happen is that you will receive a warning.' Shame! Daragh O'Brien's comment: 'utter idiocy'. (at https://twitter.com/daraghobrien/status/292041500873850880 )
privacy  ireland  eu  fail  data-protection  data-privacy  politics 
january 2013
Fast Packed String Matching for Short Patterns [paper, PDF]
'Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like NLP, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. [...] In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns.' Reminds me of http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm , but taking advantage of SIMD extensions, which should make things nice and speedy, at the cost of tying it to specific hardware platforms. (via Tony Finch)
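For comparison, a minimal Rabin-Karp sketch in Python. This is the plain rolling-hash algorithm, not the paper's SSE-based technique; the `base` and `mod` values are arbitrary illustrative choices.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)  # weight of the window's leading character
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Compare the strings only when the hashes agree, to rule out collisions.
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return matches


# Usage
hits = rabin_karp("abracadabra", "abra")
```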
rabin-karp  algorithms  strings  string-matching  papers  via:fanf 
january 2013
CES: Worse Products Through Software
'The companies out there that know how to make decent software have been steadily eating their way into and through markets previously dominated by the hardware guys. Apple with music players, TiVo with video recording, even Microsoft with its decade-old Xbox Live service, which continues to embarrass the far weaker offerings from Sony and Nintendo. (And, yes, iOS is embarrassing all three console makers.)'

See also Mat Honan's article at http://www.wired.com/gadgetlab/2012/12/internet-tv-sucks/ : 'Smart TVs are just too complicated. They have terrible user interfaces that differ wildly from device to device. It’s not always clear what content is even available — for example, after more than two years on the market, you still can’t watch Hulu Plus on your Google TV. [...] They give us too many options for apps most people will never use, and they do so at the expense of making it simple to find the shows and movies we want to watch, no matter where they are, be it online or on the air. As NPD puts it in the conclusion to its report, “OEMs and retailers need to focus less on new innovation in this space and more on simplification of the user experience and messaging if they want to drive additional, and new, behaviors on the TV.” Which is a more polite way of saying, clean up your horrible interface, Samsung.'

(via Craig)
via:craig  design  ui  tv  hardware  television  sony  ces  software 
january 2013
Reddit’s ranking algorithms
so Reddit uses the Wilson score confidence interval approach, it turns out; more details here (via Toby diPasquale)
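A sketch of the lower bound of the Wilson score interval, written from the standard formula rather than copied from Reddit's source. The z value of 1.96 (roughly 95% confidence) is an illustrative default; which value Reddit actually uses is not something I'm asserting here.

```python
from math import sqrt


def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = ups + downs
    if n == 0:
        return 0.0
    phat = ups / n
    centre = phat + z * z / (2 * n)
    margin = z * sqrt((phat * (1 - phat) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)


# Usage: one lone upvote ranks well below a hundred unanimous ones.
few_votes = wilson_lower_bound(1, 0)
many_votes = wilson_lower_bound(100, 0)
```

Sorting by this bound rather than the raw upvote ratio is what stops a comment with a single upvote from outranking one with hundreds of votes and a slightly lower ratio.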
ranking  rating  algorithms  popularity  python  wilson-score-interval  sorting  statistics  confidence-sort 
january 2013
Belgium plans artificial island to store wind power
'Belgium is planning to build a doughnut-shaped island in the North Sea that will store wind energy by pumping water out of a hollow in the middle, as it looks for ways to lessen its reliance on nuclear power. One of the biggest problems with electricity is that it is difficult to store and the issue is exaggerated in the case of renewable energy from wind or sun because it is intermittent depending on the weather.' 'The island is still in the planning stages, but will be built out of sand 3 km off the Belgian coast near the town of Wenduine if it gets the final go-ahead. The island, which would also work as an offshore substation to transform the voltage of the electricity generated by wind turbines, could take five or more years to plan and build.'
power  via:daev  belgium  wind-power  hydro  sea  islands  manmade  storage 
january 2013
aaw/hyperloglog-redis - GitHub
'This gem is a pure Ruby implementation of the HyperLogLog algorithm for estimating cardinalities of sets observed via a stream of events. A Redis instance is used for storing the counters.'
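To show what those counters are, here's a bare-bones HyperLogLog sketch in Python, kept in memory rather than in Redis. It is not the gem's API: the class name, the use of SHA-1 as the hash, and the register count are all my own illustrative assumptions, and it omits the refinements of HLL++.

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                    # number of index bits
        self.m = 1 << p               # number of registers (1024 by default)
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")       # 64-bit hash value
        idx = h >> (64 - self.p)                    # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # bias constant, m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Usage: 10,000 distinct items, each seen twice.
hll = HyperLogLog()
for i in range(10000):
    hll.add("user-%d" % i)
    hll.add("user-%d" % i)
estimate = hll.count()
```

With 1024 registers the expected standard error is around 3%, for about a kilobyte of state.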
cardinality  sets  redis  algorithms  ruby  gems  hyperloglog 
january 2013
Leopold’s Day Map
'Bloomsday Map Of Dublin Based On Ulysses'. Beautiful! 'The Leopold’s Day map is a stunning marriage of typography and cartography plotting all the streets alluded to by Joyce in Ulysses which were in existence on June 16th 1904. It is accompanied by a comprehensive and beautifully typeset directory with over 400 entries noting the landmarks, business and people of Dublin that were referenced in the text. The Leopold’s Day map is an exquisitely detailed, limited edition piece. It has an impressive dimension of 1000mm x 700mm which means it can also fit into a ready made frame. Price: €125.00'
bloomsday  ulysses  dublin  ireland  maps  james-joyce  art  prints 
january 2013
Extreme Performance with Java - Charlie Hunt [slides, PDF]
presentation slides for Charlie Hunt's 2012 QCon presentation, where he discusses 'what you need to know about a modern JVM in order to be effective at writing a low latency Java application'. The talk video is at http://www.infoq.com/presentations/Extreme-Performance-Java
low-latency  charlie-hunt  performance  java  jvm  presentations  qcon  slides  pdf 
january 2013
The Neurocritic: Fisher-Price Synesthesia
'Synesthesia [jm: sic] is a rare perceptual phenomenon in which the stimulation of one sensory modality, or exposure to one type of stimulus, leads to a sensory (or cognitive) experience in a different, non-stimulated modality. For instance, some synesthetes have colored hearing while others might taste shapes. GRAPHEME-COLOR SYNESTHESIA is the condition in which individual printed letters are perceived in a specific, constant color. This occurs involuntarily and in the absence of colored font. [...] A new study has identified 11 synesthetes whose grapheme-color mappings appear to be based on the Fisher Price plastic letter set made between 1972-1990.'

(via Dave Green)
fisher-price  synesthesia  synaesthesia  colors  colours  sight  neuroscience  brain  via-dave-green  toys 
january 2013
Notes on Distributed Systems for Young Bloods
'Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial. This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning.' This is a pretty nice list, a little over-stated, but that's the format. I particularly like the following: 'Exploit data-locality'; 'Learn to estimate your capacity'; 'Metrics are the only way to get your job done'; 'Use percentiles, not averages'; 'Extract services'.
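
To make the 'Use percentiles, not averages' point concrete, here is a minimal sketch. The latency numbers are invented for illustration, and `percentile` is a hypothetical nearest-rank helper, not something from the linked article:

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up request latencies in milliseconds: 98 fast requests and 2 very slow ones.
latencies_ms = [10] * 98 + [2000] * 2

mean_ms = statistics.mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
print(f"mean={mean_ms:.1f}ms p50={p50_ms}ms p99={p99_ms}ms")
```

The mean lands between the two groups and describes neither the typical request nor the slow tail, while the median and the 99th percentile each report something a user actually experienced.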
systems  distributed  distcomp  cap  metrics  coding  guidelines  architecture  backpressure  design  twitter 
january 2013
Effective Scala
Twitter's Scala style guide. 'While highly effective, Scala is also a large language, and our experiences have taught us to practice great care in its application. What are its pitfalls? Which features do we embrace, which do we eschew? When do we employ “purely functional style”, and when do we avoid it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays, providing a set of best practices. Our use of Scala is mainly for creating high volume services that form distributed systems — and our advice is thus biased — but most of the advice herein should translate naturally to other domains.'
twitter  scala  coding  style 
january 2013
OmniTI's Experiences Adopting Chef
A good, in-depth writeup of OmniTI's best practices with respect to build-out of multiple customer deployments, using multi-tenant Chef from a version-controlled repo. Good suggestions, and I am really looking forward to this bit:

'Chef tries to turn your system configuration into code. That means you now inherit all the woes of software engineering: making changes in a coordinated manner and ensuring that changes integrate well are now an even greater concern. In part three of this series, we’ll look at applying software quality assurance and release management practices to Chef cookbooks and roles.'
chef  deployment  ops  omniti  systems  vagrant  automation 
january 2013
Tunlr
'uses DNS witchcraft to allow you to access US/UK-only audio and video services like Hulu.com, BBC iPlayer, etc. without using a VPN or Web proxy.' According to http://superuser.com/questions/461316/how-does-tunlr-work , it proxies the initial connection setup and geo-auth, then mangles the stream address to stream directly, not via proxy. Sounds pretty useful.
proxy  network  vpn  dns  tunnel  content  video  audio  iplayer  bbc  hulu  streaming  geo-restriction 
january 2013
Dan McKinley :: Whom the Gods Would Destroy, They First Give Real-time Analytics
'It's important to divorce the concepts of operational metrics and product analytics. [..] Funny business with timeframes can coerce most A/B tests into statistical significance.' 'The truth is that there are very few product decisions that can be made in real time.'

HN discussion: http://news.ycombinator.com/item?id=5032588
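
One way to see the 'funny business with timeframes' effect is to simulate A/A tests (two identical arms) and compare checking significance once at the end with checking repeatedly and stopping at the first "win". This is a rough sketch under my own assumptions: the conversion rate, sample sizes, check interval and the plain two-proportion z-test are all illustrative choices, not taken from the post.

```python
import math
import random


def z_score(conv_a, conv_b, n):
    """Two-proportion z statistic for two arms with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 0.0
    std_err = math.sqrt(2 * pooled * (1 - pooled) / n)
    return (conv_a - conv_b) / n / std_err


def run_aa_test(rng, rate=0.1, visitors=1000, check_every=50, critical=1.96):
    """Simulate one A/A test, where there is no real difference between the arms.

    Returns (significant_at_final_check, significant_at_any_check).
    """
    conv_a = conv_b = 0
    significant = False
    ever_significant = False
    for n in range(1, visitors + 1):
        conv_a += rng.random() < rate
        conv_b += rng.random() < rate
        if n % check_every == 0:
            significant = abs(z_score(conv_a, conv_b, n)) > critical
            ever_significant = ever_significant or significant
    return significant, ever_significant


rng = random.Random(42)
results = [run_aa_test(rng) for _ in range(400)]
fixed_rate = sum(end for end, _ in results) / len(results)
peeking_rate = sum(peek for _, peek in results) / len(results)
print(f"false positives: fixed horizon {fixed_rate:.1%}, with peeking {peeking_rate:.1%}")
```

Because the final check is one of the peeks, the peeking rate can never be lower than the fixed-horizon rate; every extra look is another chance for noise to cross the threshold.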
real-time  analytics  statistics  a-b-testing 
january 2013
The Justin Masonic Lodge
whoa. (via Dave O'Riordan)
wtf  masons  names  me  texas 
january 2013
What happened to KHTML after Apple announced Safari
'There was a huge amount of excitement at the announcement that Safari would be using KHTML. At that time, it was almost a given that the OSS rendering engine was Gecko. KHTML was KDE's little engine that could. But nobody ever expected it to be picked up by other folks. One of the original parts of the KHTML-to-OS X port was KWQ (pronounced, "quack") that abstracted out the KDE API portions that were used in KHTML.
Folks were pretty ecstatic at first. It seemed very validating.
But that changed quickly. As Zack's post indicates, WebKit became a thing of unmergable code-drops. Even inside of the KDE community there became a split between the KHTML purists and the WebKit faction. They'd previously more or less all been KHTML developers, but post-WebKit there was something of a pragmatists vs. idealists split. Zack fell on the latter side of that (for understandable reasons: there was an existing community project, with its own set of values, and that was hijacked to a large extent by WebKit).
A few years later WebKit transformed itself into a more or less valid open source project (see webkit.org), but that didn't close the rift in the KDE community between the two, at that point rather divergent, rendering engines. There's still some remaining melancholy that stems from that initial hope and what could have potentially been, but wasn't.'
history  safari  open-source  code-drops  over-the-wall  webkit  khtml  kde  oss  apple 
january 2013
paperplanes. The Virtues of Monitoring, Redux
A rather vague and touchy-feely "state of the union" post on monitoring. Good set of links at the end, though; I like the look of Sensu and Tasseo, but am still unconvinced about the value of Boundary's offering
monitoring  metrics  ops 
january 2013
check_graphite
'a Nagios plugin to poll Graphite'. Necessary, since service metrics are the true source of service health information
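
For a feel of what such a check does, here is a minimal Python sketch. It is not the check_graphite code (that plugin is a separate project with its own options); it only assumes Graphite's JSON render output (a list of series, each with `datapoints` as `[value, timestamp]` pairs) and the standard Nagios exit codes. The function names, metric name and thresholds are hypothetical.

```python
import json
import urllib.parse
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def fetch_series(base_url, target, window="-5min"):
    """Fetch one metric from Graphite's render API as parsed JSON."""
    query = urllib.parse.urlencode({"target": target, "from": window, "format": "json"})
    with urllib.request.urlopen(f"{base_url}/render?{query}", timeout=10) as resp:
        return json.load(resp)


def evaluate(series, warn, crit):
    """Map the latest non-null datapoint to a Nagios (exit_code, message) pair."""
    values = [v for s in series for v, _ts in s["datapoints"] if v is not None]
    if not values:
        return UNKNOWN, "UNKNOWN: no datapoints"
    latest = values[-1]
    if latest >= crit:
        return CRITICAL, f"CRITICAL: {latest}"
    if latest >= warn:
        return WARNING, f"WARNING: {latest}"
    return OK, f"OK: {latest}"


# A response in the shape Graphite returns for format=json (values are made up).
sample = [{"target": "app.errors.rate",
           "datapoints": [[1.0, 1357000000], [7.5, 1357000060], [None, 1357000120]]}]
code, message = evaluate(sample, warn=5, crit=10)
print(message)
```

In a real plugin the script would call `fetch_series`, print the message and exit with the returned code; the trailing null datapoint is skipped because Graphite often reports the most recent interval as empty.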
nagios  graphite  service-metrics  ops 
january 2013
Pushover: Simple Mobile Notifications for Android and iOS
'Pushover makes it easy to send real-time notifications to your Android and iOS devices.' extremely simple HTTPS API; 'Pushover has no monthly subscription fees and users will always be able to receive unlimited messages for free. Most applications can send messages for free, subject to monthly limits.' Also supported by ifttt.com
ios  android  iphone  push  messaging 
january 2013
Greyhound agrees to change consumer contracts and make refunds - National Consumer Agency
Take note, switchers:

'The National Consumer Agency (NCA) has received a commitment from Greyhound that it will amend certain terms in its standard consumer contract, which the NCA thinks are unfair to consumers. This will be done by January 18 2013.

Among the terms considered unfair by the NCA are that consumers must forfeit their credit balance and pay a €45 administration fee, if they cancel their contract with Greyhound within 12 months. If you were charged money in these circumstances, Greyhound has agreed to refund you.

Greyhound will communicate these changes to all of its consumers by 18 January 2013. If you have any questions about the changes or getting a refund, you should contact Greyhound directly.'
greyhound  consumer  ireland  dublin  rubbish 
january 2013
Surprisingly Good Evidence That Real Name Policies Fail To Improve Comments
'Enough theorizing, there’s actually good evidence to inform the debate. For 4 years, Koreans enacted increasingly stiff real-name commenting laws, first for political websites in 2003, then for all websites receiving more than 300,000 viewers in 2007, and was finally tightened to 100,000 viewers a year later after online slander was cited in the suicide of a national figure. The policy, however, was ditched shortly after a Korean Communications Commission study found that it only decreased malicious comments by 0.9%. Korean sites were also inundated by hackers, presumably after valuable identities.

Further analysis by Carnegie Mellon’s Daegon Cho and Alessandro Acquisti, found that the policy actually increased the frequency of expletives in comments for some user demographics. While the policy reduced swearing and “anti-normative” behavior at the aggregate level by as much as 30%, individual users were not dismayed. “Light users”, who posted 1 or 2 comments, were most affected by the law, but “heavy” ones (11-16+ comments) didn’t seem to mind.

Given that the Commission estimates that only 13% of comments are malicious, a mere 30% reduction only seems to clean up the muddied waters of comment systems a depressingly negligent amount.

The finding isn’t surprising: social science researchers have long known that participants eventually begin to ignore cameras video taping their behavior. In other words, the presence of some phantom judgmental audience doesn’t seem to make us better versions of ourselves.'

(via Ronan Lyons)
anonymity  identity  policy  comments  privacy  politics  new-media  via:ronanlyons 
january 2013
Requests: HTTP for Humans
'an elegant and simple HTTP library for Python, built for human beings.' 'Requests is an Apache2 Licensed HTTP library, written in Python, for human beings. Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks. Requests takes all of the work out of Python HTTP/1.1 — making your integration with web services seamless. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is embedded within Requests.'
python  http  urllib  libraries  requests  via:mikeste 
january 2013
A Non-Blocking HashTable by Dr. Cliff Click : programming
Proggit discovers the NonBlockingHashMap. This comment from Boundary's cscotta is particularly interesting: "The code is intricate and curiously-formatted, but NBHM is quite excellent. The majority of our analytics platform is backed by NBHMs updated rapidly in parallel. Cliff's a great, friendly, approachable guy; if you have any specific questions about the approaches or implementation, he may be happy to answer."
data-structures  algorithms  non-blocking  concurrency  threading  multicore  cliff-click  azul  maps  java  boundary 
january 2013
Efficient In-Memory Indexing with Generalized Prefix Trees [PDF]
'Efficient data structures for in-memory indexing gain in importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. In consequence, there are high performance demands for in-memory data structures. Such index structures are used -- with minor changes -- as primary or secondary indices in almost every DBMS. Typically, tree-based or hash-based structures are used, while structures based on prefix-trees (tries) are neglected in this context. For tree-based and hash-based structures, the major disadvantages are inherently caused by the need for reorganization and key comparisons. In contrast, the major disadvantage of trie-based structures in terms of high memory consumption (created and accessed nodes) could be improved. In this paper, we argue for reconsidering prefix trees as in-memory index structures and we present the generalized trie, which is a prefix tree with variable prefix length for indexing arbitrary data types of fixed or variable length. The variable prefix length enables the adjustment of the trie height and its memory consumption. Further, we introduce concepts for reducing the number of created and accessed trie levels. This trie is order-preserving and has deterministic trie paths for keys, and hence, it does not require any dynamic reorganization or key comparisons. Finally, the generalized trie yields improvements compared to existing in-memory index structures, especially for skewed data. In conclusion, the generalized trie is applicable as general-purpose in-memory index structure in many different OLTP or hybrid (OLTP and OLAP) data management systems that require balanced read/write performance.' (via Tony Finch)
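
A rough sketch of the "variable prefix length" idea, as I read the abstract: split each fixed-width key into chunks of `prefix_bits` bits, one chunk per trie level, so the chunk size directly sets the trie height. This is a toy of my own, not the paper's data structure; the class and parameter names are hypothetical, keys are assumed to be unsigned integers below `2**key_bits`, and nodes are plain dicts where a real implementation would use directly indexed arrays.

```python
class PrefixTrie:
    """Trie over fixed-width unsigned integer keys, consuming prefix_bits bits per level."""

    def __init__(self, key_bits=32, prefix_bits=4):
        if key_bits % prefix_bits:
            raise ValueError("key_bits must be a multiple of prefix_bits")
        self.prefix_bits = prefix_bits
        self.height = key_bits // prefix_bits  # fixed by key width, not by the data
        self.root = {}

    def _chunks(self, key):
        """Yield the key's prefix_bits-wide digits, most significant first."""
        mask = (1 << self.prefix_bits) - 1
        for level in range(self.height - 1, -1, -1):
            yield (key >> (level * self.prefix_bits)) & mask

    def insert(self, key, value):
        node = self.root
        *path, last = self._chunks(key)
        for chunk in path:
            node = node.setdefault(chunk, {})
        node[last] = value

    def get(self, key, default=None):
        node = self.root
        *path, last = self._chunks(key)
        for chunk in path:
            node = node.get(chunk)
            if node is None:
                return default
        return node.get(last, default)

    def items(self):
        """Return (key, value) pairs in ascending key order."""
        def walk(node, depth, prefix):
            # Sorting dict keys stands in for scanning an array node slot by slot.
            for chunk in sorted(node):
                key = (prefix << self.prefix_bits) | chunk
                if depth == self.height - 1:
                    yield key, node[chunk]
                else:
                    yield from walk(node[chunk], depth + 1, key)
        return list(walk(self.root, 0, 0))


trie = PrefixTrie(key_bits=16, prefix_bits=4)  # 16-bit keys, 4 bits per level
for k in (513, 7, 40000, 512):
    trie.insert(k, str(k))
print(trie.height, [k for k, _ in trie.items()])
```

Lookups walk a path determined entirely by the key's bits, so nothing is rebalanced and whole keys are never compared; a larger `prefix_bits` gives a shallower trie at the cost of wider nodes.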
via:fanf  prefix-trees  tries  data-structures 
january 2013
The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases [PDF]
'Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART’s performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and prefix lookup.' (via Tony Finch)
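
The "adaptively choosing compact and efficient data structures for internal nodes" part can be sketched as a node that starts as a small sorted array and switches to a full 256-slot array once it fills up. This is a simplified illustration under my own assumptions, not the paper's implementation: the single growth threshold is arbitrary, and `None` children are treated as absent.

```python
import bisect


class AdaptiveNode:
    """Child map for one radix-tree node, keyed by a single byte (0-255)."""

    SMALL_CAPACITY = 16  # illustrative threshold, chosen for this sketch

    def __init__(self):
        self.keys = []      # sorted byte values (compact form)
        self.children = []  # children, parallel to self.keys
        self.slots = None   # 256-entry array once the node has grown

    def set(self, byte, child):
        if self.slots is not None:
            self.slots[byte] = child
            return
        i = bisect.bisect_left(self.keys, byte)
        if i < len(self.keys) and self.keys[i] == byte:
            self.children[i] = child
            return
        if len(self.keys) == self.SMALL_CAPACITY:
            self._grow()
            self.slots[byte] = child
            return
        self.keys.insert(i, byte)
        self.children.insert(i, child)

    def _grow(self):
        """Switch from the compact sorted form to a directly indexed array."""
        self.slots = [None] * 256
        for k, c in zip(self.keys, self.children):
            self.slots[k] = c
        self.keys = self.children = None

    def get(self, byte):
        if self.slots is not None:
            return self.slots[byte]
        i = bisect.bisect_left(self.keys, byte)
        if i < len(self.keys) and self.keys[i] == byte:
            return self.children[i]
        return None

    def items(self):
        """Return (byte, child) pairs in ascending byte order."""
        if self.slots is not None:
            return [(b, c) for b, c in enumerate(self.slots) if c is not None]
        return list(zip(self.keys, self.children))


node = AdaptiveNode()
for b in range(0, 40, 2):  # 20 children, enough to trigger growth
    node.set(b, f"c{b}")
print(node.slots is not None, node.get(4), node.get(5))
```

Sparse nodes stay small, dense nodes get constant-time indexing, and both forms iterate in byte order, which is what keeps range scans and prefix lookups possible.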
via:fanf  data-structures  trees  indexing  cache-aware  tries 
january 2013
HAT-trie: A Cache-conscious Trie-based Data Structure for Strings [PDF]
'Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other high-performance data structures. We show strong improvements in both time and space; in most cases approaching that of the cache-conscious hash table. Our HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.' (via Tony Finch)
via:fanf  data-structures  tries  cache-aware  trees 
january 2013
"Matters Computational - Ideas, Algorithms, Source Code"
A hefty tome (in PDF format) containing lots of interesting algorithms and computational tricks; code is GPLv3 licensed
coding  algorithms  computation  via:cliffc  pdf  books 
january 2013
airlift/airline · GitHub
Annotations-based git-like CLI helper for Java
git  cli  java 
january 2013