jm + pdf   31

Engineer’s Guide to Drinks
excellent blueprint-style poster covering all the major cocktails
cocktails  drinks  engineering  posters  blueprints  graphics  pdf 
january 2016 by jm
"Hidden Technical Debt in Machine-Learning Systems" [pdf]
Another great paper from Google, talking about the tradeoffs that must be considered in practice over the long term when running a complex ML system in production.
technical-debt  ml  machine-learning  ops  software  production  papers  pdf  google 
december 2015 by jm
Geographically-accurate version of the London underground map
as Boing Boing says: 'London's subway system switched early to an abstract map (PDF), and it became a legendary work of design. It just published an internally-used geographic version of map (PDF), however, for the first time in a century—and it's awesome.'
london  maps  mapping  geography  accuracy  pdf  subway  underground 
september 2015 by jm
AWS Key Management Service Cryptographic Details
"AWS Key Management Service (AWS KMS) provides cryptographic keys and operations scaled for the cloud. AWS KMS keys and functionality are used by other AWS cloud services, and you can use them to protect user data in your applications that use AWS. This white paper provides details on the cryptographic operations that are executed within AWS when you use AWS KMS."
white-papers  aws  amazon  kms  key-management  crypto  pdf 
december 2014 by jm
Report of the Internet Content Governance Advisory Group
looking at the summary, looks broadly sensible; no government-mandated filtering/blocking I can spot quickly
internet  filtering  safety  kids  porn  blocking  ireland  pegi  ratings  reports  pdf 
june 2014 by jm
"Understanding the Robustness of SSDs under Power Fault", FAST '13 [paper]
Horrific. SSDs (including "enterprise-class storage") storing sync'd writes in volatile RAM while claiming they were synced; one device losing 72.6GB, 30% of its data, after 8 injected power faults; and all SSDs tested displayed serious errors including random bit errors, metadata corruption, serialization errors and shorn writes. Don't trust lone unreplicated, unbacked-up SSDs!
pdf  papers  ssd  storage  reliability  safety  hardware  ops  usenix  serialization  shorn-writes  bit-errors  corruption  fsync 
january 2014 by jm
"What Should I Monitor?"
slides (lots of slides) from Baron Schwartz' talk at Velocity in NYC.
slides  monitoring  metrics  ops  devops  baron-schwartz  pdf  capacity 
october 2013 by jm
"Scalable Eventually Consistent Counters over Unreliable Networks" [paper, pdf]

Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios.

This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict-free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation, and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.
pdf  papers  eventual-consistency  counters  distributed-systems  distcomp  cap-theorem  ecdc  handoff-counters  crdts  data-structures  g-counters 
august 2013 by jm
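To make the "identity explosion" point in the counters abstract above concrete, here is a minimal sketch of a naive G-Counter in Python. This is the baseline the paper improves on, not the Handoff Counter mechanism itself; the class and method names are my own, for illustration only.

```python
class GCounter:
    """Naive grow-only counter CRDT: one slot per actor that ever incremented."""

    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.counts = {}  # actor id -> total increments seen from that actor

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.actor_id] = self.counts.get(self.actor_id, 0) + amount

    def value(self):
        # The counter's value is the sum of every actor's slot.
        return sum(self.counts.values())

    def merge(self, other):
        # Per-actor maximum: associative, commutative and idempotent,
        # so replicas converge regardless of delivery order or duplicates.
        for actor, count in other.counts.items():
            self.counts[actor] = max(self.counts.get(actor, 0), count)


a, b = GCounter("a"), GCounter("b")
a.increment(2)
b.increment(3)
a.merge(b)
b.merge(a)
a.merge(b)  # merging again changes nothing
```

Note that `counts` gains an entry for every actor that has ever incremented and never shrinks, which is the linear state growth the abstract describes.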
Clean Code Cheat Sheet [pdf]
'principles, patterns, smells and guidelines for clean code, class and package design, TDD, Acceptance Test Driven Development, and CI'
clean-code  code-smells  coding  tdd  testing  continous-integration  patterns  pdf 
july 2013 by jm
Stability Patterns and Antipatterns [slides]
Michael "Release It!" Nygard's slides from a recent O'Reilly event, discussing large-scale service reliability design patterns
michael-nygard  design-patterns  architecture  systems  networking  reliability  soa  slides  pdf 
may 2013 by jm
Berkeley DB Java Edition Architecture [PDF]
background white paper on the BDB-JE innards and design, from 2006. Still pretty accurate and good info
bdb-je  java  berkeley-db  bdb  design  databases  pdf  white-papers  trees 
may 2013 by jm
Romania believes rival nation behind MiniDuke cyber attack | Reuters
"It is a cyber attack ... pursued by an entity that has the characteristics of a state actor," [Romanian secret service] SRI spokesman Sorin Sava told Reuters [...]. "Our estimations show the attack is certainly relevant to Romania's national security taking into account the profile of the compromised entities." [...]

In this case, computer experts say an attacker from the former Soviet Union could be more likely. "MiniDuke" in some ways resembles a banking fraud Trojan dubbed "TinBa" believed to have been created by Russian criminal hackers.
ireland  malware  attacks  pdf  security  espionage  romania  miniduke 
march 2013 by jm
The MiniDuke Mystery: PDF 0-day Government Spy Assembler 0x29A Micro Backdoor - Securelist
By analysing the logs from the command servers, we have observed 59 unique victims in 23 countries: Belgium, Brazil, Bulgaria, Czech Republic, Georgia, Germany, Hungary, Ireland, Israel, Japan, Latvia, Lebanon, Lithuania, Montenegro, Portugal, Romania, Russian Federation, Slovenia, Spain, Turkey, Ukraine, United Kingdom and United States.
miniduke  pdf  malware  attacks  ireland  espionage 
march 2013 by jm
Irish government attacked using 'MiniDuke' PDF malware
although I haven't seen a word of it in the Irish media yet -- wonder if the government have noticed?
Cyber criminals have targeted government officials in more than 20 countries, including Ireland and Romania, in a complex online assault seen rarely since the turn of the millennium. The attack, dubbed "MiniDuke" by researchers, has infected government computers as recently as this week in an attempt to steal geopolitical intelligence, according to security experts.
ireland  malware  attacks  pdf  security  espionage  romania  miniduke 
march 2013 by jm
'Efficient Computation of Frequent and Top-k Elements in Data Streams' [paper, PDF]
The Space-Saving algorithm to compute top-k in a stream. I've been asking a variation of this problem as an interview question for a while now, pretty cool to find such a neat solution. Pity neither myself nor anyone I've interviewed has come up with it ;)
space-saving  approximation  streams  stream-processing  cep  papers  pdf  algorithms 
february 2013 by jm
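For the curious, a rough Python sketch of the Space-Saving idea from the top-k paper above: keep at most k counters, and when a new item arrives with no free counter, evict the smallest one and let the newcomer inherit its count. The function name and return shape are my own, and this version finds the minimum with a linear scan rather than the constant-time "stream summary" structure the paper uses.

```python
def space_saving(stream, k):
    """Approximate the most frequent items in `stream` using at most k counters.

    Returns (item, estimated_count, max_overestimate) tuples, largest first.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counters = {}  # item -> [estimated count, possible overestimate]
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < k:
            counters[item] = [1, 0]
        else:
            # No free counter: evict the item with the smallest count.
            victim = min(counters, key=lambda x: counters[x][0])
            min_count = counters.pop(victim)[0]
            # The newcomer inherits that count, recorded as its error bound.
            counters[item] = [min_count + 1, min_count]
    return sorted(
        ((item, count, err) for item, (count, err) in counters.items()),
        key=lambda entry: entry[1],
        reverse=True,
    )


top = space_saving("aaaabbbcdaeab", k=3)
```

Estimated counts never undercount an item that is still being tracked, and `estimated_count - max_overestimate` gives a guaranteed lower bound on its true frequency.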
Goonwaffe Stories: A Guide For Newbies [PDF]
impressively high-quality newbie's guide from the Goonswarm Federation -- as describes it, 'frankly a work of art: a 1950's Pulp Scifi magazine full of internet spaceships and sociopathy.'
eve-online  space  goonswarm  gaming  mmo  pdf  pulp  science-fiction 
february 2013 by jm
Extreme Performance with Java - Charlie Hunt [slides, PDF]
presentation slides for Charlie Hunt's 2012 QCon presentation, where he discusses 'what you need to know about a modern JVM in order to be effective at writing a low latency Java application'. The talk video is at
low-latency  charlie-hunt  performance  java  jvm  presentations  qcon  slides  pdf 
january 2013 by jm
"Matters Computational - Ideas, Algorithms, Source Code"
A hefty tome (in PDF format) containing lots of interesting algorithms and computational tricks; code is GPLv3 licensed
coding  algorithms  computation  via:cliffc  pdf  books 
january 2013 by jm
Spanner: Google's Globally-Distributed Database [PDF]

Abstract: Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.

To appear in:
OSDI'12: Tenth Symposium on Operating System Design and Implementation, Hollywood, CA, October, 2012.
database  distributed  google  papers  toread  pdf  scalability  distcomp  transactions  cap  consistency 
september 2012 by jm
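A toy Python sketch of the "time API that exposes clock uncertainty" mentioned in the Spanner abstract above. The `now`/`after`/`before` operations follow the paper's TrueTime description as I understand it; the class names, the fixed `epsilon`, and the injectable `clock` and `sleep` are assumptions made for illustration (the real system derives its uncertainty bound from GPS and atomic clocks).

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float
    latest: float


class TrueTimeSketch:
    """Hypothetical stand-in for TrueTime with a fixed uncertainty bound."""

    def __init__(self, epsilon, clock=time.time):
        self.epsilon = epsilon  # assumed worst-case clock error, in seconds
        self.clock = clock

    def now(self):
        # The true time is assumed to lie somewhere inside this interval.
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        # True only if t has definitely passed.
        return self.now().earliest > t

    def before(self, t):
        # True only if t has definitely not arrived yet.
        return self.now().latest < t


def commit_wait(tt, sleep=time.sleep):
    """Pick a commit timestamp, then wait until it is definitely in the past."""
    s = tt.now().latest
    while not tt.after(s):
        sleep(tt.epsilon / 10 or 0.001)
    return s
```

The wait lasts roughly twice `epsilon`, which is why keeping the uncertainty small matters so much for write latency.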
_Building High-level Features Using Large Scale Unsupervised Learning_ [paper, PDF]
"We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200x200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art."
algorithms  machine-learning  neural-networks  sgd  labelling  training  unlabelled-learning  google  research  papers  pdf 
june 2012 by jm
_Intellectual property rights and innovation: Evidence from the human genome_ (PDF)
'Do intellectual property (IP) rights on existing technologies hinder subsequent innovation? Using newly-collected data on the sequencing of the human genome by the public Human Genome Project and the private firm Celera, this paper estimates the impact of Celera's gene-level IP on subsequent scientific research and product development. Genes initially sequenced by Celera were held with IP for up to two years, but moved into the public domain once re-sequenced by the public effort. Across a range of empirical specifications, I find evidence that Celera's IP led to reductions in subsequent scientific research and product development on the order of 20 to 30 percent. Taken together, these results suggest that Celera's short-term IP had persistent negative effects on subsequent innovation relative to a counterfactual of Celera genes having always been in the public domain.' (via Tony Finch)
via:fanf  genetics  ip  copyright  open-source  celera  patents  papers  pdf 
february 2012 by jm
Michael "Liar's Poker" Lewis on Ireland's economic collapse
PDF of the 15-page Vanity Fair article -- from interviews I've read in advance, this seems pretty good
michael-lewis  vanity-fair  articles  pdf  toread  economy  ireland  disaster  collapse  from delicious
february 2011 by jm
document scanner app for the iPhone/Android smartphones; take a photo of a doc, it'll fix geometry, remove shadows, white balance and sharpen appropriately, generate PDFs and image files, and upload to Evernote for OCRing. EUR4.99 though
android  apps  evernote  iphone  mobile  ocr  pdf  document  scanner  scan  from delicious
july 2010 by jm
NoSQL at Twitter (NoSQL EU 2010) [PDF]
specifically, Hadoop and Pig for log/metrics analytics, Cassandra going forward; great preso, lots of detail and code examples. also, impressive number-crunching going on at Twitter
twitter  analytics  cassandra  databases  hadoop  pdf  logs  metrics  number-crunching  nosql  pig  presentation  slides  scribe  from delicious
april 2010 by jm
BlueRunner: Email in the Cloud with Cassandra [PDF]
interesting prez from some IBM researchers on using Cassandra as a mail store, via Jeremy
via:jzawodny  mail  cassandra  database  data  ibm  nosql  performance  presentation  pdf  from delicious
april 2010 by jm
Embeddable Google Document Viewer
'Google Docs offers an undocumented feature that lets you embed PDF files and PowerPoint presentations in a web page. The files don't have to be uploaded to Google Docs, but they need to be available online.' sweet!
google  google-docs  javascript  iframe  content  pdf  adobe  html  web  documentation  embedding  powerpoint  ppt  viewer  embed  embedded  from delicious
september 2009 by jm

