jm + coding   177

Collection Pipeline
a nice summarisation of the state of pipe/stream-oriented collection operations in various languages, from Martin Fowler
martin-fowler  patterns  coding  ruby  clojure  streams  pipelines  pipes  unix  lambda  fp  java  languages 
25 days ago by jm
Metrics-Driven Development
we believe MDD is equal parts engineering technique and cultural process. It separates the notion of monitoring from its traditional position of exclusivity as an operations thing and places it more appropriately next to its peers as an engineering process. Provided access to real-time production metrics relevant to them individually, both software engineers and operations engineers can validate hypotheses, assess problems, implement solutions, and improve future designs.


Broken down into the following principles: 'Instrumentation-as-Code', 'Single Source of Truth', 'Developers Curate Visualizations and Alerts', 'Alert on What You See', 'Show me the Graph', 'Don’t Measure Everything (YAGNI)'.

We do all of these at Swrve, naturally (a technique I happily stole from Amazon).
metrics  coding  graphite  mdd  instrumentation  yagni  alerting  monitoring  graphs 
4 weeks ago by jm
"Pitfalls of Object Oriented Programming", SCEE R&D
Good presentation discussing "data-oriented programming" -- the concept of optimizing memory access speed by laying out large data in a columnar format in RAM, rather than naively in the default layout that OOP design suggests
columnar  ram  memory  optimization  coding  c++  oop  data-oriented-programming  data  cache  performance 
6 weeks ago by jm
stout
a C++ library adding some modern language features like Option, Try, Stopwatch, and other Guava-ish things (via @cscotta)
c++  library  stout  option  try  guava  coding 
7 weeks ago by jm
ThreadSanitizer
Google's purify/valgrind-like concurrency checking tool:

'As a bonus, ThreadSanitizer finds some other types of bugs: thread leaks, deadlocks, incorrect uses of mutexes, malloc calls in signal handlers, and more. It also natively understands atomic operations and thus can find bugs in lock-free algorithms. [...] The tool is supported by both Clang and GCC compilers (only on Linux/Intel64). Using it is very simple: you just need to add a -fsanitize=thread flag during compilation and linking. For Go programs, you simply need to add a -race flag to the go tool (supported on Linux, Mac and Windows).'
concurrency  bugs  valgrind  threadsanitizer  threading  deadlocks  mutexes  locking  synchronization  coding  testing 
7 weeks ago by jm
How to make breaking changes and not break all the things
Well-written description of the "several backward-compatible changes" approach to breaking-change schema migration (via Marc)
databases  coding  compatibility  migration  schemas  sql  continuous-deployment 
8 weeks ago by jm
quotly/test/acceptance/adding_quotes_spec.rb at master · cavalle/quotly · GitHub
Decent demo of acceptance testing using rspec (and some syntactic sugar to make it read like Steak code, I think)
rspec  acceptance-testing  bdd  testing  ruby  coding 
8 weeks ago by jm
ScalaTest
Scala's BDD approach -- very similar to Steak in Rubyland I think
scala  testing  bdd  acceptance-testing  steak  coding  scalatest 
8 weeks ago by jm
cavalle/steak · GitHub
a minimal extension of RSpec-Rails that adds several conveniences to do acceptance testing of Rails applications using Capybara. It's an alternative to Cucumber in plain Ruby.


Good approach here to copy, but very tied to Rails.
rails  ruby  testing  acceptance-testing  steak  bdd  rspec  coding 
8 weeks ago by jm
PetRegistrationAndPurchase.cs
A good example of "raw" BDD, without using a framework like Cucumber, Steak etc.
bdd  testing  csharp  acceptance-tests  coding 
8 weeks ago by jm
Cap'n Proto, FlatBuffers, and SBE
a feature comparison of these new serialization formats from Kenton, the capnp dude
serialization  protobuf  capnproto  sbe  flatbuffers  google  coding  storage 
9 weeks ago by jm
#AltDevBlog » Parallel Implementations
John Carmack describes this code-evolution approach to adding new code:
The last two times I did this, I got the software rendering code running on the new platform first, so everything could be tested out at low frame rates, then implemented the hardware accelerated version in parallel, setting things up so you could instantly switch between the two at any time.  For a mobile OpenGL ES application being developed on a windows simulator, I opened a completely separate window for the accelerated view, letting me see it simultaneously with the original software implementation.  This was a very significant development win.

If the task you are working on can be expressed as a pure function that simply processes input parameters into a return structure, it is easy to switch it out for different implementations.  If it is a system that maintains internal state or has multiple entry points, you have to be a bit more careful about switching it in and out.  If it is a gnarly mess with lots of internal callouts to other systems to maintain parallel state changes, then you have some cleanup to do before trying a parallel implementation.

There are two general classes of parallel implementations I work with:  The reference implementation, which is much smaller and simpler, but will be maintained continuously, and the experimental implementation, where you expect one version to “win” and consign the other implementation to source control in a couple weeks after you have some confidence that it is both fully functional and a real improvement.

It is completely reasonable to violate some generally good coding rules while building an experimental implementation – copy, paste, and find-replace rename is actually a good way to start.  Code fearlessly on the copy, while the original remains fully functional and unmolested.  It is often tempting to shortcut this by passing in some kind of option flag to existing code, rather than enabling a full parallel implementation.  It is a  grey area, but I have been tending to find the extra path complexity with the flag approach often leads to messing up both versions as you work, and you usually compromise both implementations to some degree.


(via Marc)
via:marc  coding  john-carmack  parallel  development  evolution  lifecycle  project-management 
9 weeks ago by jm
"Taking the hotdog"
aka. lock acquisition. ex-Amazon-Dublin lingo, observed in the wild ;)
language  hotdog  archie-mcphee  amazon  dublin  intercom  coding  locks  synchronization 
12 weeks ago by jm
The programming error that cost Mt Gox 2609 bitcoins
Digging into broken Bitcoin scripts in the blockchain. Fascinating:
While analyzing coinbase transactions, I came across another interesting bug that lost bitcoins. Some transactions have the meaningless and unredeemable script:

OP_IFDUP
OP_IF
OP_2SWAP
OP_VERIFY
OP_2OVER
OP_DEPTH

That script turns out to be the ASCII text script. Instead of putting the redemption script into the transaction, the P2Pool miners accidentally put in the literal word "script". The associated bitcoins are lost forever due to this error.


(via Nelson)
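
A quick way to check the claim above: map each opcode to its byte value in the Bitcoin Script instruction set and read the bytes as ASCII. The helper name `decode_as_ascii` is made up for illustration.

```python
# Byte values of the opcodes listed above, per the Bitcoin Script opcode table.
OPCODES = {
    "OP_IFDUP": 0x73,
    "OP_IF": 0x63,
    "OP_2SWAP": 0x72,
    "OP_VERIFY": 0x69,
    "OP_2OVER": 0x70,
    "OP_DEPTH": 0x74,
}


def decode_as_ascii(opcode_names):
    """Interpret a sequence of opcodes as raw ASCII bytes."""
    return bytes(OPCODES[name] for name in opcode_names).decode("ascii")


broken_script = ["OP_IFDUP", "OP_IF", "OP_2SWAP", "OP_VERIFY", "OP_2OVER", "OP_DEPTH"]
print(decode_as_ascii(broken_script))  # prints "script"
```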
programming  script  coding  bitcoin  mtgox  via:nelson  scripting  dsls 
12 weeks ago by jm
BPF - the forgotten bytecode
'In essence Tcpdump asks the kernel to execute a BPF program within the kernel context. This might sound risky, but actually isn't. Before executing the BPF bytecode the kernel ensures that it's safe:

* All the jumps are only forward, which guarantees that there aren't any loops in the BPF program. Therefore it must terminate.
* All instructions, especially memory reads are valid and within range.
* The single BPF program has less than 4096 instructions.

All this guarantees that the BPF programs executed within kernel context will run fast and will never infinitely loop. That means the BPF programs are not Turing complete, but in practice they are expressive enough for the job and deal with packet filtering very well.'

Good example of a carefully-designed DSL allowing safe "programs" to be written and executed in a privileged context without security risk, or risk of running out of control.
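
To make the "forward jumps only" idea concrete, here is a toy checker. It is a sketch under loud assumptions: the `(opcode, jump_offset)` tuple format and the `is_safe` name are invented for illustration and are not the real BPF instruction encoding or the kernel's verifier, and the memory-read range check from the second bullet is left out entirely.

```python
MAX_INSTRUCTIONS = 4096  # the quote says a program has "less than 4096" instructions


def is_safe(program):
    """Check a toy program: a list of (opcode, jump_offset) pairs.

    jump_offset is None for non-jump instructions; otherwise it is relative
    to the instruction following the jump (a hypothetical convention).
    """
    if not 0 < len(program) < MAX_INSTRUCTIONS:
        return False
    for pc, (_opcode, offset) in enumerate(program):
        if offset is None:
            continue
        target = pc + 1 + offset
        # A negative offset would allow a loop; a target past the end is invalid.
        if offset < 0 or target >= len(program):
            return False
    return True


print(is_safe([("ld", None), ("jeq", 1), ("ret", None), ("ret", None)]))  # True
print(is_safe([("ld", None), ("jmp", -2)]))                               # False
```

Because every jump target is strictly greater than the jump's own position, the program counter only ever increases, so execution must reach the end in a bounded number of steps.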
coding  dsl  security  via:oisin  linux  tcpdump  bpf  bsd  kernel  turing-complete  configuration  languages 
may 2014 by jm
Learn.code.org
Teaches the basics of computer science - K-8 Intro to CS, 15-25 hours. Introduces core CS and programming concepts, with lots of nice graphics, scenarios and characters from games to get the kids hooked ;) Recommended by Tom Raftery; his youngest (7yo) is having great fun with it.
education  programming  learning  coding  kids  k-8  code.org  games 
may 2014 by jm
Exceptional Performance
Good benchmark data on the performance of JVM exceptions
java  jvm  exceptions  benchmarking  performance  optimization  coding 
may 2014 by jm
moto
Mock Boto: 'a library that allows your python tests to easily mock out the boto library.' Supports S3, Autoscaling, EC2, DynamoDB, ELB, Route53, SES, SQS, and STS currently, and even supports a standalone server mode, to act as a mock service for non-Python clients. Excellent!

(via Conor McDermottroe)
python  aws  testing  mocks  mocking  system-tests  unit-tests  coding  ec2  s3 
may 2014 by jm
"A New Data Structure For Cumulative Frequency Tables"
paper by Peter M Fenwick, 1993. 'A new method (the ‘binary indexed tree’) is presented for maintaining the cumulative frequencies which are needed to support dynamic arithmetic data compression. It is based on a decomposition of the cumulative frequencies into portions which parallel the binary representation of the index of the table element (or symbol). The operations to traverse the data structure are based on the binary coding of the index. In comparison with previous methods, the binary indexed tree is faster, using more compact data and simpler code. The access time for all operations is either constant or proportional to the logarithm of the table size. In conjunction with the compact data structure, this makes the new method particularly suitable for large symbol alphabets.'

via Jakob Buchgraber, who's implementing it right now in Netty ;)
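
For reference, a minimal sketch of the binary indexed tree the abstract describes, with logarithmic update and cumulative-frequency query. The class and method names are my own, and this is not the Netty implementation mentioned above.

```python
class FenwickTree:
    """Binary indexed tree over n counters, using 0-based external indices."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # internal array is 1-based

    def add(self, index, delta):
        """Add delta to the counter at index."""
        i = index + 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # step to the next node whose range covers this index

    def prefix_sum(self, index):
        """Return the sum of counters 0..index inclusive."""
        i = index + 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit to reach the preceding range
        return total


freqs = FenwickTree(8)
for symbol, count in [(0, 3), (2, 5), (5, 1), (7, 4)]:
    freqs.add(symbol, count)
print(freqs.prefix_sum(2))  # 8
print(freqs.prefix_sum(7))  # 13
```

The `i & -i` trick isolates the lowest set bit of the index, which is the "decomposition that parallels the binary representation of the index" from the abstract.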
netty  frequency-tables  data-structures  algorithms  coding  binary-tree  indexing  compression  symbol-alphabets 
may 2014 by jm
Oisin's mobile app release checklist
'This form is to document the testing that has been done on each app version before submitting to the App Store. For each item, indicate Yes if the testing has been done, Not Applicable if the testing does not apply (eg testing audio for an app that doesn’t play any), or No if the testing has not been done for another reason.'
apps  checklists  release  coding  ios  android  mobile  ohurley 
may 2014 by jm
Pickles & Spores: Improving Support for Distributed Programming in Scala
'Spores are "small units of possibly mobile functional behavior". They're a closure-like abstraction meant for use in distributed or concurrent environments. Spores provide a guarantee that the environment is effectively immutable, and safe to ship over the wire. Spores aim to give library authors some confidence in exposing functions (or, rather, spores) in public APIs for safe consumption in a distributed or concurrent environment.

The first part of the talk covers a simpler variant of spores as they are proposed for inclusion in Scala 2.11. The second part of the talk briefly introduces a current research project ongoing at EPFL which leverages Scala's type system to provide type constraints that give authors finer-grained control over spore capturing semantics. What's more, these type constraints can be composed during spore composition, so library authors are effectively able to propagate expert knowledge via these composable constraints.

The last part of the talk briefly covers Scala/Pickling, a fast new, open serialization framework.'
pickling  scala  presentations  spores  closures  fp  immutability  coding  distributed  distcomp  serialization  formats  network 
april 2014 by jm
vim-flake8
vim-flake8 is a Vim plugin that runs the currently open file through Flake8, a static syntax and style checker for Python source code. It supersedes both vim-pyflakes and vim-pep8. Flake8 is a wrapper around PyFlakes (static syntax checker), PEP8 (style checker) and Ned's McCabe script (complexity checker).


Recommended by several pythonistas of my acquaintance!
vim  python  syntax  error-checking  errors  flake8  editors  ides  coding 
april 2014 by jm
OpenSSL Valhalla Rampage
OpenBSD are going wild ripping out "arcane VMS hacks" in an attempt to render OpenSSL's source code comprehensible, and finding amazing horrors like this:

'Well, even if time() isn't random, your RSA private key is probably pretty random. Do not feed RSA private key information to the random subsystem as entropy. It might be fed to a pluggable random subsystem…. What were they thinking?!'
random  security  openssl  openbsd  coding  horror  rsa  private-keys  entropy 
april 2014 by jm
Beefcake
A sane Google Protocol Buffers library for Ruby. It's all about being Buf; ProtoBuf.
protobuf  google  protocol-buffers  ruby  coding  libraries  gems  open-source 
april 2014 by jm
Shuffle Sharding
Colm MacCarthaigh writes about a simple sharding/load-balancing algorithm which uses randomized instance selection and optional additional compartmentalization. See also: consistent hashing, and http://aphyr.com/posts/278-timelike-2-everything-fails-all-the-time
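
A rough sketch of the randomized-selection part, assuming each customer is assigned a small, stable, pseudo-random subset of the instance pool. The function name, the per-customer seeding, and the node names are my assumptions for illustration, not details taken from the linked post.

```python
import math
import random


def shuffle_shard(customer_id, instances, shard_size):
    """Pick a stable pseudo-random subset of instances for one customer."""
    rng = random.Random(customer_id)  # seeded per customer so the choice is repeatable
    return sorted(rng.sample(instances, shard_size))


instances = [f"node-{i}" for i in range(8)]
print(shuffle_shard("customer-a", instances, 2))
print(shuffle_shard("customer-b", instances, 2))
print(math.comb(8, 2))  # 28 possible two-node shards from 8 instances
```

The appeal is combinatorial: with 28 possible shards, a problem caused by one customer fully overlaps another customer's shard only when both happen to land on the same pair.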
hashing  load-balancing  sharding  partitions  dist-sys  distcomp  architecture  coding 
april 2014 by jm
Why no SSL ? — Varnish version 4.0.0 documentation
Poul-Henning Kemp details why Varnish doesn't do SSL -- basically due to the quality and complexity of open-source SSL implementations:
There is no other way we can guarantee that secret krypto-bits do not leak anywhere they should not, than by fencing in the code that deals with them in a child process, so the bulk of varnish never gets anywhere near the certificates, not even during a core-dump.


Now looking pretty smart, post-Heartbleed.
ssl  tls  varnish  open-source  poul-henning-kemp  https  http  proxies  security  coding 
april 2014 by jm
Here’s Why You’re Not Hiring the Best and the Brightest
Jeff Atwood's persuasive argument that remote working needs to be the norm in tech work:
There’s an elephant in the room in the form of an implied clause: Always hire the best people… who are willing to live in San Francisco. Substitute Mountain View, New York, Boston, Chicago, or any other city. The problem is the same. We pay lip service to the idea of hiring the best people in the world — but in reality, we’re only hiring the best people who happen to be close by.
recruiting  remote  hiring  business  coding  work  remote-work  telecommuting  jobs  silicon-valley  jeff-atwood 
april 2014 by jm
Transitioning to Scala
Advice from a developer who helped rebuild Walmart.ca with Scala and Play


This is really good advice.
walmart  scala  java  languages  coding  relearning  play  akka 
april 2014 by jm
Efficient substring searching
This is a couple of years old, but I like this:
Turbo Boyer-Moore is disappointing, its name doesn’t do it justice. In academia constant overhead doesn’t matter, but here we see that it matters a lot in practice. Turbo Boyer-Moore’s inner loop is so complex that we think we’re better off using the original Boyer-Moore.


A good demo of how large values of O(n) can be slower than small values of O(mn).
algorithms  search  strings  coding  big-o  string-search  searching 
march 2014 by jm
rr
A cool-looking new debugging tool for C/C++ from Mozilla.
Many, many people have noticed that if we had a way to reliably record program execution and replay it later, with the ability to debug the replay, we could largely tame the nondeterminism problem. This would also allow us to deliberately introduce nondeterminism so tests can explore more of the possible execution space, without impacting debuggability. Many record and replay systems have been built in pursuit of this vision. (I built one myself.) For various reasons these systems have not seen wide adoption. So, a few years ago we at Mozilla started a project to create a new record-and-replay tool that would overcome the obstacles blocking adoption. We call this tool rr.


Low runtime overhead; easy deployability; targeted at 32-bit (?!) Linux; OSS. (via Bryan O'Sullivan)
via:bos  mozilla  debugging  coding  firefox  rr  record  replay  gdb  c++  linux 
march 2014 by jm
The Stony Brook Algorithm Repository
This WWW page is intended to serve as a comprehensive collection of algorithm implementations for over seventy of the most fundamental problems in combinatorial algorithms. The problem taxonomy, implementations, and supporting material are all drawn from my [ie. Steven Skiena's] book 'The Algorithm Design Manual'. Since the practical person is more often looking for a program than an algorithm, we provide pointers to solid implementations of useful algorithms, when they are available.
algorithms  reference  coding  steven-skiena  combinatorial  cs 
march 2014 by jm
Good explanation of exponential backoff
I've often had to explain this key feature verbosely, and it's hard to do without handwaving. Great to have a solid, well-explained URL to point to
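
In case a snippet helps alongside the URL, here is a minimal sketch of retrying with exponential backoff. The function names and default values are mine, and the random jitter is my own addition rather than something taken from the linked explanation.

```python
import random
import time


def backoff_ceiling(attempt, base, cap):
    """Upper bound on the wait after a failed attempt: doubles each time, up to cap."""
    return min(cap, base * (2 ** attempt))


def call_with_retries(operation, base=0.1, cap=10.0, attempts=5, sleep=time.sleep):
    """Call operation(), retrying on any exception with jittered exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Sleep a random amount up to the ceiling so clients do not retry in lockstep.
            sleep(random.uniform(0, backoff_ceiling(attempt, base, cap)))


print([backoff_ceiling(a, base=1, cap=10) for a in range(5)])  # [1, 2, 4, 8, 10]
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting.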
exponential-backoff  backoff  retries  reliability  web-services  http  networking  internet  coding  design 
march 2014 by jm
IntelliJ IDEA 13.1 will support Chronon Debugger
This, IMO, would be a really good reason to upgrade to the payware version of IDEA - Chronon looks cool.
Chronon is a new revolutionary tool keeping track of running Java programs and recording their execution process for later analysis, which can be helpful when you need to thoroughly retrace your steps when dealing with complicated bugs.
chronon  debugging  java  intellij  idea  ides  coding  time-warp  time 
march 2014 by jm
ImperialViolet - Apple's SSL/TLS bug
as we all know by now, a misplaced "goto fail" caused a critical, huge security flaw in versions of iOS and OS X SSL since late 2012.

Lessons:

1. unit test the failure cases, particularly for critical security code!
2. use braces.
3. dead-code analysis would have caught this.

I'm not buying the "goto considered harmful" line, though, since any kind of control flow structure would have had the same problem.
coding  apple  osx  ios  crypto  ssl  security  goto-fail  goto  fail  unit-testing  coding-standards 
february 2014 by jm
java - Why not use Double or Float to represent currency?
A good canonical URL for this piece of coding guidance.
For example, suppose you have $1.03 and you spend 42c. How much money do you have left?

System.out.println(1.03 - .42); => prints out 0.6100000000000001.
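
The same arithmetic in Python, for comparison, alongside two common fixes: a decimal type and integer cents. This is only a sketch of the idea; the variable names are illustrative.

```python
from decimal import Decimal

# Binary floating point cannot represent 1.03 or 0.42 exactly.
print(1.03 - 0.42)                        # 0.6100000000000001

# Decimal values built from strings keep exact base-10 digits.
print(Decimal("1.03") - Decimal("0.42"))  # 0.61

# Alternatively, store money as an integer count of the smallest unit.
balance_cents = 103 - 42
print(f"${balance_cents // 100}.{balance_cents % 100:02d}")  # $0.61
```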
coding  tips  floating-point  float  java  money  currency  bugs 
february 2014 by jm
Girls and Software
a pretty thought-provoking article from Linux Journal on women in computing, and how we're doing it all wrong
feminism  community  programming  coding  women  computing  software  society  work  linux-journal  children  teaching 
february 2014 by jm
Git is not scalable with too many refs/*
Mailing list thread from 2011; git starts to keel over if you tag too much
git  tags  coding  version-control  bugs  scaling  refs 
february 2014 by jm
Hero Culture
Good description of the "hero coder" organisational antipattern.
Now imagine that most of the team is involved in fire-fighting. New recruits see the older recruits getting praised for their brave work in the line-of-fire and they want that kind of praise and reward too. Before long everyone is focused on putting out fires and it is in no one's interest to step back and take on the risks that long-term DevOps-focused goals entail.
coding  ops  admin  hero-coder  hero-culture  firefighting  organisations  teams  culture 
january 2014 by jm
Coders performing code reviews of scientific projects: pilot study
'PLOS and Mozilla conducted a month-long pilot study in which professional developers
performed code reviews on software associated with papers published in PLOS
Computational Biology. While the developers felt the reviews were limited by (a) lack of
familiarity with the domain and (b) lack of two-way contact with authors, the scientists
appreciated the reviews, and both sides were enthusiastic about repeating the experiment. '

Actually sounds like it was more successful than this summary implies.
plos  mozilla  code-reviews  coding  science  computational-biology  biology  studies 
january 2014 by jm
Sux
Some basic succinct data structures. [...] The main highlights are:
a novel, broadword-based implementation of rank/select queries for up to 2^64 bits that is highly competitive with known 32-bit implementations on 64-bit architectures (additional space required is 25% for ranking and 12.5%-37.5% for selection);
several Java structures using the Elias–Fano representation of monotone sequences for storing pointers, variable-length bit arrays, etc.
Java code implementing minimal perfect hashing using around 2.68 bits per element (also using some broadword ideas);
a few Java implementations of monotone minimal perfect hashing.
Sux is free software distributed under the GNU Lesser General Public License.
sux  succinct  data-structures  bits  compression  space  coding 
january 2014 by jm
Branchless hex-to-decimal conversion hack
via @simonebordet, on the mechanical-sympathy list: ((c & 0x1F) + ((c >> 6) * 0x19) - 0x10)
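
A quick check of the expression against every valid hex digit. The wrapper name `hex_digit_value` is made up; note the trick only gives meaningful results for characters that are already known to be hex digits.

```python
def hex_digit_value(ch):
    """Convert one ASCII hex digit to its numeric value without branching."""
    c = ord(ch)
    # Digits '0'-'9' have c >> 6 == 0; letters 'A'-'F' and 'a'-'f' have c >> 6 == 1,
    # which adds 0x19 so that 'A' (low five bits == 1) maps to 10.
    return (c & 0x1F) + ((c >> 6) * 0x19) - 0x10


for digit in "0123456789abcdefABCDEF":
    assert hex_digit_value(digit) == int(digit, 16)
print(hex_digit_value("7"), hex_digit_value("c"), hex_digit_value("F"))  # 7 12 15
```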
hacks  one-liners  coding  performance  optimization  hex  conversion  numbers  ascii 
january 2014 by jm
Don’t get stuck
Good description of Etsy's take on continuous deployment, committing directly to trunk, hidden with feature-flags, from Rafe Colburn
continuous-deployment  coding  agile  deployment  devops  etsy  rafe-colburn 
january 2014 by jm
stereopsis : graphics : radix tricks
some nice super-optimized Radix Sort code which handles floating point values. See also http://codercorner.com/RadixSortRevisited.htm for more info on the histogramming/counter concept
sorting  programming  coding  algorithms  radix-sort  optimization  floating-point 
december 2013 by jm
On undoing, fixing, or removing commits in git
Choose-your-own-adventure style. "Oh dear. This is going to get complicated."

(via Tom)
via:tom  cyoa  git  fixing  revert  source-control  coding 
december 2013 by jm
Virtual Clock - Testing Patterns Encyclopedia
a nice pattern for unit tests which need deterministic time behaviour. Trying to think up a really nice API for this....
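
One possible shape for such an API, as a sketch only: code under test asks an injected clock for the time, and tests substitute a clock they can advance by hand. All class names here (`SystemClock`, `VirtualClock`, `Session`) are hypothetical and not taken from the linked page.

```python
import time


class SystemClock:
    """Production clock: delegates to the real wall clock."""

    def now(self):
        return time.time()


class VirtualClock:
    """Test clock: time only moves when the test says so."""

    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds


class Session:
    """Example time-dependent code that takes its clock as a dependency."""

    def __init__(self, clock, ttl_seconds):
        self._clock = clock
        self._expires_at = clock.now() + ttl_seconds

    def expired(self):
        return self._clock.now() >= self._expires_at


clock = VirtualClock()
session = Session(clock, ttl_seconds=30)
clock.advance(29)
print(session.expired())  # False
clock.advance(1)
print(session.expired())  # True
```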
testing  unit-tests  time  virtual-clock  real-time  coding 
december 2013 by jm
[JavaSpecialists 215] - StampedLock Idioms
a demo of Doug Lea's latest concurrent data structure in Java 8
doug-lea  concurrency  coding  java-8  java  threads 
december 2013 by jm
HdrHistogram by giltene
A Histogram that supports recording and analyzing sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.
hdr  histogram  data-structures  coding  gil-tene  sampling  measuring 
october 2013 by jm
Toyota's killer firmware: Bad design and its consequences
This is exactly what you do NOT want to read about embedded systems controlling acceleration in your car:

The Camry electronic throttle control system code was found to have 11,000 global variables. Barr described the code as “spaghetti.” Using the Cyclomatic Complexity metric, 67 functions were rated untestable (meaning they scored more than 50). The throttle angle function scored more than 100 (unmaintainable).
Toyota loosely followed the widely adopted MISRA-C coding rules but Barr’s group found 80,000 rule violations. Toyota's own internal standards make use of only 11 MISRA-C rules, and five of those were violated in the actual code. MISRA-C:1998, in effect when the code was originally written, has 93 required and 34 advisory rules. Toyota nailed six of them. Barr also discovered inadequate and untracked peer code reviews and the absence of any bug-tracking system at Toyota.


On top of this, there was no error-correcting RAM in use; stack-killing recursive code; a quoted 94% stack usage; risks of unintentional RTOS task shutdown; buffer overflows; unsafe casting; race conditions; unchecked error code return values; and a trivial watchdog timer check. Crappy, unsafe coding.
firmware  horror  embedded-systems  toyota  camry  safety  acceleration  misra-c  coding  code-verification  spaghetti-code  cyclomatic-complexity  realtime  rtos  c  code-reviews  bug-tracking  quality 
october 2013 by jm
How to lose $172,222 a second for 45 minutes
Major outage and $465m of trading loss, caused by staggeringly inept software management: 8 years of incremental bitrot, technical debt, and failure to have correct processes to engage an ops team in incident response. Hopefully this will serve as a lesson that software is more than just coding, at least to one industry
trading  programming  coding  software  inept  fail  bitrot  tech-debt  ops  incident-response 
october 2013 by jm
NCCA Junior Cycle - Programming and Coding Consultation Page
the National Council for Curriculum and Assessment are looking for feedback on adding programming to the junior cycle (ie., early secondary school) in Ireland. Add your EUR.02!
ireland  programming  coding  education  schools 
october 2013 by jm
Timecop
'A Ruby gem providing "time travel" and "time freezing" capabilities, making it dead simple to test time-dependent code. It provides a unified method to mock Time.now, Date.today, and DateTime.now in a single call.'

This is about the nicest mock-time library I've found so far. (via Ben)
time  ruby  testing  coding  unit-tests  mocking  timecop  via:ben 
october 2013 by jm
To my daughter's high school programming teacher
During the first semester of my daughter's junior/senior year, she took her first programming class. She knew I'd be thrilled, but she did it anyway.

When my daughter got home from the first day of the semester, I asked her about the class. "Well, I'm the only girl in class," she said. Fortunately, that didn't bother her, and she even liked joking around with the guys in class. My daughter said that you noticed and apologized to her because she was the only girl in class. And when the lessons started (Visual Basic? Seriously??), my daughter flew through the assignments. After she finished, she'd help classmates who were behind or struggling in class.

Over the next few weeks, things went downhill. While I was attending SC '12 in Salt Lake City last November, my daughter emailed to tell me that the boys in her class were harassing her. "They told me to get in the kitchen and make them sandwiches," she said. I was painfully reminded of the anonymous boys who left comments on a Linux Pro Magazine blog post I wrote a few years ago, saying the exact same thing.


I am sick to death of this 'brogrammer' bullshit.
brogrammers  sexism  culture  tech  teaching  coding  software  education 
september 2013 by jm
Excellent Rob Pike quote about algorithmic complexity
'Fancy algorithms are slow when n is small, and n is usually small.' -- Rob Pike


Been there, bought the t-shirt ;)
rob-pike  quotes  algorithms  big-o  complexity  coding 
september 2013 by jm
A Case Against Cucumber
This is exactly my problem with Cucumber and similar BDD test frameworks.
When I write a Cucumber feature, I have to write the Gherkin that describes the acceptance criteria, and the Ruby code that implements the step definitions. Since the code to implement the step definitions is just normal RSpec (or whichever testing library you use), if someone else is writing the Gherkin, the amount of setup to create a working test should be about the same. So you’re only breaking even!

However, I don’t believe that it would really be breaking even. Cucumber adds another layer of indirection on top of your tests. When I’m trying to see why a specific scenario is failing, first I need to find the step that is failing. Since these steps are defined with regular expressions, I have to grep for the step definition.
ruby  testing  bdd  cucumber  rspec  coding 
september 2013 by jm
Non-blocking transactional atomicity
Peter Bailis with an interesting distributed-storage atomicity algorithm for performing multi-record transactional updates
algorithms  nbta  transactions  databases  storage  distcomp  distributed  atomic  coding  eventual-consistency  crdts 
september 2013 by jm
Recordinality
a new, and interesting, sketching algorithm, with a Java implementation:
Recordinality is unique in that it provides cardinality estimation like HLL, but also offers "distinct value sampling." This means that Recordinality can allow us to fetch a random sample of distinct elements in a stream, invariant to cardinality. Put more succinctly, given a stream of elements containing 1,000,000 occurrences of 'A' and one occurrence each of 'B' - 'Z', the probability of any letter appearing in our sample is equal. Moreover, we can also efficiently store the number of times elements in our distinct sample have been observed. This can help us to understand the distribution of occurrences of elements in our stream. With it, we can answer questions like "do the elements we've sampled present in a power law-like pattern, or is the distribution of occurrences relatively even across the set?"
sketching  coding  algorithms  recordinality  cardinality  estimation  hll  hashing  murmurhash  java 
august 2013 by jm
Applied Cryptography, Cryptography Engineering, and how they need to be updated
Whoa, I had no idea my knowledge of crypto was so out of date! For example:
ECC is going to replace RSA within the next 10 years. New systems probably shouldn’t use RSA at all.


This blogpost is full of similar useful guidelines and rules of thumb. Here's hoping I don't need to work on a low-level cryptosystem any time soon, as the risk of screwing it up is always high, but if I do this is a good reference for how it needs to be done nowadays.
thomas-ptacek  crypto  cryptography  coding  design  security  aes  cbc  ctr  ecb  hmac  side-channels  rsa  ecc 
july 2013 by jm
Flower Filter
'A simple time-decaying approximate membership filter' -- like a Bloom filter with time decay. See also http://eng.42go.com/flower-filter-an-update/ for some notes on the non-independence of survival probabilities, and how that imposes negligible differences in practice.
bloom-filter  algorithms  coding  probabilistic  approximate  time  decay 
july 2013 by jm
Clean Code Cheat Sheet [pdf]
'principles, patterns, smells and guidelines for clean code, class and package design, TDD, Acceptance Test Driven Development, and CI'
clean-code  code-smells  coding  tdd  testing  continous-integration  patterns  pdf 
july 2013 by jm
Sketch of the Day: K-Minimum Values
Another sketching algorithm -- this one supports set union and intersection operations more easily than HyperLogLog when there are more than 2 sets
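
A minimal sketch of the K-Minimum Values idea as I understand it: hash every element to a number in [0, 1), keep only the k smallest hash values, and estimate the distinct count from how small the k-th value is. The class name, the choice of SHA-1, and the (k - 1) / max estimator are my assumptions for illustration, not necessarily what the linked post uses.

```python
import hashlib


def unit_hash(item):
    """Map an item to a pseudo-random float in [0, 1) using SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMV:
    def __init__(self, k):
        self.k = k
        self.values = set()  # the k smallest hash values seen so far

    def add(self, item):
        self.values.add(unit_hash(item))
        if len(self.values) > self.k:
            self.values.remove(max(self.values))

    def estimate(self):
        if len(self.values) < self.k:
            return len(self.values)  # fewer than k distinct hashes: count them directly
        return (self.k - 1) / max(self.values)

    def union(self, other):
        """Sketch of the union: k smallest values of the merged sets."""
        merged = KMV(min(self.k, other.k))
        merged.values = set(sorted(self.values | other.values)[:merged.k])
        return merged


a, b = KMV(256), KMV(256)
for i in range(6000):
    a.add(i)
for i in range(4000, 10000):
    b.add(i)
print(round(a.union(b).estimate()))  # an estimate of the 10,000 distinct values
```

Union is easy because the k smallest hashes of the combined stream are always among the k smallest of each input, so merging sketches loses nothing.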
algorithms  coding  space-saving  cardinality  streams  stream-processing  estimation  sets  sketching 
june 2013 by jm
Java Garbage Collection Distilled
Martin Thompson lays it out:
Serial, Parallel, Concurrent, CMS, G1, Young Gen, New Gen, Old Gen, Perm Gen, Eden, Tenured, Survivor Spaces, Safepoints, and the hundreds of JVM start-up flags. Does this all baffle you when trying to tune the garbage collector while trying to get the required throughput and latency from your Java application? If it does then don’t worry, you are not alone. Documentation describing garbage collection feels like man pages for an aircraft. Every knob and dial is detailed and explained but nowhere can you find a guide on how to fly. This article will attempt to explain the tradeoffs when choosing and tuning garbage collection algorithms for a particular workload.
gc  java  garbage-collection  coding  cms  g1  jvm  optimization 
june 2013 by jm
On Scala
great, comprehensive review of the language, its pros and misfeatures, from Bill de hOra
scala  languages  coding  fp  reviews 
june 2013 by jm
Big Memory, Part 4
good microbenchmarking of a bunch of Java collections: Trove, fastutil, PCJ, mahout-collections, hppc
java  collections  benchmarks  performance  speed  coding  data-structures  optimization 
june 2013 by jm
Vagrant and Chef to provision dev test environments
We have recently switched from a manually configured development environment to a nearly fully automated one using Vagrant, Chef, and a few other tools. With this transition, we’ve moved to an environment where data on the dev boxes is considered disposable and only what’s checked into the SCM is “real”. This is where we’ve always wanted to be, but without the ability to easily rebuild the dev environment from scratch, it’s hard to internalize this behavior pattern.
dev  osx  chef  vagrant  testing  vms  coding 
june 2013 by jm
Rusty's API Design Manifesto
This classic came up in discussions yesterday...

In the Linux Kernel community Rusty Russell came up with an API rating scheme to help us determine if our API is sensible, or not. It's a rating from -10 to 10, where 10 is perfect and -10 is hell. Unfortunately there are too many examples at the wrong end of the scale.
rusty-russell  quality  coding  kernel  linux  apis  design  code-reviews  code 
may 2013 by jm
Approximate Heavy Hitters -The SpaceSaving Algorithm
nice, readable intro to SpaceSaving (which I've linked to before) -- a simple stream-processing algorithm for estimating the top-K most frequent items, with bounded error.
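A rough sketch of how SpaceSaving works, assuming a plain dict of counters rather than the linked-list "stream summary" structure usually used to make eviction cheap. The function names `space_saving` and `top_k` are hypothetical. The algorithm tracks at most `capacity` items. When a new item arrives and the table is full, it evicts the item with the smallest count and hands that count, plus one, to the newcomer. The inherited count is recorded as the newcomer's maximum possible overestimate.

```python
def space_saving(stream, capacity):
    """Return (counts, errors) after one pass over the stream.

    counts[item] is an upper bound on the item's true frequency;
    counts[item] - errors[item] is a lower bound.
    """
    counts = {}
    errors = {}
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < capacity:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the least-counted item (linear scan, for clarity only);
            # the newcomer inherits its count.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    return counts, errors


def top_k(counts, k):
    """The k items with the highest estimated counts."""
    return sorted(counts.items(), key=lambda kv: kv[1], reverse=True)[:k]


stream = ["a"] * 5 + ["b"] * 3 + ["c", "d", "e"]
counts, errors = space_saving(stream, capacity=3)
print(top_k(counts, 1))   # [('a', 5)]
```

The counters always sum to the stream length, so any item occurring more than `len(stream) / capacity` times is guaranteed to be in the table at the end.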
algorithms  coding  space-saving  cardinality  streams  stream-processing  estimation 
may 2013 by jm
Older Is Wiser: Study Shows Software Developers’ Skills Improve Over Time
At least in terms of StackOverflow rep:
For the first part of the study, the researchers compared the age of users with their reputation scores. They found that an individual’s reputation increases with age, at least into a user’s 40s. There wasn’t enough data to draw meaningful conclusions for older programmers. The researchers then looked at the number of different subjects that users asked and answered questions about, which reflects the breadth of their programming interests. The researchers found that there is a sharp decline in the number of subjects users weighed in on between the ages of 15 and 30 – but that the range of subjects increased steadily through the programmers’ 30s and into their early 50s.

Finally, the researchers evaluated the knowledge of older programmers (ages 37 and older) compared to younger programmers (younger than 37) in regard to relatively recent technologies – meaning technologies that have been around for less than 10 years. For two smartphone operating systems, iOS and Windows Phone 7, the veteran programmers had a significant edge in knowledge over their younger counterparts. For every other technology, from Django to Silverlight, there was no statistically significant difference between older and younger programmers. “The data doesn’t support the bias against older programmers – if anything, just the opposite,” Murphy-Hill says.


Damn right ;)
coding  age  studies  software  work  stack-overflow  ncsu  knowledge  skills  life 
april 2013 by jm
Lectures in Advanced Data Structures (6.851)
Good lecture notes on the current state of the art in data structure research.
Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures:

TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible.
GEOMETRY When data has more than one dimension (e.g. maps, database tables).
DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close.
MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache.
HASHING Hashing is the most used data structure in computer science. And it's still an active area of research.
INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible.
DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes.
STRINGS Searching for phrases in giant text (think Google or DNA).
SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).


(via Tim Freeman)
data-structures  lectures  mit  video  data  algorithms  coding  csail  strings  integers  hashing  sorting  bst  memory 
april 2013 by jm
jq
like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. [it] is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine, and expect it to work.


Nice tool. Needs to get into the Debian/Ubuntu apt repos pronto ;)
jq  tools  cli  via:peakscale  json  coding  data  sed  unix 
april 2013 by jm
Log4j2 Asynchronous Loggers for Low-Latency Logging - Apache Log4j 2
implemented using the LMAX Disruptor library -- very impressive performance figures. I presume in real-world usage, these latencies are dwarfed by hardware costs, though
disruptor  coding  java  log4j  logging  async  performance 
april 2013 by jm
You probably shouldn’t use a spreadsheet for important work
Daniel Lemire comments on the recent cases of bugs in spreadsheets causing major impact:
There are several critical problems with a tool like Excel that need to be widely known:

* Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically;

* Spreadsheets make code reviews impractical. To visually inspect the code, you need to click on each and every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake;

* Spreadsheets encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?


Agreed on all three, particularly on the impossibility of testing. IMO, everyone in a job where automation via spreadsheet is likely needs training in SDE fundamentals: unit testing, the importance of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.
spreadsheets  excel  coding  errors  bugs  testability  unit-testing  testing  quality  sde  sde-fundamentals  dry 
april 2013 by jm
Functional Reactive Programming in the Netflix API with RxJava
Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.
Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observables.

The Observable data type can be thought of as a "push" equivalent to Iterable which is "pull". With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
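To make the push/pull distinction concrete, here is a toy illustration in Python. This is not RxJava's API: the `Observable` class below and its `subscribe`, `emit`, `map` and `filter` methods are hypothetical stand-ins written for this note. With an iterable, the consumer drives the loop and asks for each value. With the observable, the consumer registers a callback up front and the producer decides when values are delivered.

```python
class Observable:
    """Toy push-based stream: the producer calls emit(), subscribers are notified."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, on_next):
        self._subscribers.append(on_next)

    def emit(self, value):
        for on_next in self._subscribers:
            on_next(value)

    def map(self, fn):
        out = Observable()
        self.subscribe(lambda v: out.emit(fn(v)))
        return out

    def filter(self, pred):
        out = Observable()
        self.subscribe(lambda v: out.emit(v) if pred(v) else None)
        return out


# Pull: the consumer iterates and requests each value from the producer.
pulled = [x * 10 for x in iter([1, 2, 3, 4]) if x % 2 == 0]

# Push: the consumer subscribes first; the producer emits whenever it likes.
source = Observable()
pushed = []
source.filter(lambda x: x % 2 == 0).map(lambda x: x * 10).subscribe(pushed.append)
for x in [1, 2, 3, 4]:
    source.emit(x)   # could just as well be called later, from another thread

print(pulled, pushed)   # [20, 40] [20, 40]
```

The same operator chain gives the same results either way. What changes is who controls the timing, which is why the push form accommodates asynchronous producers.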
concurrency  java  jvm  threads  thread-safety  coding  rx  frp  fp  functional-programming  reactive  functional  async  observable 
april 2013 by jm