jm + coding   155

vim-flake8
vim-flake8 is a Vim plugin that runs the currently open file through Flake8, a static syntax and style checker for Python source code. It supersedes both vim-pyflakes and vim-pep8. Flake8 is a wrapper around PyFlakes (static syntax checker), PEP8 (style checker) and Ned's MacCabe script (complexity checker).


Recommended by several pythonistas of my acquaintance!
vim  python  syntax  error-checking  errors  flake8  editors  ides  coding 
16 hours ago by jm
OpenSSL Valhalla Rampage
OpenBSD are going wild ripping out "arcane VMS hacks" in an attempt to render OpenSSL's source code comprehensible, and finding amazing horrors like this:

'Well, even if time() isn't random, your RSA private key is probably pretty random. Do not feed RSA private key information to the random subsystem as entropy. It might be fed to a pluggable random subsystem…. What were they thinking?!'
random  security  openssl  openbsd  coding  horror  rsa  private-keys  entropy 
5 days ago by jm
Beefcake
A sane Google Protocol Buffers library for Ruby. It's all about being Buf; ProtoBuf.
protobuf  google  protocol-buffers  ruby  coding  libraries  gems  open-source 
7 days ago by jm
Shuffle Sharding
Colm MacCarthaigh writes about a simple sharding/load-balancing algorithm which uses randomized instance selection and optional additional compartmentalization. See also: continuous hashing, and http://aphyr.com/posts/278-timelike-2-everything-fails-all-the-time
hashing  load-balancing  sharding  partitions  dist-sys  distcomp  architecture  coding 
8 days ago by jm
Why no SSL ? — Varnish version 4.0.0 documentation
Poul-Henning Kemp details why Varnish doesn't do SSL -- basically due to the quality and complexity of open-source SSL implementations:
There is no other way we can guarantee that secret krypto-bits do not leak anywhere they should not, than by fencing in the code that deals with them in a child process, so the bulk of varnish never gets anywhere near the certificates, not even during a core-dump.


Now looking pretty smart, post-Heartbleed.
ssl  tls  varnish  open-source  poul-henning-kemp  https  http  proxies  security  coding 
10 days ago by jm
Here’s Why You’re Not Hiring the Best and the Brightest
Jeff Atwood's persuasive argument that remote working needs to be the norm in tech work:
There’s an elephant in the room in the form of an implied clause: Always hire the best people… who are willing to live in San Francisco. Substitute Mountain View, New York, Boston, Chicago, or any other city. The problem is the same. We pay lip service to the idea of hiring the best people in the world — but in reality, we’re only hiring the best people who happen to be close by.
recruiting  remote  hiring  business  coding  work  remote-work  telecommuting  jobs  silicon-valley  jeff-atwood 
15 days ago by jm
Transitioning to Scala
Advice from a developer who helped rebuild Walmart.ca with Scala and Play


This is really good advice.
walmart  scala  java  languages  coding  relearning  play  akka 
18 days ago by jm
Efficient substring searching
This is a couple of years old, but I like this:
Turbo Boyer-Moore is disappointing, its name doesn’t do it justice. In academia constant overhead doesn’t matter, but here we see that it matters a lot in practice. Turbo Boyer-Moore’s inner loop is so complex that we think we’re better off using the original Boyer-Moore.


A good demo of how large values of O(n) can be slower than small values of O(mn).
algorithms  search  strings  coding  big-o  string-search  searching 
23 days ago by jm
rr
A cool-looking new debugging tool for C/C++ from Mozilla.
Many, many people have noticed that if we had a way to reliably record program execution and replay it later, with the ability to debug the replay, we could largely tame the nondeterminism problem. This would also allow us to deliberately introduce nondeterminism so tests can explore more of the possible execution space, without impacting debuggability. Many record and replay systems have been built in pursuit of this vision. (I built one myself.) For various reasons these systems have not seen wide adoption. So, a few years ago we at Mozilla started a project to create a new record-and-replay tool that would overcome the obstacles blocking adoption. We call this tool rr.


Low runtime overhead; easy deployability; targeted at 32-bit (?!) Linux; OSS. (via Bryan O'Sullivan)
via:bos  mozilla  debugging  coding  firefox  rr  record  replay  gdb  c++  linux 
27 days ago by jm
The Stony Brook Algorithm Repository
This WWW page is intended to serve as a comprehensive collection of algorithm implementations for over seventy of the most fundamental problems in combinatorial algorithms. The problem taxonomy, implementations, and supporting material are all drawn from my [ie. Steven Skiena's] book 'The Algorithm Design Manual'. Since the practical person is more often looking for a program than an algorithm, we provide pointers to solid implementations of useful algorithms, when they are available.
algorithms  reference  coding  steven-skiena  combinatorial  cs 
4 weeks ago by jm
Good explanation of exponential backoff
I've often had to explain this key feature verbosely, and it's hard to do without handwaving. Great to have a solid, well-explained URL to point to
exponential-backoff  backoff  retries  reliability  web-services  http  networking  internet  coding  design 
5 weeks ago by jm
IntelliJ IDEA 13.1 will support Chronon Debugger
This, IMO, would be a really good reason to upgrade to the payware version of IDEA - Chronon looks cool.
Chronon is a new revolutionary tool keeping track of running Java programs and recording their execution process for later analysis, which can be helpful when you need to thoroughly retrace your steps when dealing with complicated bugs.
chronon  debugging  java  intellij  idea  ides  coding  time-warp  time 
6 weeks ago by jm
ImperialViolet - Apple's SSL/TLS bug
as we all know by now, a misplaced "goto fail" caused a critical, huge security flaw in versions of IOS and OSX SSL, since late 2012.

Lessons:

1. unit test the failure cases, particularly for critical security code!
2. use braces.
3. dead-code analysis would have caught this.

I'm not buying the "goto considered harmful" line, though, since any kind of control flow structure would have had the same problem.
coding  apple  osx  ios  crypto  ssl  security  goto-fail  goto  fail  unit-testing  coding-standards 
8 weeks ago by jm
java - Why not use Double or Float to represent currency?
A good canonical URL for this piece of coding guidance.
For example, suppose you have $1.03 and you spend 42c. How much money do you have left?

System.out.println(1.03 - .42); => prints out 0.6100000000000001.
coding  tips  floating-point  float  java  money  currency  bugs 
10 weeks ago by jm
Girls and Software
a pretty thought-provoking article from Linux Journal on women in computing, and how we're doing it all wrong
feminism  community  programming  coding  women  computing  software  society  work  linux-journal  children  teaching 
10 weeks ago by jm
Git is not scalable with too many refs/*
Mailing list thread from 2011; git starts to keel over if you tag too much
git  tags  coding  version-control  bugs  scaling  refs 
10 weeks ago by jm
Hero Culture
Good description of the "hero coder" organisational antipattern.
Now imagine that most of the team is involved in fire-fighting. New recruits see the older recruits getting praised for their brave work in the line-of-fire and they want that kind of praise and reward too. Before long everyone is focused on putting out fires and it is no ones interest to step back and take on the risks that long-term DevOps-focused goals entail.
coding  ops  admin  hero-coder  hero-culture  firefighting  organisations  teams  culture 
12 weeks ago by jm
Coders performing code reviews of scientific projects: pilot study
'PLOS and Mozilla conducted a month-long pilot study in which professional developers
performed code reviews on software associated with papers published in PLOS
Computational Biology. While the developers felt the reviews were limited by (a) lack of
familiarity with the domain and (b) lack of two-way contact with authors, the scientists
appreciated the reviews, and both sides were enthusiastic about repeating the experiment. '

Actually sounds like it was more successful than this summary implies.
plos  mozilla  code-reviews  coding  science  computational-biology  biology  studies 
12 weeks ago by jm
Sux
Some basic succinct data structures. [...] The main highlights are:
a novel, broadword-based implementation of rank/select queries for up to 264 bits that is highly competitive with known 32-bit implementations on 64-bit architectures (additional space required is 25% for ranking and 12.5%-37.5% for selection);
several Java structures using the Elias–Fano representation of monotone sequences for storing pointers, variable-length bit arrays, etc.
Java code implementing minimal perfect hashing using around 2.68 bits per element (also using some broadword ideas);
a few Java implementations of monotone minimal perfect hashing.
Sux is free software distributed under the GNU Lesser General Public License.
sux  succinct  data-structures  bits  compression  space  coding 
12 weeks ago by jm
Branchless hex-to-decimal conversion hack
via @simonebordet, on the mechanical-sympathy list: ((c & 0x1F) + ((c >> 6) * 0x19) – 0x10)
hacks  one-liners  coding  performance  optimization  hex  conversion  numbers  ascii 
january 2014 by jm
Don’t get stuck
Good description of Etsy's take on continuous deployment, committing directly to trunk, hidden with feature-flags, from Rafe Colburn
continuous-deployment  coding  agile  deployment  devops  etsy  rafe-colburn 
january 2014 by jm
stereopsis : graphics : radix tricks
some nice super-optimized Radix Sort code which handles floating point values. See also http://codercorner.com/RadixSortRevisited.htm for more info on the histogramming/counter concept
sorting  programming  coding  algorithms  radix-sort  optimization  floating-point 
december 2013 by jm
On undoing, fixing, or removing commits in git
Choose-your-own-adventure style. "Oh dear. This is going to get complicated."

(via Tom)
via:tom  cyoa  git  fixing  revert  source-control  coding 
december 2013 by jm
Virtual Clock - Testing Patterns Encyclopedia
a nice pattern for unit tests which need deterministic time behaviour. Trying to think up a really nice API for this....
testing  unit-tests  time  virtual-clock  real-time  coding 
december 2013 by jm
[JavaSpecialists 215] - StampedLock Idioms
a demo of Doug Lea's latest concurrent data structure in Java 8
doug-lea  concurrency  coding  java-8  java  threads 
december 2013 by jm
HdrHistogram by giltene
A Histogram that supports recording and analyzing sampled data value counts across a configurable integer value range with configurable value precision within the range. Value precision is expressed as the number of significant digits in the value recording, and provides control over value quantization behavior across the value range and the subsequent value resolution at any given level.
hdr  histogram  data-structures  coding  gil-tene  sampling  measuring 
october 2013 by jm
Toyota's killer firmware: Bad design and its consequences
This is exactly what you do NOT want to read about embedded systems controlling acceleration in your car:

The Camry electronic throttle control system code was found to have 11,000 global variables. Barr described the code as “spaghetti.” Using the Cyclomatic Complexity metric, 67 functions were rated untestable (meaning they scored more than 50). The throttle angle function scored more than 100 (unmaintainable).
Toyota loosely followed the widely adopted MISRA-C coding rules but Barr’s group found 80,000 rule violations. Toyota's own internal standards make use of only 11 MISRA-C rules, and five of those were violated in the actual code. MISRA-C:1998, in effect when the code was originally written, has 93 required and 34 advisory rules. Toyota nailed six of them. Barr also discovered inadequate and untracked peer code reviews and the absence of any bug-tracking system at Toyota.


On top of this, there was no error-correcting RAM in use; stack-killing recursive code; a quoted 94% stack usage; risks of unintentional RTOS task shutdown; buffer overflows; unsafe casting; race conditions; unchecked error code return values; and a trivial watchdog timer check. Crappy, unsafe coding.
firmware  horror  embedded-systems  toyota  camry  safety  acceleration  misra-c  coding  code-verification  spaghetti-code  cyclomatic-complexity  realtime  rtos  c  code-reviews  bug-tracking  quality 
october 2013 by jm
How to lose $172,222 a second for 45 minutes
Major outage and $465m of trading loss, caused by staggeringly inept software management: 8 years of incremental bitrot, technical debt, and failure to have correct processes to engage an ops team in incident response. Hopefully this will serve as a lesson that software is more than just coding, at least to one industry
trading  programming  coding  software  inept  fail  bitrot  tech-debt  ops  incident-response 
october 2013 by jm
NCCA Junior Cycle - Programming and Coding Consultation Page
the National Council for Curriculum and Assessment are looking for feedback on adding programming to the junior cycle (ie., early secondary school) in Ireland. Add your EUR.02!
ireland  programming  coding  education  schools 
october 2013 by jm
Timecop
'A Ruby gem providing "time travel" and "time freezing" capabilities, making it dead simple to test time-dependent code. It provides a unified method to mock Time.now, Date.today, and DateTime.now in a single call.'

This is about the nicest mock-time library I've found so far. (via Ben)
time  ruby  testing  coding  unit-tests  mocking  timecop  via:ben 
october 2013 by jm
To my daughter's high school programming teacher
During the first semester of my daughter's junior/senior year, she took her first programming class. She knew I'd be thrilled, but she did it anyway.

When my daughter got home from the first day of the semester, I asked her about the class. "Well, I'm the only girl in class," she said. Fortunately, that didn't bother her, and she even liked joking around with the guys in class. My daughter said that you noticed and apologized to her because she was the only girl in class. And when the lessons started (Visual Basic? Seriously??), my daughter flew through the assigments. After she finished, she'd help classmates who were behind or struggling in class.

Over the next few weeks, things went downhill. While I was attending SC '12 in Salt Lake City last November, my daughter emailed to tell me that the boys in her class were harassing her. "They told me to get in the kitchen and make them sandwiches," she said. I was painfully reminded of the anonymous men boys who left comments on a Linux Pro Magazine blog post I wrote a few years ago, saying the exact same thing.


I am sick to death of this 'brogrammer' bullshit.
brogrammers  sexism  culture  tech  teaching  coding  software  education 
september 2013 by jm
Excellent Rob Pike quote about algorithmic complexity
'Fancy algorithms are slow when n is small, and n is usually small.' -- Rob Pike


Been there, bought the t-shirt ;)
rob-pike  quotes  algorithms  big-o  complexity  coding 
september 2013 by jm
A Case Against Cucumber
This is exactly my problem with Cucumber and similar BDD test frameworks.
When I write a Cucumber feature, I have to write the Gherkin that describes the acceptance criteria, and the Ruby code that implements the step definitions. Since the code to implement the step definitions is just normal RSpec (or whichever testing library you use), if someone else is writing the Gherkin, the amount of setup to create a working test should be about the same. So you’re only breaking even!

However, I don’t believe that it would really be breaking even. Cucumber adds another layer of indirection on top of your tests. When I’m trying to see why a specific scenario is failing, first I need to find the step that is failing. Since these steps are defined with regular expressions, I have to grep for the step definition.
ruby  testing  bdd  cucumber  rspec  coding 
september 2013 by jm
Non-blocking transactional atomicity
Peter Bailis with an interesting distributed-storage atomicity algorithm for performing multi-record transactional updates
algorithms  nbta  transactions  databases  storage  distcomp  distributed  atomic  coding  eventual-consistency  crdts 
september 2013 by jm
Recordinality
a new, and interesting, sketching algorithm, with a Java implementation:
Recordinality is unique in that it provides cardinality estimation like HLL, but also offers "distinct value sampling." This means that Recordinality can allow us to fetch a random sample of distinct elements in a stream, invariant to cardinality. Put more succinctly, given a stream of elements containing 1,000,000 occurrences of 'A' and one occurrence each of 'B' - 'Z', the probability of any letter appearing in our sample is equal. Moreover, we can also efficiently store the number of times elements in our distinct sample have been observed. This can help us to understand the distribution of occurrences of elements in our stream. With it, we can answer questions like "do the elements we've sampled present in a power law-like pattern, or is the distribution of occurrences relatively even across the set?"
sketching  coding  algorithms  recordinality  cardinality  estimation  hll  hashing  murmurhash  java 
august 2013 by jm
Applied Cryptography, Cryptography Engineering, and how they need to be updated
Whoa, I had no idea my knowledge of crypto was so out of date! For example:
ECC is going to replace RSA within the next 10 years. New systems probably shouldn’t use RSA at all.


This blogpost is full of similar useful guidelines and rules of thumb. Here's hoping I don't need to work on a low-level cryptosystem any time soon, as the risk of screwing it up is always high, but if I do this is a good reference for how it needs to be done nowadays.
thomas-ptacek  crypto  cryptography  coding  design  security  aes  cbc  ctr  ecb  hmac  side-channels  rsa  ecc 
july 2013 by jm
Flower Filter
'A simple time-decaying approximate membership filter' -- like a Bloom filter with time decay. See also http://eng.42go.com/flower-filter-an-update/ for some notes on the non-independence of survival probabilities, and how that imposes negligible differences in practice.
bloom-filter  algorithms  coding  probabilistic  approximate  time  decay 
july 2013 by jm
Clean Code Cheat Sheet [pdf]
'principles, patterns, smells and guidelines for clean code, class and package design, TDD, Acceptance Test Driven Development, and CI'
clean-code  code-smells  coding  tdd  testing  continous-integration  patterns  pdf 
july 2013 by jm
Sketch of the Day: K-Minimum Values
Another sketching algorithm -- this one supports set union and intersection operations more easily than HyperLogLog when there are more than 2 sets
algorithms  coding  space-saving  cardinality  streams  stream-processing  estimation  sets  sketching 
june 2013 by jm
Java Garbage Collection Distilled
Martin Thompson lays it out:
Serial, Parallel, Concurrent, CMS, G1, Young Gen, New Gen, Old Gen, Perm Gen, Eden, Tenured, Survivor Spaces, Safepoints, and the hundreds of JVM start-up flags. Does this all baffle you when trying to tune the garbage collector while trying to get the required throughput and latency from your Java application? If it does then don’t worry, you are not alone. Documentation describing garbage collection feels like man pages for an aircraft. Every knob and dial is detailed and explained but nowhere can you find a guide on how to fly. This article will attempt to explain the tradeoffs when choosing and tuning garbage collection algorithms for a particular workload.
gc  java  garbage-collection  coding  cms  g1  jvm  optimization 
june 2013 by jm
On Scala
great, comprehensive review of the language, its pros and misfeatures, from Bill de hOra
scala  languages  coding  fp  reviews 
june 2013 by jm
Big Memory, Part 4
good microbenchmarking of a bunch of Java collections; Trove, fastutil, PCJ, mahout-collections, hppc
java  collections  benchmarks  performance  speed  coding  data-structures  optimization 
june 2013 by jm
Vagrant and Chef to provision dev test environments
We have recently switched from a manually configured development environment to a nearly fully automated one using Vagrant, Chef, and a few other tools. With this transition, we’ve moved to an environment where data on the dev boxes is considered disposable and only what’s checked into the SCM is “real”. This is where we’ve always wanted to be, but without the ability to easily rebuild the dev environment from scratch, it’s hard to internalize this behavior pattern.
dev  osx  chef  vagrant  testing  vms  coding 
june 2013 by jm
Rusty's API Design Manifesto
This classic came up in discussions yesterday...

In the Linux Kernel community Rusty Russell came up with a API rating scheme to help us determine if our API is sensible, or not.  It's a rating from -10 to 10, where 10 is perfect is -10 is hell. Unfortunately there are too many examples at the wrong end of the scale.
rusty-russell  quality  coding  kernel  linux  apis  design  code-reviews  code 
may 2013 by jm
Approximate Heavy Hitters -The SpaceSaving Algorithm
nice, readable intro to SpaceSaving (which I've linked to before) -- a simple stream-processing cardinality top-K estimation algorithm with bounded error.
algorithms  coding  space-saving  cardinality  streams  stream-processing  estimation 
may 2013 by jm
Older Is Wiser: Study Shows Software Developers’ Skills Improve Over Time
At least in terms of StackOverflow rep:
For the first part of the study, the researchers compared the age of users with their reputation scores. They found that an individual’s reputation increases with age, at least into a user’s 40s. There wasn’t enough data to draw meaningful conclusions for older programmers. The researchers then looked at the number of different subjects that users asked and answered questions about, which reflects the breadth of their programming interests. The researchers found that there is a sharp decline in the number of subjects users weighed in on between the ages of 15 and 30 – but that the range of subjects increased steadily through the programmers’ 30s and into their early 50s.

Finally, the researchers evaluated the knowledge of older programmers (ages 37 and older) compared to younger programmers (younger than 37) in regard to relatively recent technologies – meaning technologies that have been around for less than 10 years. For two smartphone operating systems, iOS and Windows Phone 7, the veteran programmers had a significant edge in knowledge over their younger counterparts. For every other technology, from Django to Silverlight, there was no statistically significant difference between older and younger programmers. “The data doesn’t support the bias against older programmers – if anything, just the opposite,” Murphy-Hill says.


Damn right ;)
coding  age  studies  software  work  stack-overflow  ncsu  knowledge  skills  life 
april 2013 by jm
Lectures in Advanced Data Structures (6.851)
Good lecture notes on the current state of the art in data structure research.
Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures:

TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible.
GEOMETRY When data has more than one dimension (e.g. maps, database tables).
DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close.
MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache.
HASHING Hashing is the most used data structure in computer science. And it's still an active area of research.
INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible.
DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes.
STRINGS Searching for phrases in giant text (think Google or DNA).
SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).


(via Tim Freeman)
data-structures  lectures  mit  video  data  algorithms  coding  csail  strings  integers  hashing  sorting  bst  memory 
april 2013 by jm
jq
like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. [it] is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine, and expect it to work.


Nice tool. Needs to get into the Debian/Ubuntu apt repos pronto ;)
jq  tools  cli  via:peakscale  json  coding  data  sed  unix 
april 2013 by jm
Log4j2 Asynchronous Loggers for Low-Latency Logging - Apache Log4j 2
implemented using the LMAX Disruptor library -- very impressive performance figures. I presume in real-world usage, these latencies are dwarfed by hardware costs, though
disruptor  coding  java  log4j  logging  async  performance 
april 2013 by jm
You probably shouldn’t use a spreadsheet for important work
Daniel Lemire comments on the recent cases of bugs in spreadsheets causing major impact:
There are several critical problems with a tool like Excel that need to be widely known:

* Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically;

* Spreadsheets make code reviews impractical. To visually inspect the code, you need to click and each and every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake;

* Spreadsheets encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?


Agreed on all three, particularly on the impossibility of testing. IMO, everyone who may be in a job where automation via spreadsheet is likely, needs training in SDE fundamentals: unit testing, the important of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.
spreadsheets  excel  coding  errors  bugs  testability  unit-testing  testing  quality  sde  sde-fundamentals  dry 
april 2013 by jm
Functional Reactive Programming in the Netflix API with RxJava
Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.
Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observable's.

The Observable data type can be thought of as a "push" equivalent to Iterable which is "pull". With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
concurrency  java  jvm  threads  thread-safety  coding  rx  frp  fp  functional-programming  reactive  functional  async  observable 
april 2013 by jm
Archiving Gmail to Evernote
Google Drive and GMail have a built-in scripting engine. I had no idea
gmail  evernote  archival  scripting  coding  hacks  google-drive 
april 2013 by jm
Microsoft Code Digger extension
Miguel de Icaza says it's witchcraft -- I'm inclined to agree:

Code Digger analyzes possible execution paths through your .NET code. The result is a table where each row shows a unique behavior of your code. The table helps you understand the behavior of the code, and it may also uncover hidden bugs. Through the new context menu item "Generate Inputs / Outputs Table" in the Visual Studio editor, you can invoke Code Digger to analyze your code. Code Digger computes and displays input-output pairs. Code Digger systematically hunts for bugs, exceptions, and assertion failures.
testing  constraint-solving  solver  witchcraft  magic  dot-net  coding  tests  code-digger  microsoft 
april 2013 by jm
Lucene 4 - Revisiting Problems For Speed [slides]
a Presentation from Simon Willnauer on optimization work performed on Lucene in 2011. The most interesting stuff here is the work done to replace an O(n^2) FuzzyQuery fuzzy-match algorithm with a FSM trie is extremely cool -- benchmarked at 214 times faster!
benchmarks  slides  lucene  search  fuzzy-matching  text-matching  strings  algorithms  coding  fsm  tries 
april 2013 by jm
The Excel Depression - NYTimes.com
Krugman on the Reinhart-Rogoff Excel-bug fiasco.
What the Reinhart-Rogoff affair shows is the extent to which austerity has been sold on false pretenses. For three years, the turn to austerity has been presented not as a choice but as a necessity. Economic research, austerity advocates insisted, showed that terrible things happen once debt exceeds 90 percent of G.D.P. But “economic research” showed no such thing; a couple of economists made that assertion, while many others disagreed. Policy makers abandoned the unemployed and turned to austerity because they wanted to, not because they had to. So will toppling Reinhart-Rogoff from its pedestal change anything? I’d like to think so. But I predict that the usual suspects will just find another dubious piece of economic analysis to canonize, and the depression will go on and on.
paul-krugman  economics  excel  coding  bugs  software  austerity  debt 
april 2013 by jm
Excel, untestability, and the reliability of quants
Wow, this is a great software-quality story -- I knew Excel was the most widely used programming environment out there, but this is a factor I'd overlooked:

In his remarks on the final panel, Frank Partnoy mentioned something I missed when it came out a few weeks ago: the role of Microsoft Excel in the “London Whale” trading debacle. [..] To summarize: JPMorgan’s Chief Investment Office needed a new value-at-risk (VaR) model for the synthetic credit portfolio (the one that blew up) and assigned a quantitative whiz [...] to create it. The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly, “After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR ...”

I write periodically about the perils of bad software in the business world in general and the financial industry in particular, by which I usually mean back-end enterprise software that is poorly designed, insufficiently tested, and dangerously error-prone. But this is something different. [...] While Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets -- badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way.

This is why the JPMorgan VaR model is the rule, not the exception: manual data entry, manual copy-and-paste, and formula errors. This is another important reason why you should pause whenever you hear that banks’ quantitative experts are smarter than Einstein, or that sophisticated risk management technology can protect banks from blowing up. At the end of the day, it’s all software. While all software breaks occasionally, Excel spreadsheets break all the time. But they don’t tell you when they break: they just give you the wrong number.
excel  reliability  software  coding  ides  jpmorgan  value-at-risk  finance  london-whale  quants  spreadsheets  unit-tests  testability  testing 
april 2013 by jm
Reality, Reactivity, Relevance and Repeatability in Java Application Profiling
this product from JInspired appears to support runtime profiling of java apps with < 5% performance impact
profiling  performance  java  coding  measurement 
april 2013 by jm
Austerity policies founded on Excel typo
You've probably heard that countries with a high debt:GDP ratio suffer from slow economic growth. The specific number 90 percent has been invoked frequently. That's all thanks to a study conducted by Carmen Reinhardt and Kenneth Rogoff for their book This Time It's Different. But the results have been difficult for other researchers to replicate. Now three scholars at the University of Massachusetts have done so in "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff" and they find that the Reinhart/Rogoff result is based on opportunistic exclusion of Commonwealth data in the late-1940s, a debatable premise about how to weight the data, and most of all a sloppy Excel coding error.

Read Mike Konczal for the whole rundown, but I'll just focus on the spreadsheet part. At one point they set cell L51 equal to AVERAGE(L30:L44) when the correct procuedure was AVERAGE(L30:L49). By typing wrong, they accidentally left Denmark, Canada, Belgium, Austria, and Australia out of the average. When you run the math correctly "the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent."
austerity  politics  excel  coding  errors  bugs  spreadsheets  economics  economy 
april 2013 by jm
Ked
To our knowledge, Ked is the first scripting language to emerge from The People's Republic of Cork. Below is an account of what we know so far about the mysterious Corkonian language. Any suggested updates or contributions are encouraged.

Genius.
coding  cork  jokes  funny  like  languages  programming 
april 2013 by jm
Cap'n Proto
Cap’n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap’n Proto is INFINITY TIMES faster than Protocol Buffers.


Basically, marshalling like writing an aligned C struct to the wire, QNX messaging protocol-style. Wasteful on space, but responds to this by suggesting compression (which is a fair point tbh). C++-only for now. I'm not seeing the same kind of support for optional data that protobufs has though. Overall I'm worried there's some useful features being omitted here...
serialization  formats  protobufs  capn-proto  protocols  coding  c++  rpc  qnx  messaging  compression  compatibility  interoperability  i14y 
april 2013 by jm
JPL Institutional Coding Standard for the Java Programming Language
From JPL's Laboratory for Reliable Software (LaRS). Great reference; there's some really useful recommendations here, and good explanations of familiar ones like "prefer composition over inheritance". Many are supported by FindBugs, too.

Here's the full list:

compile with checks turned on;
apply static analysis;
document public elements;
write unit tests;
use the standard naming conventions;
do not override field or class names;
make imports explicit;
do not have cyclic package and class dependencies;
obey the contract for equals();
define both equals() and hashCode();
define equals when adding fields;
define equals with parameter type Object;
do not use finalizers;
do not implement the Cloneable interface;
do not call nonfinal methods in constructors;
select composition over inheritance;
make fields private;
do not use static mutable fields;
declare immutable fields final;
initialize fields before use;
use assertions;
use annotations;
restrict method overloading;
do not assign to parameters;
do not return null arrays or collections;
do not call System.exit;
have one concept per line;
use braces in control structures;
do not have empty blocks;
use breaks in switch statements;
end switch statements with default;
terminate if-else-if with else;
restrict side effects in expressions;
use named constants for non-trivial literals;
make operator precedence explicit;
do not use reference equality;
use only short-circuit logic operators;
do not use octal values;
do not use floating point equality;
use one result type in conditional expressions;
do not use string concatenation operator in loops;
do not drop exceptions;
do not abruptly exit a finally block;
use generics;
use interfaces as types when available;
use primitive types;
do not remove literals from collections;
restrict numeric conversions;
program against data races;
program against deadlocks;
do not rely on the scheduler for synchronization;
wait and notify safely;
reduce code complexity
nasa  java  reference  guidelines  coding-standards  jpl  reliability  software  coding  oo  concurrency  findbugs  bugs 
march 2013 by jm
SpaceX software dev practices
Metrics rule the roost -- I guess there's been a long history of telemetry in space applications.

To make software more visible, you need to know what it is doing, he said, which means creating "metrics on everything you can think of".... Those metrics should cover areas like performance, network utilization, CPU load, and so on.

The metrics gathered, whether from testing or real-world use, should be stored as it is "incredibly valuable" to be able to go back through them, he said. For his systems, telemetry data is stored with the program metrics, as is the version of all of the code running so that everything can be reproduced if needed.

SpaceX has programs to parse the metrics data and raise an alarm when "something goes bad". It is important to automate that, Rose said, because forcing a human to do it "would suck". The same programs run on the data whether it is generated from a developer's test, from a run on the spacecraft, or from a mission. Any failures should be seen as an opportunity to add new metrics. It takes a while to "get into the rhythm" of doing so, but it is "very useful". He likes to "geek out on error reporting", using tools like libSegFault and ftrace.

Automation is important, and continuous integration is "very valuable", Rose said. He suggested building for every platform all of the time, even for "things you don't use any more". SpaceX does that and has found interesting problems when building unused code. Unit tests are run from the continuous integration system any time the code changes. "Everyone here has 100% unit test coverage", he joked, but running whatever tests are available, and creating new ones is useful. When he worked on video games, they had a test to just "warp" the character to random locations in a level and had it look in the four directions, which regularly found problems.

"Automate process processes", he said. Things like coding standards, static analysis, spaces vs. tabs, or detecting the use of Emacs should be done automatically. SpaceX has a complicated process where changes cannot be made without tickets, code review, signoffs, and so forth, but all of that is checked automatically. If static analysis is part of the workflow, make it such that the code will not build unless it passes that analysis step.

When the build fails, it should "fail loudly" with a "monitor that starts flashing red" and email to everyone on the team. When that happens, you should "respond immediately" to fix the problem. In his team, they have a full-size Justin Bieber cutout that gets placed facing the team member who broke the build. They found that "100% of software engineers don't like Justin Bieber", and will work quickly to fix the build problem.
spacex  dev  coding  metrics  deplyment  production  space  justin-bieber 
march 2013 by jm
CS in VN
Neil Fraser visits a school in Vietnam, and investigates their computer science curriculum. They are doing an incredible job, it looks like -- very impressive!
vietnam  programming  education  cs  computer-science  schools  coding  children 
march 2013 by jm
Data Corruption To Go: The Perils Of sql_mode = NULL « Code as Craft
bloody hell. A load of cases where MySQL will happily accommodate all sorts of malformed and invalid input -- thankfully with fixes.

Also includes a very nifty example of Etsy tee'ing their production db traffic (30k pps in and out) via tcpdump and pt-query-digest to a test database host. Fantastic hackery
mysql  input  corrupt  invalid  validation  coding  databases  sql  testing  tcpdump  percona  pt-query-digest  tee 
march 2013 by jm
Single Producer/Consumer lock free Queue step by step
great dissection of Martin "Disruptor" Thompson's lock-free single-producer/single-consumer queue data structure, with benchmark results showing crazy speedups. This is particularly useful since it's a data structure that can be used to provide good lock-free speedups without adopting the entire Disruptor design pattern.
disruptor  coding  java  jvm  martin-thompson  lock-free  volatile  atomic  queue  data-structures 
march 2013 by jm
From a monolithic Ruby on Rails app to the JVM
How Soundcloud have ditched the monolithic Rails for nimbler, small-scale distributed polyglot services running on the JVM
soundcloud  rails  slides  jvm  scalability  ruby  scala  clojure  coding 
march 2013 by jm
4 Things Java Programmers Can Learn from Clojure (without learning Clojure)
'1. Use immutable values; 2. Do no work in the constructor; 3. Program to small interfaces; 4. Represent computation, not the world'. Strongly agreed with #1, and the others look interesting too
clojure  lisp  design  programming  coding  java 
march 2013 by jm
Test-Driven Infrastructure with Chef
Interesting idea.
The book introduces “Infrastructure as Code,” test-driven development, Chef, and cucumber-chef, and then proceeds to a simple example using Chef to provision a shared Linux server. The recipes for the server are developed test-first, demonstrating both the technique and the workflow.
tdd  chef  server  provisioning  build  deploy  linux  coding  ops  sysadmin 
march 2013 by jm
#AltDevBlogADay » Latency Mitigation Strategies
John Carmack on the low-latency coding techniques used to support head mounted display devices.

Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint. The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience.

Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible. Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached.

A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.
head-mounted-display  display  ui  latency  vision  coding  john-carmack 
february 2013 by jm
Crash-only software
I couldn't remember the name for this design principle, so it's worth a bookmark to remind me in future...

'This refers to computer programs that handle failures by simply restarting, without attempting any sophisticated recovery. Correctly written components of crash-only software can microreboot to a known-good state without the help of a user. Since failure-handling and normal startup use the same methods, this can increase the chance that bugs in failure-handling code will be noticed.'
crashing  crash-only-software  design  architecture  coding  software  fault-tolerance  erlang  let-it-fail  microreboot  recovery  autosave 
february 2013 by jm
clearspring / stream-lib
ASL-licensed open source library of stream-processing/approximation algorithms: count-min sketch, space-saving top-k, cardinality estimation, LogLog, HyperLogLog, MurmurHash, lookup3 hash, Bloom filters, q-digest, stochastic top-k
algorithms  coding  streams  cep  stream-processing  approximation  probabilistic  space-saving  top-k  cardinality  estimation  bloom-filters  q-digest  loglog  hyperloglog  murmurhash  lookup3 
february 2013 by jm
Real-time Analytics in Scala [slides, PDF]
some good approximation/streaming algorithms and tips on Scala implementation
streams  algorithms  approximation  coding  scala  slides 
february 2013 by jm
Jetty-9 goes fast with Mechanical Sympathy
This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: "False sharing" on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build. look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% less Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.
jetty  java  mechanical-sympathy  optimization  coding  tries 
february 2013 by jm
« earlier      
per page:    204080120160

related tags

1980s  acceleration  actors  admin  advice  aes  age  agile  akka  algorithms  allan-klumpp  annotations  api  apis  apollo-program  apple  approximate  approximation  architecture  architecture-astronauts  archival  arrays  articles  ascii  assembly  async  atomic  austerity  automation  autosave  backoff  bdd  benchmarks  best-practices  big-o  biology  bitrot  bits  block-oriented  bloom-filter  bloom-filters  book  books  branch  branch-prediction  branching  brogrammers  bst  bug-tracking  bugs  build  building  business  c  c++  c-i  c=64  cache  camry  cap  capn-proto  cardinality  career  cas  cbc  cep  cheat-sheet  chef  children  chrome  chronon  clean-code  cli  client-side  clojure  cms  code  code-digger  code-review  code-reviews  code-smells  code-verification  coderdojo  coding  coding-standards  collaboration  collections  combinatorial  community  compatibility  compilation  compiler  compilers  complexity  compression  computation  computational-biology  computer-science  computing  concurrency  configuration  const  constraint-solving  continous-integration  continuous-deployment  contracts  conversion  cork  corrupt  cost  crash-only-software  crashing  crdts  crypto  cryptography  cs  csail  css  ctr  cucumber  cuda  culture  currency  cyclomatic-complexity  cyoa  dashcode  data  data-structures  databases  david-ungar  debt  debugger  debugging  decay  demos  dependency-injection  deploy  deployment  deplyment  design  dev  development  devops  display  disruptor  dist-sys  distcomp  distributed  distributed-systems  djb  dmitry-vyukov  don-eyles  dot-net  doug-lea  download  dry  dsl  duct-tape  dvr  ecb  ecc  eclipse  economics  economy  editors  education  eiffel  elitism  embedded-systems  emulation  encapsulation  encryption  engineering  engines  entropy  erlang  error-checking  errors  essay  estimation  estonia  etsy  event-sourcing  events  eventual-consistency  evernote  evolution  excel  experts  exploits  exponential-backoff  extensions  fail  false-positives  fault-tolerance  feminism  final  finance  findbugs  firefighting  firefox  firmware  fixing  flake8  flickr  float  floating-point  fluent-interfaces  formats  fortran  fp  free  frp  fsm  functional  functional-programming  funny  fuzzy-matching  g1  ga  games  gaming  garbage-collection  gc  gdb  geek  gems  genetic-algorithms  gerrit  gil  gil-tene  girls  git  gmail  go  google  google-drive  goto  goto-fail  gpu  graph  graphs  guardian  guava  guidelines  hacker-news  hackers  hacking  hacks  hardware  hash-tables  hashing  hashtables  hax  hdr  head-mounted-display  heap  hero-coder  hero-culture  hex  hijack  hiring  histogram  history  hll  hmac  hobbies  honeypots  horror  hotspot  html  http  https  humor  hyperloglog  i7  i14y  ibm  ide  idea  ides  immutability  incident-response  inept  input  integers  integration  intel  intel-core  intellij  interactive  interfaces  internet  interoperability  interpreters  interviews  invalid  invariants  ios  iphone  irb  ireland  james-hamilton  jargon  java  java-8  javascript  jay-kreps  jeff-atwood  jenkins  jersey  jetty  jgc  jobs  joel-spolsky  john-carmack  jokes  jpl  jpmorgan  jq  js  js1k  json  justin-bieber  jvm  jwz  kafka  kernel  kids  knowledge  language  languages  latency  learning  lectures  legal  leonard-richardson  let-it-fail  libraries  library  life  like  linkedin  linux  linux-journal  lisp  live  load-balancing  lock-free  locking  log  log4j  logging  loglog  london-whale  lookup3  lua  lucene  magic  make  makefiles  mame  management  martin-fowler  martin-thompson  mathematics  maths  matrix  measurement  measuring  mechanical-sympathy  meebo  memory  messaging  metrics  microreboot  microsoft  minecraft  misra-c  mit  mocking  money  mozilla  multicore  multiprocessing  murmurhash  mysql  nasa  nbta  ncsu  neologisms  netflix  network  networking  node.js  nostalgia  numbers  observable  one-liners  oo  oop  open-source  openbsd  openssl  ops  optimization  organisations  osx  ouch  overengineering  pair-programming  parallel  parallelism  partitions  patents  patterns  paul-krugman  pdf  peer-pressure  percona  performance  philosophy  play  plos  politics  poul-henning-kemp  preconditions  premature-flexibilization  printf  private-keys  probabilistic  processors  production  profiling  programming  programming-languages  project-management  prophet  protobuf  protobufs  protocol-buffers  protocols  provisioning  proxies  pt-query-digest  pthreads  puzzles  python  q-digest  qa  qnx  quake-3  quality  quants  querying  questions  queue  queues  quotes  race-and-repair  radix-sort  rafe-colburn  rails  rake  random  raspberry-pi  reactive  real-time  realtime  record  recordinality  recovery  recruiting  redis  redo  reference  reform  refs  refuctoring  relearning  reliability  remote  remote-work  repl  replay  replication  rest  restful  retries  revert  reviews  rips  rob-pike  ross-anderson  rpc  rr  rsa  rspec  rtos  ruby  rubygems  rusty-russell  rx  safety  sampling  scala  scalability  scaling  school  schools  science  scripting  scrum  sd  sde  sde-fundamentals  search  searching  security  sed  semantics  semaphores  senior  serialization  server  services  set  set-cover  sets  sexism  sharding  shell-scripts  shellcode  side-channels  silicon-valley  simd  sip  sketching  skills  skiplists  slang  slides  society  software  software-development  solver  sorting  soundcloud  source-code  source-control  space  space-saving  spacex  spaghetti-code  specifications  speech  speed  spreadsheets  spy-hunter  sql  sse  ssh  ssl  stack-overflow  starcraft  steven-skiena  storage  strchr  stream-processing  streams  string-matching  string-search  stringly-typed  strings  strlen  strstr  students  studies  style  succinct  succinct-encoding  sux  swpats  synchronization  syntax  sysadmin  systems  tags  takedowns  tcpdump  tdd  teaching  teams  tech  tech-debt  techdirt  tee  telecommuting  testability  testing  tests  text  text-matching  the-duck  thomas-ptacek  thread-safety  threading  threads  time  time-warp  timecop  tips  tls  tools  top-k  toread  toyota  trac  trading  transactions  trees  tricks  tridge  tries  tuning  turing-complete  twisted  twitter  ui  unit-testing  unit-tests  unix  usa  user-scripts  vagrant  validation  value-at-risk  varnish  version-control  via:ben  via:bos  via:cjhorn  via:cliffc  via:fanf  via:iamcal  via:janl  via:jzawodny  via:Mozai  via:peakscale  via:preddit  via:proggit  via:sergio-bossa  via:tom  via:twitter  video  vietnam  vim  virtual-clock  vision  vms  vnc  volatile  walmart  web  web-services  witchcraft  women  work  workflows  wtf  xp  yagni  zerg-rush 

Copy this bookmark:



description:


tags: