jm + bugs   78

terrible review for Solidity as a programming environment in HN
"Solidity/EVM is by far the worst programming environment I have ever encountered. It would be impossible to write even toy programs correctly in this language, yet it is literally called "Solidity" and used to program a financial system that manages hundreds of millions of dollars."

Via Tony Finch
blockchain  ethereum  programming  coding  via:fanf  funny  fail  floating-point  money  json  languages  bugs  reliability 
july 2017 by jm
An empirical study on the correctness of formally verified distributed systems
We must recognise that even formal verification can leave gaps and hidden assumptions that need to be teased out and tested, using the full battery of testing techniques at our disposal. Building distributed systems is hard. But knowing that shouldn’t make us shy away from trying to do the right thing, instead it should make us redouble our efforts in our quest for correctness.
formal-verification  software  coding  testing  tla+  chapar  fuzzing  verdi  bugs  papers 
may 2017 by jm
How Space Weather Can Influence Elections on Earth - Motherboard
oh, god -- I'm not keen on this take: how's about designing systems that recognise the risks?
"Everything was going fine, but then suddenly, there were an additional 4,000 votes cast. Because it was a local election, which are normally very small, people were surprised and asked, 'how did this happen?'"

The culprit was not voter fraud or hacked machines. It was a single event upset (SEU), a term describing the fallout of an ionizing particle bouncing off a vulnerable node in the machine's register, causing it to flip a bit, and log the additional votes. The Sun may not have been the direct source of the particle—cosmic rays from outside the solar system are also in the mix—but solar-influenced space weather certainly contributes to these SEUs.
bit-flips  science  elections  voting-machines  vvat  belgium  bugs  risks  cosmic-rays 
february 2017 by jm
DST breaks everything
LOL as DST bug uncovers spurious automated noise complaints:
In January last year the airport unearthed a scheme whereby campaigners were using automated software to generate complaints against the airport. Officials caught out the set-up when the two anti-Heathrow enthusiasts forgot to take into account the hour going back in October, and began complaining about flights that had not yet taken off or arrived.
bugs  dst  daylight-savings-time  funny  heathrow  complaints  automation  noise 
november 2016 by jm
Simple testing can prevent most critical failures
Specifically, the following 3 classes of errors were implicated in 92% of the major production outages in this study and could have been caught with simple code review:
Error handlers that ignore errors (or just contain a log statement); error handlers with “TODO” or “FIXME” in the comment; and error handlers that catch an abstract exception type (e.g. Exception or Throwable in Java) and then take drastic action such as aborting the system.

(Interestingly, the latter was a particular favourite approach of some misplaced "fail fast"/"crash-only software design" dogma in Amazon. I wasn't a fan)
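
A minimal sketch of those handler patterns, in Python rather than the Java the paper discusses. The function names and the port-parsing scenario are hypothetical, chosen only to make the contrast visible:

```python
import logging

log = logging.getLogger("example")


def parse_port_swallowed(text):
    """Anti-pattern: the handler only logs, so the error is effectively ignored."""
    try:
        return int(text)
    except ValueError:
        log.warning("bad port")  # TODO: handle this properly
        return None


def parse_port_overcaught(text):
    """Anti-pattern: catch an abstract exception type, then abort the process."""
    try:
        return int(text)
    except Exception:  # far broader than the one failure we expect
        raise SystemExit(1)


def parse_port(text, default=8080):
    """Narrow handler with an explicit, documented recovery path."""
    try:
        port = int(text)
    except ValueError:
        log.warning("invalid port %r, falling back to %d", text, default)
        return default
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return port
```

All three are easy to spot in review, which is the point the study makes.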
fail-fast  crash-only-software  coding  design  bugs  code-review  review  outages  papers  logging  errors  exceptions 
october 2016 by jm
[RFE] add a way to run in a new systemd scope automatically · Issue #428 · tmux/tmux
omgwtfbbq. 1: User reports that their gnome session leaks processes; 2: systemd modifies default session behaviour to kill all processes, including screen/tmux; 3: _everyone_ complains because they break 30 years of UNIX process semantics, then 4: they request that tmux/screen hack their shit to workaround their brokenness. Get fucked, systemd. This is the kind of shit that would finally drive me to BSDland
systemd  horror  linux  fail  unix  gnome  tmux  bugs  omgwtfbbq 
may 2016 by jm
Gradle plugin that allows easy integration with the infer static analyzer
infer  java  static-analysis  bugs  coding  null 
may 2016 by jm
Linux kernel bug delivers corrupt TCP/IP data to Mesos, Kubernetes, Docker containers — Vijay Pandurangan
Bug in the "veth" driver skips TCP checksums. Reminder: app-level checksums are important
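
A rough sketch of what an app-level checksum can look like, assuming a simple framing where a CRC-32 is prepended to each payload. The `frame`/`unframe` names and the 4-byte big-endian layout are illustrative choices, not anything from the linked post:

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its CRC-32 so the receiver can verify it."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unframe(message: bytes) -> bytes:
    """Return the payload, or raise ValueError if it does not match its CRC."""
    if len(message) < 4:
        raise ValueError("message too short")
    expected = int.from_bytes(message[:4], "big")
    payload = message[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload
```

CRC-32 only guards against accidental corruption; it offers no protection against deliberate tampering.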
checksums  tcp  veth  ethernet  drivers  linux  kernel  bugs  docker 
april 2016 by jm
Google Cloud Status
Ouch, multi-region outage:
At 14:50 Pacific Time on April 11th, our engineers removed an unused GCE IP block from our network configuration, and instructed Google’s automated systems to propagate the new configuration across our network. By itself, this sort of change was harmless and had been performed previously without incident. However, on this occasion our network configuration management software detected an inconsistency in the newly supplied configuration. The inconsistency was triggered by a timing quirk in the IP block removal - the IP block had been removed from one configuration file, but this change had not yet propagated to a second configuration file also used in network configuration management. In attempting to resolve this inconsistency the network management software is designed to ‘fail safe’ and revert to its current configuration rather than proceeding with the new configuration. However, in this instance a previously-unseen software bug was triggered, and instead of retaining the previous known good configuration, the management software instead removed all GCE IP blocks from the new configuration and began to push this new, incomplete configuration to the network.

One of our core principles at Google is ‘defense in depth’, and Google’s networking systems have a number of safeguards to prevent them from propagating incorrect or invalid configurations in the event of an upstream failure or bug. These safeguards include a canary step where the configuration is deployed at a single site and that site is verified to still be working correctly, and a progressive rollout which makes changes to only a fraction of sites at a time, so that a novel failure can be caught at an early stage before it becomes widespread. In this event, the canary step correctly identified that the new configuration was unsafe. Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
multi-region  outages  google  ops  postmortems  gce  cloud  ip  networking  cascading-failures  bugs 
april 2016 by jm
Clampers have to clock off as hour change crashes system
DST strikes again:

The failure of the ParkbyText system, operated by National Controlled Parking Systems (NCPS), was described by one employee contacted by a midlands motorist unable to pay for his parking at a train station as a “Y2K moment”. The system failure caused early morning panic for thousands of drivers who tried unsuccessfully to use text messages or an app to pay for their parking ahead of returning to work after the bank holiday weekend.

Impact was that they had to stop enforcement until the day passed, I think.
parkbytext  sms  parking  ireland  ncps  dst  fail  bugs 
march 2016 by jm
The Three Go Landmines
'There are three easy to make mistakes in go. I present them here in the way they are often found in the wild, not in the way that is easiest to understand. All three of these mistakes have been made in Kubernetes code, getting past code review at least once each that I know of.'
k8s  go  golang  errors  coding  bugs 
march 2016 by jm
TIL: clock skew exists
good roundup of real-world clock skew links
clocks  clock-skew  ntp  realtime  time  bugs  distcomp  reliability  skew 
february 2016 by jm
OnePlus 2 and OnePlus X damaging Deutsche Telekom SIM cards
I can confirm, there is a help forum from the "deutsche telekom", they say there is a feature called MEC (it's mainly for setting phone parameters to match their network), active on all their SIM cards, which is not correctly handled by any of the OnePlus Devices (one, two, x) so it writes constantly to flash memory, killing it at around 100,000 writes, which is 3-6 weeks.

(via Mike Walsh on the Irish tech slack)
via:itc  oneplus  phones  sim-cards  mec  deutsche-telekom  bugs  flash 
february 2016 by jm
How Completely Messed Up Practices Become Normal
on Normalization of Deviance, with a few anecdotes from Silicon Valley. “The gradual process through which unacceptable practice or standards become acceptable. As the deviant behavior is repeated without catastrophic results, it becomes the social norm for the organization.”
normalization-of-deviance  deviance  bugs  culture  ops  reliability  work  workplaces  processes  norms 
december 2015 by jm
Valid MFA token does not work during first 1am hour before daylight savings ends and second 1am hour starts · Issue #1611 · aws/aws-cli
Add another one to the "yay for DST" pile. (also yay for AWS using PST/PDT as default internal timezone instead of UTC...)
utc  timezones  fail  bugs  aws  aws-cli  dst  daylight-savings  time 
november 2015 by jm
Twins denied driver’s permit because DMV can’t tell them apart
"The computer can recognize faces, a feature that comes in handy if somebody’s is trying to get an illegal ID. It apparently is not programmed to detect twins."

As Hilary Mason put it: "You do not want to be an edge case in this future we are building."
future  grim  bugs  twins  edge-cases  coding  fail  dmv  software  via:hmason 
october 2015 by jm
Facebook Infer
New static analysis good news, freshly open-sourced by Facebook:
Facebook Infer uses logic to do reasoning about a program's execution, but reasoning at this scale — for large applications built from millions of lines of source code — is hard. Theoretically, the number of possibilities that need to be checked is more than the number of estimated atoms in the observable universe. Furthermore, at Facebook our code is not a fixed artifact but an evolving system, updated frequently and concurrently by many developers. It is not unusual to see more than a thousand modifications to our mobile code submitted for review in a given day. The requirements on the program analyzer then become even more challenging because we expect a tool to report quickly on these code modifications — in the region of 10 minutes — to fit in with developers' workflow. Coping with this scale and velocity requires advanced mathematical techniques. Facebook Infer uses two such techniques: separation logic and bi-abduction.

Separation logic is a theory that allows Facebook Infer's analysis to reason about small, independent parts of the application storage, rather than having to consider the entirety of the memory potentially at every step. That would be a daunting task on modern processors with their large addressable virtual memories.

Bi-abduction is a logical inference technique that allows Facebook Infer to discover properties about the behavior of independent parts of the application code. By storing these properties between runs, Facebook Infer needs to analyze only the parts of the software that have changed, reusing the results of its previous analysis where it can.

By combining these approaches, our analyzer is able to find complex problems in modifications to an application built from millions of lines of code, in minutes.

(via Bryan O'Sullivan)
via:bos  infer  facebook  static-analysis  lint  code  java  ios  android  coding  bugs 
june 2015 by jm
AV vendors still relying on MD5 to identify malware
oh dear. I can see how this happened -- in many cases they may not still have samples to derive new sums from :(
md5  hashing  antivirus  malware  security  via:fanf  bugs 
june 2015 by jm
murbul comments on The security issue of's Android Wallet is not about system's entropy. It's their own BUGs on PRNG again!
I was in the middle of writing a breakdown of what went wrong, but you've beat me to it.
Basically, they have a LinuxSecureRandom class that's supposed to override the standard SecureRandom. This class reads from /dev/urandom and should provide cryptographically secure random values.
They also seed the generator using SecureRandom#setSeed with data pulled from With their custom SecureRandom, this is safe because it mixes the entropy using XOR, so even if the data is dodgy it won't reduce security. It's just an added bonus.
BUT! On some devices under some circumstances, the LinuxSecureRandom class doesn't get registered. This is likely because /dev/urandom doesn't exist or can't be accessed for some reason. Instead of screaming bloody murder like any sensible implementation would, they just ignore that and fall back to using the standard SecureRandom.
If the above happens, there's a problem because the default implementation of SecureRandom#setSeed doesn't mix. If you set the seed, it replaces the entropy entirely. So now the entropy is coming solely from
And the final mistake: They were using HTTP instead of HTTPS to make the webservice call to On Jan 4, started enforcing HTTPS and returning a 301 Permanently Moved error for HTTP - see So since that date, the entropy has actually been the error message (turned into bytes) instead of the expected 256-bit number. Using that seed, SecureRandom will generate the private key for address 1Bn9ReEocMG1WEW1qYjuDrdFzEFFDCq43F 100% of the time. Ouch. This is around the time that address first appears, so the timeline matches.
I haven't had a thorough look at what they've replaced it with in the latest version, but initial impressions are that it's not ideal. Not disastrous, but not good.

Always check return values; always check HTTP status codes.
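
A hedged sketch of the two checks that would have caught this, written in Python for brevity. `seed_from_response` and `mix_entropy` are hypothetical helpers, not the wallet's code; the 32-byte length corresponds to the "expected 256-bit number" mentioned above:

```python
def seed_from_response(status: int, body: bytes, expected_len: int = 32) -> bytes:
    """Accept remote seed material only if the response is what we asked for."""
    if status != 200:
        # A 301 (or any error) body is an error message, not entropy.
        raise RuntimeError(f"entropy service returned HTTP {status}")
    if len(body) != expected_len:
        raise RuntimeError(f"expected {expected_len} bytes, got {len(body)}")
    return body


def mix_entropy(local: bytes, remote: bytes) -> bytes:
    """XOR remote data into local entropy instead of replacing it."""
    if len(local) != len(remote):
        raise ValueError("length mismatch")
    return bytes(a ^ b for a, b in zip(local, remote))
```

With XOR mixing, dodgy remote data cannot reduce the entropy already present in the local bytes, which is the property the comment attributes to the custom SecureRandom.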
bugs  android  fail  securerandom  random  prng  bitcoin  http  randomness  entropy  error-checking 
may 2015 by jm
iPhone UTF-8 text vulnerability
'Due to how the banner notifications process the Unicode text. The banner briefly attempts to present the incoming text and then "gives up" thus the crash'. Apparently the entire Springboard launcher crashes.
apple  vulnerability  iphone  utf-8  unicode  fail  bugs  springboard  ios  via:abetson 
may 2015 by jm
Linux futex_wait() bug
major bug in kernel versions 3.14 - 3.18 on Haswell hardware
haswell  linux  futex_wait  futexes  kernel  bugs  hang 
may 2015 by jm
The Discovery of Apache ZooKeeper's Poison Packet - PagerDuty
Excellent deep dive into a production issue. Root causes: crappy error handling code in Zookeeper; lack of bounds checking in ZK; and a nasty kernel bug.
zookeeper  bugs  error-handling  bounds-checking  oom  poison-packets  pagerduty  packets  tcpdump  xen  aes  linux  kernel 
may 2015 by jm
Race conditions on Facebook, DigitalOcean and others
good trick -- exploit eventual consistency and a lack of distributed transactions by launching race-condition-based attacks
attacks  exploits  race-conditions  bugs  eventual-consistency  distributed-transactions  http  facebook  digitalocean  via:aphyr 
april 2015 by jm
32-bit overflow in BitGo js code caused an accidental 85 BTC transaction fee
Yes, this is a fucking 32-bit integer overflow. Whatever software was used, it calculated the sum of all inputs using 32-bit variables, which overflow at about 20 BTC if signed or 40 BTC if not. The fee was supposed to be 0xC350 = 50,000 satoshis, but it turned out to be 0x2,0000,C350 = 8,589,984,592 satoshis.
Captains of the industry. If they were captains of any other industry, like say for example automotive, we'd have people dying in car crashes between two stationary vehicles.
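
One way such numbers can arise, sketched in Python with an explicit 32-bit mask. The three 30 BTC inputs are made up for illustration, and this is a reconstruction of the arithmetic, not BitGo's actual code:

```python
SATOSHI_PER_BTC = 100_000_000
MASK32 = 0xFFFFFFFF


def sum_u32(values):
    """Sum the way an unsigned 32-bit accumulator would: wrap modulo 2**32."""
    total = 0
    for v in values:
        total = (total + v) & MASK32
    return total


inputs = [30 * SATOSHI_PER_BTC] * 3      # hypothetical inputs, 90 BTC in total
true_total = sum(inputs)                 # 9_000_000_000 satoshis
wrapped_total = sum_u32(inputs)          # 410_065_408: 2 * 2**32 has been lost

intended_fee = 0xC350                    # 50,000 satoshis
outputs = wrapped_total - intended_fee   # what the software believes it can spend
actual_fee = true_total - outputs        # everything not sent to an output

# The unsigned limit is 2**32 satoshis (about 42.9 BTC); signed is 2**31 (about 21.5 BTC).
```

Here `actual_fee` comes out as 0x2_0000_C350, the figure quoted above: the intended fee plus the two multiples of 2**32 that the accumulator dropped.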
bitcoin  fail  bitgo  javascript  bugs  32-bit  overflow  btc 
april 2015 by jm
When S3's eventual consistency is REALLY eventual
a consistency outage in S3 last year, resulting in about 40 objects failing read-after-write consistency for a duration of about 23 hours
s3  eventual-consistency  aws  consistency  read-after-writes  bugs  outages  stackdriver 
april 2015 by jm
On Ruby
The horrors of monkey-patching:
I call out the Honeybadger gem specifically because it was the most recent time I'd been bit by a seemingly good thing promoted in the community: monkey patching third party code. Now I don't fault Honeybadger for making their product this way. It provides their customers with direct business value: "just require 'honeybadger' and you're done!" I don't agree with this sort of practice. [....]

I distrust everything [in Ruby] but a small set of libraries I've personally vetted or are authored by people I respect. Why is this important? Without a certain level of scrutiny you will introduce odd and hard to reproduce bugs. This is especially important because Ruby offers you absolutely zero guarantees about the state your program is in when a given method is dispatched. Constants are not constants. Methods can be redefined at run time. Someone could have written a time sensitive monkey patch to randomly undefine methods from anything in ObjectSpace because they can. This example is so horribly bad that no one should ever do it, but the programming language allows this. Much worse, this code could be arbitrarily injected by some transitive dependency (do you even know what yours are?).
ruby  monkey-patching  coding  reliability  bugs  dependencies  libraries  honeybadger  sinatra 
april 2015 by jm
Avro, mail # dev - bytes and fixed handling in Python implementation - 2014-09-04, 22:54
More Avro trouble with "bytes" fields! Avoid using "bytes" fields in Avro if you plan to interoperate with either of the Python implementations; they both fail to marshal them into JSON format correctly. This is the official "avro" library, which produces UTF-8 errors when a non-UTF-8 byte is encountered
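
A small illustration of the failure mode and of a byte-preserving alternative. As I understand the Avro specification, "bytes" values in JSON are strings whose code points 0-255 stand for byte values 0-255, which is what ISO-8859-1 decoding gives you; the helper names here are hypothetical:

```python
import json

raw = bytes([0x00, 0x7F, 0xFF])  # 0xFF can never appear in valid UTF-8


def bytes_to_json_string(data: bytes) -> str:
    """Map each byte to the code point with the same value (ISO-8859-1)."""
    return data.decode("iso-8859-1")


def json_string_to_bytes(text: str) -> bytes:
    """Inverse mapping; raises if the string has code points above 255."""
    return text.encode("iso-8859-1")


try:
    raw.decode("utf-8")          # the naive approach
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False              # fails on the 0xFF byte

encoded = json.dumps(bytes_to_json_string(raw))
round_tripped = json_string_to_bytes(json.loads(encoded))
```

The ISO-8859-1 mapping round-trips every possible byte value, whereas UTF-8 decoding rejects arbitrary binary data.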
bytes  avro  marshalling  fail  bugs  python  json  utf-8 
march 2015 by jm
tebeka / fastavro / issues / #11 - fastavro breaks dumping binary fixed [4] — Bitbucket
The Python "fastavro" library cannot correctly render "bytes" fields. This is a bug, and the maintainer is acting in a really crappy manner in this thread. Avoid this library
fastavro  fail  bugs  utf-8  bytes  encoding  asshats  open-source  python 
march 2015 by jm
The Four Month Bug: JVM statistics cause garbage collection pauses
Ugh, tying GC safepoints to disk I/O? bad idea:
The JVM by default exports statistics by mmap-ing a file in /tmp (hsperfdata). On Linux, modifying a mmap-ed file can block until disk I/O completes, which can be hundreds of milliseconds. Since the JVM modifies these statistics during garbage collection and safepoints, this causes pauses that are hundreds of milliseconds long. To reduce worst-case pause latencies, add the -XX:+PerfDisableSharedMem JVM flag to disable this feature. This will break tools that read this file, like jstat.
bugs  gc  java  jvm  disk  mmap  latency  ops  jstat 
march 2015 by jm
Bug Prediction at Google
LOL. grepping commit logs for /bug|fix/ does the job, apparently:
In the literature, Rahman et al. found that a very cheap algorithm actually performs almost as well as some very expensive bug-prediction algorithms. They found that simply ranking files by the number of times they've been changed with a bug-fixing commit (i.e. a commit which fixes a bug) will find the hot spots in a code base. Simple! This matches our intuition: if a file keeps requiring bug-fixes, it must be a hot spot because developers are clearly struggling with it.
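
A toy version of that ranking, assuming the commit history is already available as (message, changed paths) pairs. The regex is a crude stand-in for "is this a bug-fixing commit", and the sample history is invented:

```python
import re
from collections import Counter

# Crude classifier: a commit is a "bug fix" if its message mentions bug/fix.
BUGFIX = re.compile(r"\b(bug|fix(es|ed)?)\b", re.IGNORECASE)


def rank_hot_spots(commits):
    """commits: iterable of (message, [paths]). Returns [(path, fix_count), ...]."""
    counts = Counter()
    for message, paths in commits:
        if BUGFIX.search(message):
            counts.update(paths)
    return counts.most_common()


# Hypothetical history, for illustration only.
history = [
    ("Fix NPE in parser", ["parser.py"]),
    ("Add dark mode", ["parser.py", "ui.py"]),
    ("bug: off by one in tokenizer", ["parser.py", "lexer.py"]),
    ("Fixes crash on resize", ["ui.py"]),
]
ranking = rank_hot_spots(history)
```

In this sample `parser.py` ranks first with two bug-fixing commits; the feature commit that touched it is not counted.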
bugs  rahman-algorithm  heuristics  source-code-analysis  coding  algorithms  google  static-code-analysis  version-control 
march 2015 by jm
Do not use 'YYYY' or '%G' in time format specifiers
Formats the year based on ISO week numbering, which often is not what you want. Both have been responsible for high-profile production bugs (in Apple and Android).
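
A quick illustration using Python's `date.isocalendar()`, which exposes the same ISO week-based year that 'YYYY' and '%G' format. The date is picked to fall in the last days of December, where the two notions of "year" disagree:

```python
from datetime import date

d = date(2014, 12, 29)                 # a Monday in the last week of December

calendar_year = d.year                 # the year people usually mean
iso_year, iso_week, iso_weekday = d.isocalendar()

# ISO week 1 is the week containing the year's first Thursday, so this date
# already belongs to week 1 of the following ISO year.
mismatch = iso_year != calendar_year
```

For this date the calendar year is 2014 but the ISO week-based year is 2015, so a timestamp formatted with the week-based year lands a full year off.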
apple  android  bugs  time  date  year  iso  week  formatting  strftime  posix 
january 2015 by jm
Two recent systemd crashes
Hey look, PID 1 segfaulting! I haven't seen that happen since we managed to corrupt /bin/sh on Ultrix in 1992. Nice work Fedora
fedora  reliability  unix  linux  systemd  ops  bugs 
december 2014 by jm
OS X doesn't support 'ndots' DNS resolution
"ping" will not append the "search" domains configured in /etc/resolv.conf. Apparently this has been broken since OS X Lion, no sign of a fix. Nice work Apple
apple  fail  bugs  resolv  dns  domains  osx 
november 2014 by jm
Why Gandhi Is Such An Asshole In Civilization
When a player adopted democracy in Civilization, their aggression would be automatically reduced by 2. Code being code, if Gandhi went democratic his aggression wouldn't go to -1, it looped back around to the ludicrously high figure of 255, making him as aggressive as a civilization could possibly be.
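
The wraparound is easy to reproduce with an 8-bit mask. This sketch assumes aggression is stored as an unsigned byte and that Gandhi starts at 1, as the "-1" in the description implies; the function names are made up:

```python
def reduce_aggression_u8(aggression: int, amount: int) -> int:
    """Subtract the way an unsigned 8-bit field would: wrap modulo 256."""
    return (aggression - amount) & 0xFF


def reduce_aggression_clamped(aggression: int, amount: int) -> int:
    """What was presumably intended: never drop below zero."""
    return max(0, aggression - amount)


gandhi_wrapped = reduce_aggression_u8(1, 2)        # 1 - 2 wraps around to 255
gandhi_clamped = reduce_aggression_clamped(1, 2)   # stays at 0
```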
civ  civilization  funny  videogames  bugs  gandhi  nuclear-war  integers  overflow 
november 2014 by jm
Elastic MapReduce vs S3
Turns out there are a few bugs in EMR's S3 support, believe it or not.

1. 'Consider disabling Hadoop's speculative execution feature if your cluster is experiencing Amazon S3 concurrency issues. You do this through the and mapred.reduce.tasks.speculative.execution configuration settings. This is also useful when you are troubleshooting a slow cluster.'

2. Upgrade to AMI 3.1.0 or later, otherwise retries of S3 ops don't work.
s3  emr  hadoop  aws  bugs  speculative-execution  ops 
october 2014 by jm
Testing fork time on AWS/Xen infrastructure
Redis uses forking to perform persistence flushes, which means that once every 30 minutes it performs like crap (and kills the 99th percentile latency). Given this, various Redis people have been benchmarking fork() times on various Xen platforms, since Xen has a crappy fork() implementation
fork  xen  redis  bugs  performance  latency  p99 
october 2014 by jm
Falsehoods programmers believe about time
I have repeatedly been confounded to discover just how many mistakes in both test and application code stem from misunderstandings or misconceptions about time. By this I mean both the interesting way in which computers handle time, and the fundamental gotchas inherent in how we humans have constructed our calendar — daylight savings being just the tip of the iceberg.

In fact I have seen so many of these misconceptions crop up in other people’s (and my own) programs that I thought it would be worthwhile to collect a list of the more common problems here.

See also the follow-up:

(via Marc)
via:marcomorain  time  dates  timezones  coding  gotchas  calendar  bugs 
october 2014 by jm
To "patch" software comes from a physical patch applied to paper tape
hmason: TIL that the phrase software "patch" is from a physical patch applied to Mark 1 paper tape to modify the program.

It's amazing how a term like that can become so divorced from its original meaning so effectively. History!
history  computing  software  patch  paper-tape  patching  bugs 
october 2014 by jm
#5045 (epoll_reactor::update_timeout() uses incorrect interrupter if TIMERFD is not available) – Boost C++ Libraries
ah, memories. This is the bug that caused me to have to run a fleet-wide upgrade across the EC2 substrate. Thanks, boost::asio!
bugs  network-monitoring  boost  boost-asio  memories  history 
september 2014 by jm
Comment #28 : Bug #255161 : Bugs : “cupsys” package : Ubuntu
file(1) bug causes the input Postscript file to be misidentified as an Erlang JAM file if it contains the string 'Tue' starting at byte 4.
via:hackernews  file  unix  cups  printing  funny  bugs  fail  ubuntu  linux 
august 2014 by jm
Google's purify/valgrind-like concurrency checking tool:

'As a bonus, ThreadSanitizer finds some other types of bugs: thread leaks, deadlocks, incorrect uses of mutexes, malloc calls in signal handlers, and more. It also natively understands atomic operations and thus can find bugs in lock-free algorithms. [...] The tool is supported by both Clang and GCC compilers (only on Linux/Intel64). Using it is very simple: you just need to add a -fsanitize=thread flag during compilation and linking. For Go programs, you simply need to add a -race flag to the go tool (supported on Linux, Mac and Windows).'
concurrency  bugs  valgrind  threadsanitizer  threading  deadlocks  mutexes  locking  synchronization  coding  testing 
june 2014 by jm
ByteArrayOutputStream is really, really slow sometimes in JDK6
This leads us to the bug. The size of the array is determined by Math.max(buf.length << 1, newcount). Ordinarily, buf.length << 1 returns double buf.length, which would always be much larger than newcount for a 2 byte write. Why was it not? The problem is that for all integers larger than Integer.MAX_VALUE / 2, shifting left by one place causes overflow, setting the sign bit. The result is a negative integer, which is always less than newcount. So for all byte arrays larger than 1073741824 bytes (i.e. one GB), any write will cause the array to resize, and only to exactly the size required.
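
The arithmetic can be checked by emulating Java's 32-bit signed `int` in Python. `to_int32` and `jdk6_new_capacity` are illustrative names that mimic the expression quoted above; this is not the JDK source:

```python
def to_int32(n: int) -> int:
    """Interpret the low 32 bits of n as a signed two's-complement integer."""
    n &= 0xFFFFFFFF
    return n - 2**32 if n >= 2**31 else n


def jdk6_new_capacity(buf_length: int, newcount: int) -> int:
    """Mimic Math.max(buf.length << 1, newcount) with Java int semantics."""
    return max(to_int32(buf_length << 1), newcount)


small = jdk6_new_capacity(1024, 1026)             # doubles as intended: 2048
huge = jdk6_new_capacity(2**30, 2**30 + 2)        # shift overflows to negative
```

For the 1 GB buffer the doubled length becomes -2147483648, so `max` picks `newcount` and the array grows by exactly two bytes, forcing a full copy on every subsequent write.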

bugs  java  jdk6  bytearrayoutputstream  impala  performance  overflow 
june 2014 by jm
Stuck in the iMessage abyss? Here’s how to get your texts back
some potential (apocryphal) workarounds for this extremely annoying Apple bug
apple  bugs  imessage  sms  phones  mobile  android  hacks 
may 2014 by jm
iMessage purgatory
Oh Apple, you asshats. This is some seriously shitty programming. iMessage on iOS devices caches the "iMessage-capable" flag for all numbers, indefinitely, so if you switch from iPhone to Android, messages from your friends' iPhones won't get delivered to you henceforth -- and to add insult to injury, it claims they do with a "Delivered." status appearing under the message. This is happening to me right now...
apple  sms  messaging  phones  mobile  imessage  android  fail  bad-programming  bugs 
may 2014 by jm
Mark McLoughlin on Heartbleed
An excellent list of aspects of the Heartbleed OpenSSL bug which need to be thought about/talked about/considered
heartbleed  openssl  bugs  exploits  security  ssl  tls  web  https 
april 2014 by jm
The little ssh that (sometimes) couldn't - Mina Naguib
A good demonstration of what it looks like when network-level packet corruption occurs on a TCP connection
ssh  sysadmin  networking  tcp  bugs  bit-flips  cosmic-rays  corruption  packet 
april 2014 by jm
Issue 122 - android-query - HTTP 204 Response results in Network Error (-101)
an empty 204 response to a HTTP PUT will trigger this. See also, '" unexpected end of stream" on HttpURLConnection HEAD call'.
http  urlconnection  httpurlconnection  java  android  dalvik  bugs  204  head  get  exceptions 
march 2014 by jm
java - Why not use Double or Float to represent currency?
A good canonical URL for this piece of coding guidance.
For example, suppose you have $1.03 and you spend 42c. How much money do you have left?

System.out.println(1.03 - .42); => prints out 0.6100000000000001.
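
The same subtraction in Python, alongside two common alternatives: a decimal type, or integer arithmetic in the smallest currency unit. This is a sketch of the general guidance rather than anything specific from the linked answer:

```python
from decimal import Decimal

float_result = 1.03 - 0.42                              # binary floating point
decimal_result = Decimal("1.03") - Decimal("0.42")      # exact decimal arithmetic
cents_result = 103 - 42                                 # integer cents

float_is_exact = float_result == 0.61
```

The float result is not equal to 0.61, while the decimal and integer versions are exact. Note that `Decimal` values should be built from strings, since `Decimal(1.03)` would inherit the float's representation error.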
coding  tips  floating-point  float  java  money  currency  bugs 
february 2014 by jm
Git is not scalable with too many refs/*
Mailing list thread from 2011; git starts to keel over if you tag too much
git  tags  coding  version-control  bugs  scaling  refs 
february 2014 by jm
error-prone - Catch common Java mistakes as compile-time errors
It's common for even the best programmers to make simple mistakes. And commonly, a refactoring which seems safe can leave behind code which will never do what's intended. We're used to getting help from the compiler, but it doesn't do much beyond static type checking. Using error-prone to augment the compiler's static analysis, you can catch more mistakes before they cost you time, or end up as bugs in production. We use error-prone in Google's Java build system to eliminate classes of serious bugs from entering our code, and we've open-sourced it, so you can too!
analysis  java  static-analysis  code  errors  bugs 
november 2013 by jm
Mac OS 10.9 – Infinity times your spam
a pretty stupid IMAP bug hoses Fastmail:
Yes you read that right. It’s copying all the email from the Junk Folder back into the Junk Folder again!. This is legal IMAP, so our server proceeds to create a new copy of each message in the folder. It then expunges the old copies of the messages, but it’s happening so often that the current UID on that folder is up to over 3 million. It was just over 2 million a few days ago when I first emailed the user to alert them to the situation, so it’s grown by another million since. The only way I can think this escaped QA was that they used a server which (like gmail) automatically suppresses duplicates for all their testing, because this is a massively bad problem.
osx  bugs  mail  imap  fastmail  fail 
october 2013 by jm
How to configure ntpd so it will not move time backwards
The "-x" switch will expand the step/slew boundary from 128ms to 600 seconds, ensuring the time is slewed (drifted slowly towards the correct time at a max of 0.5ms per second) rather than "stepped" (a sudden jump, potentially backwards). Since slewing has a max of 0.5ms per second, time can never "jump backwards", which is important to avoid some major application bugs (particularly in Java timers).
ntpd  time  ntp  ops  sysadmin  slew  stepping  time-synchronization  linux  unix  java  bugs 
august 2013 by jm
Randomly Failed! The State of Randomness in Current Java Implementations
This would appear to be the paper which sparked off the drama around BitCoin thefts from wallets generated on Android devices:

The SecureRandom PRNG is the primary source of randomness for Java and is used e.g., by cryptographic operations. This underlines its importance regarding security. Some of fallback solutions of the investigated implementations [are] revealed to be weak and predictable or capable of being influenced. Very alarming are the defects found in Apache Harmony, since it is partly used by Android.

More on the BitCoin drama: ,
android  java  prng  random  security  bugs  apache-harmony  apache  crypto  bitcoin  papers 
august 2013 by jm
An excellent writeup of the TCP bounded-buffer deadlock problem
on pages 146-149 of 'TCP/IP Sockets in C: Practical Guide for Programmers' by Michael J. Donahoo and Kenneth L. Calvert.
tcp  ip  bounded-buffer  deadlock  bugs  buffering  connections  distributed-systems 
july 2013 by jm
the TCP bounded buffer deadlock problem
I've wound up mentioning this twice in the past week, so it's worth digging up and bookmarking!
Under certain circumstances a TCP connection can end up in a "deadlock", where neither the client nor the server is able to write data out or read data in. This is caused by two factors. First, a client or server cannot perform two transactions at once; a read cannot be performed if a write transaction is in progress, and vice versa. Second, the buffers that exist at either end of the TCP connection are of limited size. The deadlock occurs when both the client and server are trying to send an amount of data that is larger than the combined input and output buffer size.
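
A small demonstration of the buffer-exhaustion half of the problem, using a local `socketpair()` as a stand-in for a TCP connection and non-blocking sockets so the script cannot actually hang. Buffer sizes are platform-dependent, and the helper names are made up:

```python
import socket

CHUNK = b"x" * 65536


def fill_send_buffer(sock: socket.socket) -> int:
    """Send without blocking until the kernel buffers are full; return bytes queued."""
    sent = 0
    while True:
        try:
            sent += sock.send(CHUNK)
        except BlockingIOError:
            return sent


def drain(sock: socket.socket) -> int:
    """Read everything currently available; return bytes read."""
    received = 0
    while True:
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            return received
        if not data:
            return received
        received += len(data)


client, server = socket.socketpair()
client.setblocking(False)
server.setblocking(False)

queued_by_client = fill_send_buffer(client)
queued_by_server = fill_send_buffer(server)
# Both directions are now full and neither side is reading. With blocking
# sockets, each side's next send() would wait on the other: the deadlock.

drained = drain(server)                  # one side finally reads...
client_unblocked = client.send(b"x") == 1  # ...and its peer can write again

client.close()
server.close()
```

The usual ways out are to interleave reads with writes (for example with `select`) or to keep each message smaller than the available buffering.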
tcp  ip  bounded-buffer  deadlock  bugs  buffering  connections  distributed-systems 
july 2013 by jm
You probably shouldn’t use a spreadsheet for important work
Daniel Lemire comments on the recent cases of bugs in spreadsheets causing major impact:
There are several critical problems with a tool like Excel that need to be widely known:

* Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically;

* Spreadsheets make code reviews impractical. To visually inspect the code, you need to click on each and every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake;

* Spreadsheets encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?

Agreed on all three, particularly on the impossibility of testing. IMO, everyone who may be in a job where automation via spreadsheet is likely needs training in SDE fundamentals: unit testing, the importance of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.
spreadsheets  excel  coding  errors  bugs  testability  unit-testing  testing  quality  sde  sde-fundamentals  dry 
april 2013 by jm
The Excel Depression
Krugman on the Reinhart-Rogoff Excel-bug fiasco.
What the Reinhart-Rogoff affair shows is the extent to which austerity has been sold on false pretenses. For three years, the turn to austerity has been presented not as a choice but as a necessity. Economic research, austerity advocates insisted, showed that terrible things happen once debt exceeds 90 percent of G.D.P. But “economic research” showed no such thing; a couple of economists made that assertion, while many others disagreed. Policy makers abandoned the unemployed and turned to austerity because they wanted to, not because they had to. So will toppling Reinhart-Rogoff from its pedestal change anything? I’d like to think so. But I predict that the usual suspects will just find another dubious piece of economic analysis to canonize, and the depression will go on and on.
paul-krugman  economics  excel  coding  bugs  software  austerity  debt 
april 2013 by jm
Austerity policies founded on Excel typo
You've probably heard that countries with a high debt:GDP ratio suffer from slow economic growth. The specific number 90 percent has been invoked frequently. That's all thanks to a study conducted by Carmen Reinhart and Kenneth Rogoff for their book This Time Is Different. But the results have been difficult for other researchers to replicate. Now three scholars at the University of Massachusetts have done so in "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff" and they find that the Reinhart/Rogoff result is based on opportunistic exclusion of Commonwealth data in the late-1940s, a debatable premise about how to weight the data, and most of all a sloppy Excel coding error.

Read Mike Konczal for the whole rundown, but I'll just focus on the spreadsheet part. At one point they set cell L51 equal to AVERAGE(L30:L44) when the correct procedure was AVERAGE(L30:L49). By typing wrong, they accidentally left Denmark, Canada, Belgium, Austria, and Australia out of the average. When you run the math correctly "the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent."
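To see how a truncated range skews an average, here is a minimal sketch. The per-row values are made up for illustration and are not the Reinhart-Rogoff data; only the row ranges (30 to 44 versus 30 to 49) come from the quote above, and `average_range` is a hypothetical helper.

```python
from statistics import mean

# Hypothetical values for spreadsheet rows 30..49 (NOT the real dataset):
# rows 30-44 hold 1.0 and the five omitted rows 45-49 hold 3.0.
column_l = {row: (1.0 if row <= 44 else 3.0) for row in range(30, 50)}


def average_range(cells: dict, first_row: int, last_row: int) -> float:
    """Mimic AVERAGE(L<first_row>:L<last_row>) over a dict of row -> value."""
    return mean(cells[row] for row in range(first_row, last_row + 1))


truncated = average_range(column_l, 30, 44)  # the mistyped range
full = average_range(column_l, 30, 49)       # the intended range
print(f"AVERAGE(L30:L44) = {truncated}, AVERAGE(L30:L49) = {full}")
```

Keeping the calculation in code also makes the slip testable: an assertion that the averaged range covers every data row would fail at once.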
austerity  politics  excel  coding  errors  bugs  spreadsheets  economics  economy 
april 2013 by jm
JPL Institutional Coding Standard for the Java Programming Language
From JPL's Laboratory for Reliable Software (LaRS). Great reference; there are some really useful recommendations here, and good explanations of familiar ones like "prefer composition over inheritance". Many are supported by FindBugs, too.

Here's the full list:

compile with checks turned on;
apply static analysis;
document public elements;
write unit tests;
use the standard naming conventions;
do not override field or class names;
make imports explicit;
do not have cyclic package and class dependencies;
obey the contract for equals();
define both equals() and hashCode();
define equals when adding fields;
define equals with parameter type Object;
do not use finalizers;
do not implement the Cloneable interface;
do not call nonfinal methods in constructors;
select composition over inheritance;
make fields private;
do not use static mutable fields;
declare immutable fields final;
initialize fields before use;
use assertions;
use annotations;
restrict method overloading;
do not assign to parameters;
do not return null arrays or collections;
do not call System.exit;
have one concept per line;
use braces in control structures;
do not have empty blocks;
use breaks in switch statements;
end switch statements with default;
terminate if-else-if with else;
restrict side effects in expressions;
use named constants for non-trivial literals;
make operator precedence explicit;
do not use reference equality;
use only short-circuit logic operators;
do not use octal values;
do not use floating point equality;
use one result type in conditional expressions;
do not use string concatenation operator in loops;
do not drop exceptions;
do not abruptly exit a finally block;
use generics;
use interfaces as types when available;
use primitive types;
do not remove literals from collections;
restrict numeric conversions;
program against data races;
program against deadlocks;
do not rely on the scheduler for synchronization;
wait and notify safely;
reduce code complexity.
nasa  java  reference  guidelines  coding-standards  jpl  reliability  software  coding  oo  concurrency  findbugs  bugs 
march 2013 by jm
KDE's brush with git repository corruption: post-mortem
a barely-averted disaster... phew.

while we planned for the case of the server losing a disk or entirely biting the dust, or the total loss of the VM’s filesystem, we didn’t plan for the case of filesystem corruption, and the way the corruption affected our mirroring system triggered some very unforeseen and pathological conditions. [...] the corruption was perfectly mirrored... or rather, due to its nature, imperfectly mirrored. And all data on the anongit [mirrors] was lost.

One risk demonstrated: by trusting in mirroring, rather than a schedule of snapshot backups covering a wide time range, they nearly had a major outage. Silent data corruption, and code bugs, happen -- backups protect against this, but RAID, replication, and mirrors do not.

Another risk: they didn't have a rate limit on project-deletion, which resulted in the "anongit" mirrors deleting their (safe) data copies in response to the upstream corruption. Rate limiting to sanity-check automated changes is vital. What they should have had in place was described by the fix: 'If a new projects file is generated and is more than 1% different than the previous file, the previous file is kept intact (at 1500 repositories, that means 15 repositories would have to be created or deleted in the span of three minutes, which is extremely unlikely).'
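The sanity check quoted above can be sketched roughly as follows. This is an illustration, not KDE's actual code: the function name, the list-of-names representation of the projects file, and the choice to measure change relative to the previous file are all assumptions.

```python
def choose_projects_file(previous: list, new: list,
                         max_change_fraction: float = 0.01) -> list:
    """Return the projects list a mirror should act on.

    Assumption: a "projects file" is just a list of repository names, and
    "difference" means repositories added or removed since the last file.
    """
    prev_set, new_set = set(previous), set(new)
    changed = len(prev_set ^ new_set)  # created + deleted repositories
    if prev_set and changed / len(prev_set) > max_change_fraction:
        # Too much changed at once: keep the previous file intact.
        return previous
    return new
```

With 1500 repositories, a new file that drops 10 of them is accepted, while an empty or badly truncated file (the corruption case) is rejected and the previous list stays in force.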
rate-limiting  case-studies  post-mortems  kde  git  data-corruption  risks  mirroring  replication  raid  bugs  backups  snapshots  sanity-checks  automation  ops 
march 2013 by jm
java - Given that HashMaps in jdk1.6 and above cause problems with multi-threading, how should I fix my code - Stack Overflow
Massive Java concurrency fail in recent 1.6 and 1.7 JDK releases -- the java.util.HashMap type now spin-locks on an AtomicLong in its constructor.

Here's the response from the author: 'I'll acknowledge right up front that the initialization of hashSeed is a bottleneck but it is not one we expected to be a problem since it only happens once per Hash Map instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?'

Oh dear. Assumptions of "typical" like this are not how you design a fundamental data structure. fail. For now there is a hacky reflection-based workaround, but this is lame and needs to be fixed as soon as possible. (Via cscotta)
java  hashmap  concurrency  bugs  fail  security  hashing  jdk  via:cscotta 
february 2013 by jm
High-frequency trading: The fast and the furious | The Economist

"The NYMEX panel found that Infinium had finished writing the algorithm only the day before it introduced it to the market, and had tested it for only a couple of hours in a simulated trading environment to see how it would perform. The firm's normal testing processes take six to eight weeks. When the algorithm started its frenetic buying spree, the measures designed to shut it down automatically did not work. One was supposed to turn the system off if a maximum order size was breached, but because the machine was placing lots of small orders rather than a single big one the shutdown was not triggered. The other measure was meant to prevent Infinium from selling or buying more than a certain number of contracts, but because of an error in the way the rogue algorithm had been written, this, too, failed to spot a problem."
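A rough sketch of why the first safeguard stayed silent: a per-order size limit never fires on a stream of small orders, whereas a cumulative limit does. The function names, order sizes, and limits are hypothetical and have nothing to do with Infinium's real system.

```python
from typing import Iterable, Optional


def max_order_size_tripped(orders: Iterable[int], max_order_size: int) -> bool:
    """Per-order check: trips only if a single order is too large."""
    return any(size > max_order_size for size in orders)


def cumulative_limit_tripped_at(orders: Iterable[int],
                                max_contracts: int) -> Optional[int]:
    """Cumulative check: index of the order that pushes the running
    total over the limit, or None if the limit is never exceeded."""
    total = 0
    for index, size in enumerate(orders):
        total += size
        if total > max_contracts:
            return index
    return None
```

For a thousand orders of 5 contracts each, `max_order_size_tripped(orders, 100)` is false, but `cumulative_limit_tripped_at(orders, 500)` returns index 100, the 101st order.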
hft  automation  trading  markets  stocks  nymex  bugs  software 
august 2012 by jm
Bug #885027 in calibre: “SUID Mount Helper has 5 Major Vulnerabilities”
Amazing response to a security bug report. 'what's happening in this bug report right now is a perfect example of how *not* to do security response. When faced with two people who clearly know a few things about secure coding, rather than taking their advice and actually fixing the root cause of the problem (or abandon it as a hopeless situation, which is probably the more appropriate response), you've chosen to waste our time by demanding that we write weaponized exploits to exploit what most people already know to be exploitable. To top it off, when shown repeatedly how your half-baked "fixes" don't actually fix anything, rather than taking our advice you just add another small hurdle that can be trivially bypassed. It would be sad if it weren't so funny. I've decided that it's time to stop beating a dead horse. Usually I get paid good money to own software this hard, and I don't think you're worth making an exception. Best of luck, I'm sure you'll figure it out eventually.'
security  funny  calibre  linux  setuid  inept  open-source  bugs  bug-reports 
november 2011 by jm
Java Hangs When Converting 2.2250738585072012e-308
i.e. the same value as the PHP bug. 'Konstantin [Pressier] reported this problem to Oracle three weeks ago, but is still waiting for a reply.' Good job, Oracle!
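For context, the literal sits right at the boundary between normal and subnormal doubles. The sketch below only measures how close it is to the smallest normal double, using Python's own parser. That the boundary is what trips the affected parser is an assumption here, since the bookmark does not say why the hang occurs, and the helper name is hypothetical.

```python
import sys

TRIGGER = "2.2250738585072012e-308"  # the literal from the bug title


def distance_from_smallest_normal(text: str) -> float:
    """Absolute gap between the parsed value and the smallest positive
    normal double (sys.float_info.min)."""
    return abs(float(text) - sys.float_info.min)


gap = distance_from_smallest_normal(TRIGGER)
print(f"parsed value: {float(TRIGGER)!r}, gap to smallest normal: {gap!r}")
```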
oracle  fail  security  java  bugs  floating-point  from delicious
february 2011 by jm
Ubuntu's cron package silently ignores files
Ubuntu have hack-patched Vixie Cron to silently ignore cron files which contain a ".". omgwtf
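A quick way to spot affected files, as a sketch. It assumes the patched cron applies the run-parts-style naming rule (only ASCII letters, digits, underscores and hyphens). The bookmark itself only mentions ".", so treat the exact character set as an assumption; the helper name is hypothetical.

```python
import re

# Assumed naming rule: letters, digits, underscores and hyphens only.
_ACCEPTED_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_picked_up(filename: str) -> bool:
    """True if a cron file with this name would be read under the rule."""
    return _ACCEPTED_NAME.fullmatch(filename) is not None


for name in ("backup", "backup.sh", "my-job_1"):
    print(name, "->", "read" if is_picked_up(name) else "silently ignored")
```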
omgwtfbbq  broken  ubuntu  patching  quality  bugs  software  stupid  packaging  from delicious
september 2010 by jm
iPhone 3GS GPS suddenly stops working? here's the fix
via a forum on MacRumors -- blow away the locationd cache. Worked perfectly for me after my GPS crapped out halfway through my holidays :( Requires that the phone be jailbroken first
iphone  gps  software  3gs  reliability  bugs  macrumors  jailbreaking  locationd  from delicious
may 2010 by jm
The SAY2K10 bug
LWN follows up on the FH_DATE_PAST_20XX fiasco. 'It would appear that what SpamAssassin needs is some dedicated maintenance talent which is not dependent on evening hours put in by developers committed to other projects.' I wish
spamassassin  say2k10  bugs  maintainance  lwn  commentary  from delicious
january 2010 by jm
SSL trick certificate published
ioerror published the '\00' wild-card SSL cert for any domain (for affected SSL client libs at least)
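The trick can be sketched as a name-matching bug. The code emulates a client that treats the certificate name as a NUL-terminated C string, so everything after the first "\x00" is dropped before comparison. The hostnames are placeholders, and the assumption that a bare "*" matches any host is a simplification of how affected libraries handled wildcards.

```python
def c_string_view(value: str) -> str:
    """What C code sees when it stops reading at the first NUL byte."""
    return value.split("\x00", 1)[0]


def naive_match(cert_name: str, hostname: str) -> bool:
    """Emulates a vulnerable client: compares the truncated name."""
    name = c_string_view(cert_name)
    if name == "*":  # assumed: a bare wildcard matches any host
        return True
    return name.lower() == hostname.lower()


def strict_match(cert_name: str, hostname: str) -> bool:
    """Rejects any certificate name with an embedded NUL outright."""
    if "\x00" in cert_name:
        return False
    return cert_name.lower() == hostname.lower()
```

Here `naive_match("*\x00.attacker.example", "bank.example")` is true, while `strict_match` rejects the same certificate name.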
ssl  tls  security  nul  ioerror  bugs  exploits  from delicious
november 2009 by jm
SD, a distributed bug tracker
now available. sadly, no support for Bugzilla, which is what we use in SpamAssassin (srsly), so I won't be trying it out just yet, but still -- cool
bugs  bug-tracking  trac  prophet  distributed  coding  tools  web  sd 
august 2009 by jm
'Two wrongs don't make a right, but two bugs do'
a story of how a bug in Apollo 11's Lunar Module control software, intended to work around a deficiency of the engine hardware, barely avoided mission-endangering results
apollo-program  bugs  software  coding  engines  hardware  don-eyles  allan-klumpp  interfaces  specifications 
july 2009 by jm

