jm + ntpd   3

How and why the leap second affected Cloudflare DNS
The root cause of the bug that affected our DNS service was the belief that time cannot go backwards. In our case, some code assumed that the difference between two times would always be, at worst, zero. RRDNS is written in Go and uses Go’s time.Now() function to get the time. Unfortunately, this function does not guarantee monotonicity. Go currently doesn’t offer a monotonic time source.


So the clock went "backwards", s1 - s2 returned < 0, and the code couldn't handle it (because it's a little known and infrequent failure case).

Part of the root cause here is cultural -- Google has solved the leap-second problem internally through leap smearing, and Go seems to be fundamentally a Google product at heart.

The easiest fix in general in the "outside world" is to use "ntpd -x" to do a form of smearing. It looks like AWS are leap smearing internally (https://aws.amazon.com/blogs/aws/look-before-you-leap-the-coming-leap-second-and-aws/), but it is a shame they aren't making this a standard part of services running on top of AWS and a feature of the AWS NTP fleet.
ntp  time  leap-seconds  fail  cloudflare  rrdns  go  golang  dns  leap-smearing  ntpd  aws 
january 2017 by jm
Five different ways to handle leap seconds with NTP
Without switching to chronyd, ntpd -x sounds not too suboptimal:
With ntpd, the kernel backward step is used by default. With ntpd versions before 4.2.6, or 4.2.6 and later patched for this bug, the -x option (added to /etc/sysconfig/ntpd) can be used to disable the kernel leap second correction and ignore the leap second as far as the local clock is concerned. The one-second error gained after the leap second will be measured and corrected later by slewing in normal operation using NTP servers which already corrected their local clocks.


It's all pretty messy though :(
ntpd  ntp  chronyd  clocks  time  synchronization  via:fanf  linux  leap-seconds 
june 2015 by jm
How to configure ntpd so it will not move time backwards
The "-x" switch will expand the step/slew boundary from 128ms to 600 seconds, ensuring the time is slewed (drifted slowly towards the correct time at a max of 5ms per second) rather than "stepped" (a sudden jump, potentially backwards). Since slewing has a max of 5ms per second, time can never "jump backwards", which is important to avoid some major application bugs (particularly in Java timers).
ntpd  time  ntp  ops  sysadmin  slew  stepping  time-synchronization  linux  unix  java  bugs 
august 2013 by jm

Copy this bookmark:



description:


tags: