jm + snapshots   4

Dr. Bunsen / Time Warp
I use it to modify Time Machine’s backup behavior using weighted reservoir sampling. I built Time Warp to preserve important backup snapshots and prevent Time Machine from deleting them.

via Aman. Nifty!
backup  python  time-machine  decay  exponential-decay  weighting  algorithms  snapshots  ops 
january 2014 by jm
This program creates an EBS snapshot for an Amazon EC2 EBS volume. To
help ensure consistent data in the snapshot, it tries to flush and
freeze the filesystem(s) first as well as flushing and locking the
database, if applicable.

Filesystems can be frozen during the snapshot. Prior to Linux kernel
2.6.29, XFS must be used for freezing support. While frozen, a
filesystem will be consistent on disk and all writes will block.

There are a number of timeouts to reduce the risk of interfering with
the normal database operation while improving the chances of getting a
consistent snapshot.

If you have multiple EBS volumes in a RAID configuration, you can
specify all of the volume ids on the command line and it will create
snapshots for each while the filesystem and database are locked. Note
that it is your responsibility to keep track of the resulting snapshot
ids and to figure out how to put these back together when you need to
restore the RAID setup.

ubuntu  ec2  aws  linux  ebs  snapshots  ops  tools  alestic 
may 2013 by jm
KDE's brush with git repository corruption: post-mortem
a barely-averted disaster... phew.

while we planned for the case of the server losing a disk or entirely biting the dust, or the total loss of the VM’s filesystem, we didn’t plan for the case of filesystem corruption, and the way the corruption affected our mirroring system triggered some very unforeseen and pathological conditions. [...] the corruption was perfectly mirrored... or rather, due to its nature, imperfectly mirrored. And all data on the anongit [mirrors] was lost.

One risk demonstrated: by trusting in mirroring, rather than a schedule of snapshot backups covering a wide time range, they nearly had a major outage. Silent data corruption, and code bugs, happen -- backups protect against this, but RAID, replication, and mirrors do not.

Another risk: they didn't have a rate limit on project-deletion, which resulted in the "anongit" mirrors deleting their (safe) data copies in response to the upstream corruption. Rate limiting to sanity-check automated changes is vital. What they should have had in place was described by the fix: 'If a new projects file is generated and is more than 1% different than the previous file, the previous file is kept intact (at 1500 repositories, that means 15 repositories would have to be created or deleted in the span of three minutes, which is extremely unlikely).'
rate-limiting  case-studies  post-mortems  kde  git  data-corruption  risks  mirroring  replication  raid  bugs  backups  snapshots  sanity-checks  automation  ops 
march 2013 by jm
Copying block devices between machines
a very hairy hack to perform a block-level rsync-like "send just the changes" algorithm between two very large files (think /dev/sda block devices).  Crazy, but it'd work alright!
devices  hairy  hacks  shell  perl  networking  ssh  rsync  lvm  snapshots  from delicious
march 2011 by jm

Copy this bookmark: