jm + replicas   4

glibc changed its UTF-8 character collation ordering across versions, breaking postgres
This is terrifying:
Streaming replicas—and by extension, base backups—can become dangerously broken when the source and target machines run slightly different versions of glibc. In particular, differences in strcoll and strcoll_l leave "corrupt" indexes on the slave: these indexes are sorted out of order with respect to the strcoll running on the slave. Because postgres is unaware of the discrepancy, it uses these "corrupt" indexes to perform merge joins; merges rely heavily on the assumption that the indexes are sorted, so none of the join results past the first poison-pill entry are returned. Additionally, if the slave becomes master, the "corrupt" indexes will in some cases be unable to enforce uniqueness, quietly allowing duplicate values.

Moral of the story -- keep your libc versions in sync across storage replication sets!
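The underlying mechanism is easy to demonstrate: the sort order of strings depends on the libc collation tables in effect, not just on their bytes. A minimal sketch in Python (which exposes the C library's collation via the locale module) — the sample strings and the en_US.UTF-8 locale name here are illustrative choices of mine, not from the linked article:

```python
import locale

words = ["a-b", "aa", "ab"]

# Default Python sort compares codepoints, like strcmp / the C locale:
# '-' (0x2d) sorts before 'a' (0x61), so "a-b" comes first.
print(sorted(words))  # ['a-b', 'aa', 'ab']

# Locale-aware order via the libc collation tables. On many glibc
# systems en_US.UTF-8 treats '-' as ignorable at the primary level,
# producing a different order -- and that order can change when glibc's
# tables change, which is exactly what silently breaks on-disk indexes.
try:
    locale.setlocale(locale.LC_COLLATE, "en_US.UTF-8")
    print(sorted(words, key=locale.strxfrm))
except locale.Error:
    print("en_US.UTF-8 locale not installed on this system")
```

An index built under one ordering and queried under the other is exactly the "corrupt" index described above.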
postgresql  scary  ops  glibc  collation  utf-8  characters  indexing  sorting  replicas  postgres 
7 days ago by jm
Hi-tech caves bring prehistoric Sistine chapel back to life
ooh, Lascaux 4 is finally opening:
St-Cyr added: “It’s impossible for anyone to see the original now, but this is the next best thing. What is lost in not having the real thing is balanced by the fact people can see so much more of the detail of the wonderful paintings and engravings.”
lascaux  cave-art  history  prehistory  caves  replicas 
december 2016 by jm
'Copysets: Reducing the Frequency of Data Loss in Cloud Storage' [paper]
An improved replica-selection algorithm for replicated storage systems.

We present Copyset Replication, a novel general purpose replication technique that significantly reduces the frequency of data loss events. We implemented and evaluated Copyset Replication on two open source data center storage systems, HDFS and RAMCloud, and show it incurs a low overhead on all operations. Such systems require that each node’s data be scattered across several nodes for parallel data recovery and access. Copyset Replication presents a near optimal tradeoff between the number of nodes on which the data is scattered and the probability of data loss. For example, in a 5000-node RAMCloud cluster under a power outage, Copyset Replication reduces the probability of data loss from 99.99% to 0.15%. For Facebook’s HDFS cluster, it reduces the probability from 22.8% to 0.78%.
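The core idea can be sketched in a few lines: generate a small number of random permutations of the node list and chop each into groups of R nodes; replicas of a chunk then live entirely within one such copyset, so data is lost only if an entire copyset fails simultaneously. This is a minimal sketch of the permutation scheme from the paper — the function name and parameters are mine, and the real system also balances load and handles cluster sizes not divisible by R:

```python
import random

def build_copysets(nodes, r, p, seed=0):
    """Permutation-based copyset generation: p random permutations of
    the node list, each chopped into consecutive groups of r nodes.
    Scatter width is S = p * (r - 1)."""
    rng = random.Random(seed)
    copysets = []
    for _ in range(p):
        perm = nodes[:]
        rng.shuffle(perm)
        copysets += [perm[i:i + r] for i in range(0, len(perm) - r + 1, r)]
    return copysets

# 9 nodes, replication factor 3, 2 permutations -> 6 copysets,
# each node appearing in exactly 2 of them (scatter width 4).
sets = build_copysets(list(range(9)), r=3, p=2)
print(sets)
```

With random replication each chunk can pick any of the C(N, R) possible replica sets, so almost any R simultaneous node failures lose some chunk; restricting placement to this small fixed family of copysets is what collapses the loss probability in the paper's power-outage scenario.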
storage  cloud-storage  replication  data  reliability  fault-tolerance  copysets  replicas  data-loss 
july 2013 by jm
[tahoe-dev] erasure coding makes files more fragile, not less
Zooko says: "This monitoring and operations engineering is a lot of work!" amen to that
erasure-coding  replicas  fs  tahoe-lafs  zooko  monitoring  devops  ops 
march 2012 by jm