jm + data-loss   2

Elasticsearch and data loss
"@alexbfree @ThijsFeryn [ElasticSearch is] fine as long as data loss is acceptable. . We lose ~1% of all writes on average."
elasticsearch  data-loss  reliability  data  search  aphyr  jepsen  testing  distributed-systems  ops 
october 2015 by jm
'Copysets: Reducing the Frequency of Data Loss in Cloud Storage' [paper]
An improved replica-selection algorithm for replicated storage systems.

We present Copyset Replication, a novel general purpose replication technique that significantly reduces the frequency of data loss events. We implemented and evaluated Copyset Replication on two open source data center storage systems, HDFS and RAMCloud, and show it incurs a low overhead on all operations. Such systems require that each node’s data be scattered across several nodes for parallel data recovery and access. Copyset Replication presents a near optimal tradeoff between the number of nodes on which the data is scattered and the probability of data loss. For example, in a 5000-node RAMCloud cluster under a power outage, Copyset Replication reduces the probability of data loss from 99.99% to 0.15%. For Facebook’s HDFS cluster, it reduces the probability from 22.8% to 0.78%.
storage  cloud-storage  replication  data  reliability  fault-tolerance  copysets  replicas  data-loss 
july 2013 by jm

Copy this bookmark: