3203
How Tailscale works
People often ask us for an overview of how Tailscale works. We’ve been
putting off answering that, because we kept changing it!
But now things have started to settle down.
Let’s go through the entire Tailscale system from bottom to top, the same
way we built it (but skipping some zigzags we took along the way).
networking  nat-traversal  confidentiality 
14 days ago
Things They Never Taught You About Naming and Addressing
And one of the most difficult and subtle topics in all of networking
networking  addressing  service-location  RINA 
25 days ago
Low-Cost Deterministic C++ Exceptions for Embedded Systems
he C++ programming language offers a strong exception mechanism for error handling at the language level, improving code readability, safety, and maintainability. However, current C++ implementations are targeted at general-purpose systems, often sacrificing code size, memory usage, and resource determinism for the sake of performance. This makes C++ exceptions a particularly undesirable choice for embedded applications where code size and resource determinism are often paramount. Consequently, embedded coding guidelines either forbid the use of C++ exceptions, or embedded C++ tool chains omit exception handling altogether. In this paper, we develop a novel implementation of C++ exceptions that eliminates these issues, and enables their use for embedded systems. We combine existing stack unwinding techniques with a new approach to memory management and run-time type information (RTTI). In doing so we create a compliant C++ exception handling implementation, providing bounded runtime and memory usage, while reducing code size requirements by up to 82%, and incurring only a minimal runtime overhead for the common case of no exceptions.
c++ 
4 weeks ago
Firecracker: Lightweight Virtualization for Serverless Applications
Serverless containers and functions are widely used for deploying and managing software in the cloud. Their popularity is due to reduced cost of operations, improved utilization of hardware, and faster scaling than traditional deployment methods. The economics and scale of serverless applications demand that workloads from multiple customers run on the same hardware with minimal overhead, while preserving strong security and performance isolation. The traditional view is that there is a choice between virtualization with strong security and high overhead, and container technologies with weaker security and minimal overhead. This tradeoff is unacceptable to public infrastructure providers, who need both strong security and minimal overhead. To meet this need, we developed Fire-cracker, a new open source Virtual Machine Monitor (VMM)specialized for serverless workloads, but generally useful for containers, functions and other compute workloads within a reasonable set of constraints. We have deployed Firecracker in two publically available serverless compute services at Amazon Web Services (Lambda and Fargate), where it supports millions of production workloads, and trillions of requests per month. We describe how specializing for serverless in-formed the design of Firecracker, and what we learned from seamlessly migrating Lambda customers to Firecracker.
virtualization  performance 
6 weeks ago
Debezium
Debezium is an open source distributed platform for change data capture. Start it up, point it at your databases, and your apps can start responding to all of the inserts, updates, and deletes that other apps commit to your databases. Debezium is durable and fast, so your apps can respond quickly and never miss an event, even when things go wrong.
database  event 
7 weeks ago
Circllhist -- A Log-Linear Histogram Data Structure for IT Infrastructure Monitoring
The circllhist histogram is a fast and memory efficient data structure for summarizing large numbers of latency measurements. It is particularly suited for applications in IT infrastructure monitoring, and provides nano-second data insertion, full mergeability, accurate approximation of quantiles with a-priori bounds on the relative error.
Open-source implementations are available for C/lua/python/Go/Java/JavaScript.
monitoring  datastructure  statistics 
9 weeks ago
GNU Recutils
Recutils is a collection of tools, like recins, recdel, and recsel used to manage these recfiles/databases. They allow for all the normal basic relational database operations, typing, auto-incrementing, and even field-level crypto. All of this power is yours with the bonus that your database is a human-readable text file that you can grep/awk/sed freely, and a line-oriented structure makes it perfect for version control systems.
database  linux 
9 weeks ago
Cloudburst: Stateful Functions-as-a-Service
Function-as-a-Service (FaaS) platforms and "serverless" cloud computing are becoming increasingly popular. Current FaaS offerings are targeted at stateless functions that do minimal I/O and communication. We argue that the benefits of serverless computing can be extended to a broader range of applications and algorithms. We present the design and implementation of Cloudburst, a stateful FaaS platform that provides familiar Python programming with low-latency mutable state and communication, while maintaining the autoscaling benefits of serverless computing. Cloudburst accomplishes this by leveraging Anna, an autoscaling key-value store, for state sharing and overlay routing combined with mutable caches co-located with function executors for data locality. Performant cache consistency emerges as a key challenge in this architecture. To this end, Cloudburst provides a combination of lattice-encapsulated state and new definitions and protocols for distributed session consistency. Empirical results on benchmarks and diverse applications show that Cloudburst makes stateful functions practical, reducing the state-management overheads of current FaaS platforms by orders of magnitude while also improving the state of the art in serverless consistency
FaaS 
11 weeks ago
Age-Partitioned Bloom Filters
Bloom filters (BF) are widely used for approximate membership queries over a set of elements. BF variants allow removals, sets of unbounded size or querying a sliding window over an unbounded stream. However, for this last case the best current approaches are dictionary based (e.g., based on Cuckoo Filters or TinyTable), and it may seem that BF-based approaches will never be competitive to dictionary-based ones. In this paper we present Age-Partitioned Bloom Filters, a BF-based approach for duplicate detection in sliding windows that not only is competitive in time-complexity, but has better space usage than current dictionary-based approaches (e.g., SWAMP), at the cost of some moderate slack. APBFs retain the BF simplicity, unlike dictionary-based approaches, important for hardware-based implementations, and can integrate known improvements such as double hashing or blocking. We present an Age-Partitioned Blocked Bloom Filter variant which can operate with 2-3 cache-line accesses per insertion and around 2-4 per query, even for high accuracy filters.
probabilistic-algorithm  bloomfilter  datastructure 
12 weeks ago
Rezolus
We are open-sourcing Rezolus, our high-resolution systems performance telemetry agent. Rezolus began in an effort to help uncover performance anomalies and utilization spikes that were too brief to be captured through our normal observability and metrics systems. It has proven to be useful to help us quantify workload characteristics, provide data to drive optimization efforts, and has been used to diagnose runtime performance issues
monitoring  performance  linux  ebpf 
12 weeks ago
Deterministic Aperture: A distributed, load balancing algorithm
In this blog post, we will walk through a new client-side load balancing technique we’ve developed and deployed widely at Twitter which has allowed our microservice architecture to efficiently scale clusters to thousands of instances. We call this new technique deterministic aperture
rpc  load-balancing 
12 weeks ago
Ironies of Automation
This paper discusses the ways in which automation of industrial processes may expand rather than eliminate problems with the human operator.
resilience  operations 
12 weeks ago
Accelerating UDP packet transmission for QUIC
While a lot of work has gone into optimizing TCP implementations as much as possible over the years, including building offloading capabilities in both software (like in operating systems) and hardware (like in network interfaces), UDP hasn't received quite as much attention as TCP, which puts QUIC at a disadvantage. In this post we'll look at a few tricks that help mitigate this disadvantage for UDP, and by association QUIC.
performance  linux  udp  networking 
12 weeks ago
Faker
Faker is a Python package that generates fake data for you. Whether you need to bootstrap your database, create good-looking XML documents, fill-in your persistence to stress test it, or anonymize data taken from a production service, Faker is for you.
testing 
12 weeks ago
resilience-engineering
Resilience engineering papers. Contribute to lorin/resilience-engineering development by creating an account on GitHub.
resilience 
january 2020
Linux Pressure Stall Information (PSI) by Example
The three fundamental building blocks of computers, CPU, I/O and RAM can come under pressure due to contention and this is not uncommon. To both accurately size workloads and to increase hardware utilization, having information on how much pressure workloads are causing can come in very handy
linux 
january 2020
Message DB
A fully-featured event store and message store implemented in PostgreSQL for Pub/Sub, Event Sourcing, Messaging, and Evented Microservices applications.
postgresql  event  messaging  database 
january 2020
Always Bump Downwards
When writing a bump allocator, always bump downwards. That is, allocate from high addresses, down towards lower addresses by decrementing the bump pointer. Although it is perhaps less natural to think about, it is more efficient than incrementing the bump pointer and allocating from lower addresses up to higher ones
memory  performance 
january 2020
Flat combined Red Black Trees
Flat combining is a concurrency threaded technique whereby one thread performs all the operations in batch by scanning a queue of operations to-be-done and performing them together. Flat combining makes sense as long as k operations each taking O(n) separately can be batched together and done in less than O(k*n). Red black tree is a balanced binary search tree with permanent balancing warranties. Operations in red black tree are hard to batch together: for example inserting nodes in two different branches of the tree affect different areas of the tree. In this paper we investigate alternatives to making a flat combine approach work for red black trees.
datastructure  concurrency 
january 2020
Building a Large-scale Distributed Storage System Based on Raft
In this article, I’d like to share some of our firsthand experience in designing a large-scale distributed storage system based on the Raft consensus algorithm.
storage  scaling  availability  consistency 
january 2020
Sparkler: A KVM-based Virtual Machine Manager
How about building a tiny virtual machine manager (VMM) and a super tiny “operating system” to understand how KVM really works? That’s exactly what we’ll be doing with Sparkler.
virtualization  linux 
january 2020
Taiji: Managing Global User Traffic for Large-Scale Internet Services at the Edge
We present Taiji, a new system for managing user traffic for large-scale Internet services that accomplishes two goals: 1) balancing the utilization of data centers and 2) minimizing network latency of user requests.
load-balancing 
january 2020
Snap: a Microkernel Approach to Host Networking
This paper presents our design and experience with a microkernel-inspired approach to host networking called Snap. Snap is a userspace networking system that supports Google’s rapidly evolving needs with flexible modules that implement a range of network functions, including edge packet switching, virtualization for our cloud platform, traffic shaping policy enforcement, and a high-performance reliable messaging and RDMA-like service
networking  performance 
january 2020
Go channels on steroids
Channels are the main synchronization and communication primitive in Go, they need to be fast and scalable
concurrency  go  performance 
january 2020
The Concurrent 2-Trie
The concurrent 2-trie is a dictionary-like data structure that is designed and optimized specifically to be used for translation tables in file buffer pools. When the concurrent 2-trie replaced the lock-striped hop-scotch hash tables in the Neo4j file buffer pool, file buffer accesses became 30% faster.

The concurrent 2-trie optimizes for the domain of file buffer translation tables by making the following observations: First, the file page identifiers form a sequence, starting from zero until the last page in the file. Second, files can only grow by being extended at the end. Third, most database deployments are able to fit the majority of their data in memory, and often the data set fits entirely in memory. This means the translation tables are usually densely packed.
datastructure  memory  database 
january 2020
Markers of Progress in Incident Analysis
Detecting that effective “learning” from an incident has taken place is quite difficult to do. Making progress in learning from incidents is difficult to capture and characterize. However, there are a number of potential indicators that, taken together, could provide evidence of progress in learning from incidents.
management 
january 2020
mrloop
An event loop for C using io_uring
concurrency  systems 
january 2020
tlaplus_jupyter
Jupyter kernel for TLA⁺ and Pluscal specification languages
tla+  formal-methods 
january 2020
Xor Filters: Faster and Smaller Than Bloom Filters
We just published a paper called Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters that will appear in the Journal of Experimental Algorithmics.
probabilistic-algorithm  bloomfilter 
january 2020
Things we (finally) know about network queues
How big should your queue be, and what should you do when it fills up? Many times, we implement or even deploy a networking system before we have answered those questions. Luckily, recent research has given us some guidelines. Here's what we know
queuing  systems-architecture 
january 2020
natlab
NATlab is a testbed for NAT traversal software. It intercepts packets before Linux's NAT implementation can process them, and does its own translation according to configurable policies. This enables NATlab to emulate any NAT behavior you desire.

Combined with something like docker-compose, you can construct complex network topologies, featuring multiple NAT gateways with different behaviors, and see how well your software is able to traverse them
networking  testing 
january 2020
Great Works in Programming Languages
In September, 2004, I posted a query to the Types list asking people to name the five most important papers ever written in the area of programming languages. This page collects the responses I received.
language 
december 2019
The Amazon Builders' Library
The Amazon Builders’ Library is a collection of living articles that describe how Amazon develops, architects, releases, and operates technology
architecture  queuing  availability  load-balancing 
december 2019
katran
A high performance layer 4 load balancer.
networking  performance 
december 2019
mdBook
mdBook is a utility to create modern online books from Markdown files.
documentation 
december 2019
Docsy
A Hugo theme for creating great technical documentation sites
documentation 
december 2019
MkDocs
Project documentation with Markdown.
documentation 
december 2019
Implementation of Reusable Workflow Engines
Every time a workflow solution is conceived there is a large amount of functionality that is eventually reinvented and redeveloped from scratch. Workflow management systems from academia to the commercial arena exhibit a myriad of approaches having as much in common as in contrast with each other. Efforts in standardizing a workflow reference model and the gradual endorsement of those standards have also not precluded developers from designing workflow systems tailored to specific user needs. This article is written in the belief that an appropriate set of common workflow functionality can be abstracted and reused in forthcoming systems or embedded in applications intended to become workflow-enabled. Specific requirements and a prototype implementation of such functionality, named Workflow Kernel, are discussed.
workflow 
november 2019
Coordinating Distributed System Configuration with Humming Consensus
Humming Consensus is a new algorithm for managing configuration metadata changes in eventual consistency systems. Service is available even when only a single participant is isolated by network partition; when the network recovers, it is re-integrated safely with its peers. Humming Consensus can also manage strong consistency systems with only modest adaptation. Though further research is required, the unification across consistency modes appears novel.
configuration  consistency 
november 2019
Workflow Patterns Home Page
The aim of this initiative is to provide a conceptual basis for process technology. In particular, the research provides a thorough examination of the various perspectives (control flow, data, resource, and exception handling) that need to be supported by a workflow language or a business process modelling language
workflow 
november 2019
Tearing apart printf()
If 'Hello World' is the first program for C students, then printf() is probably the first function. I've had to answer questions about printf() many times over the years, so I've finally set aside time for an informal writeup.
linux  compiler 
november 2019
Hyaline: Fast and Transparent Lock-Free Memory Reclamation
We present a new lock-free safe memory reclamation algorithm, Hyaline, which is fast, scalable, and transparent to the underlying data structures. Due to very low contention, Hyaline is generally faster than existing approaches, including epoch-based reclamation. Moreover, Hyaline easily handles virtually unbounded number of threads (or any concurrent entities) that can be created and deleted dynamically, while retaining O(1) reclamation cost. Hyaline also does not depend on any OS abstractions or compiler support. Hyaline's full transparency and simple API enable easy integration into unmanaged C/C++ code, while not causing any undue burden on programmers, potentially making the entire process fully automatic through special language constructs. We also extend Hyaline to avoid situations where stalled threads prevent others from reclaiming newly allocated objects, a common problem with epoch-based reclamation. We have implemented and tested Hyaline on the x86(-64), ARM32/64, PPC, and MIPS architectures. The general approach typically requires LL/SC or double-width CAS, while a specialized version also works with single-width CAS. Our evaluation reveals that Hyaline's throughput is very high -- it steadily outperformed epoch-based reclamation by 10% in one test and yielded even higher gains in oversubscribed scenarios
memory  non-blocking 
november 2019
Fail at Scale
Our services process requests using adaptive LIFO. During normal operating conditions, requests are processed in FIFO order, but when a queue is starting to form, the server switches to LIFO mode
queuing  scalability 
october 2019
Data Laced with History: Causal Trees & Operational CRDTs
if two documents could always be made to merge, then most of that coordination hullabaloo could go out the window. Each part of the system could be made to work at its own pace.
crdt  causal  consistency 
october 2019
Benefits of Stabilization versus Rollback in Eventually Consistent Key-Value Stores
In this paper, we evaluate and compare the performance of two approaches, namely self-stabilization and rollback, to handling consistency violation faults (cvf) that occurred when a distributed program is executed on eventually consistent key-value store. We observe that self-stabilization is usually better than rollbacks in our experiments. Moreover, when we aggressively allow more cvf in exchange of eliminating mechanisms for guaranteeing atomicity requirements of actions, we observe the programs in our case studies achieve a speedup between 2--15 times compared with the standard implementation. We also analyze different factors that contribute to the results. Our results and analysis are useful in helping a system designer choose proper design options for their program.
self-stabilization  consistency  database 
october 2019
Atomic Durability - How do databases recover from a crash ?
In this post we're going to talk about the four different ways of having durable transactions.
database  consistency  safety  storage 
october 2019
schedviz
Linux kernel scheduling behavior can be a key factor in application responsiveness and system utilization. Today, we’re announcing SchedViz, a new tool for visualizing Linux kernel scheduling behavior. We’ve used it inside Google to discover many opportunities for better scheduling choices and to root-cause many latency issues.
scheduling  linux 
october 2019
Making the Tokio scheduler 10x faster
We’ve been hard at work on the next major revision of Tokio, Rust’s asynchronous runtime. Today, a complete rewrite of the scheduler has been submitted as a pull request. The result is huge performance and latency improvements.
scheduler  performance 
october 2019
Differences between gRPC and RSocket]
In general, gRPC and RSocket attempt to solve different problems. gRPC is an RPC framework using HTTP/2. RSocket is useful.
rsocket  protocol 
october 2019
nic-firmware: CoreNIC: a flexible SR-IOV SmartNIC firmware implementation supporting BPF and stateless offloads.
CoreNIC is the product name of Netronome's NIC firmware implementation for Agilio SmartNICs. It provides a network interface compatible with the nfp Linux driver and DPDK
smartnic 
september 2019
Memory Layouts for Binary Search
The Eytzinger layout offers the best all-around performance over a wide range of array lengths
algorithm  memory  performance 
september 2019
Protocol-Aware Recovery for Consensus-Based Storage
We introduce protocol-aware recovery(PAR), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems.
consensus  storage 
september 2019
λ-NIC: Interactive Serverless Compute on Programmable SmartNICs
In this paper, we argue that server CPUs are ill-suited to run serverless workloads (i.e., lambdas) and present λ-NIC, an open-source framework, that runs interactive workloads directly on a SmartNIC
networking  performance  smartnic 
september 2019
Text Rendering Hates You
Rendering text, how hard could it be? As it turns out, incredibly hard! To my knowledge, literally no system renders text "perfectly". It's all best-effort, although some efforts are more important than others.
typography 
september 2019
Isolating workloads with Systemd slices
If we want to achieve a result like docker, without paying the full complexity price of docker, it pays to ask ourselves “what is docker made of?”
deployment  linux 
september 2019
Modern C
A new edition of the book Modern C is now available under a CC license
c  book 
september 2019
Tecton Project
Tecton is a language and tool framework whose purpose is to foster structured development of computational systems (both hardware designs and software), using abstraction and specialization as the key structuring mechanisms
formal-methods 
september 2019
Ristretto
A Go cache library worthy of being compared to non-Go cache implementations
go  cache 
september 2019
Feedback Systems Wiki
Welcome to FBSwiki. This is the wiki for the Second Edition of text Feedback Systems by Karl J. Åström and Richard M. Murray.
feedback-control 
september 2019
Dynamic Resource Allocation in the Cloud with Near-Optimal Efficiency
Cloud computing has motivated renewed interest in resource allocation problems with new consumption models. A common goal is to share a resource, such as CPU or I/O bandwidth, among distinct users with different demand patterns as well as different quality of service requirements. To ensure these service requirements, cloud offerings often come with a service level agreement (SLA) between the provider and the users. An SLA specifies the amount of a resource a user is entitled to utilize. In many cloud settings, providers would like to operate resources at high utilization while simultaneously respecting individual SLAs. There is typically a tradeoff between these two objectives; for example, utilization can be increased by shifting away resources from idle users to "scavenger" workload, but with the risk of the former then becoming active again. We study this fundamental tradeoff by formulating a resource allocation model that captures basic properties of cloud computing systems, including SLAs, highly limited feedback about the state of the system, and variable and unpredictable input sequences. Our main result is a simple and practical algorithm that achieves near-optimal performance on the above two objectives. First, we guarantee nearly optimal utilization of the resource even if compared to the omniscient offline dynamic optimum. Second, we simultaneously satisfy all individual SLAs up to a small error. The main algorithmic tool is a multiplicative weight update algorithm, and a duality argument to obtain its guarantees. Experiments on both synthetic and real production traces demonstrate the merits of our algorithm in practical settings.
scheduling 
august 2019
A Scalable, Portable, and Memory-Efficient Lock-Free FIFO Queue
We present a new lock-free multiple-producer and multiple-consumer (MPMC) FIFO queue design which is scalable and, unlike existing high-performant queues, very memory efficient. Moreover, the design is ABA safe and does not require any external memory allocators or safe memory reclamation techniques, typically needed by other scalable designs. In fact, this queue itself can be leveraged for object allocation and reclamation, as in data pools. We use FAA (fetch-and-add), a specialized and more scalable than CAS (compare-and-set) instruction, on the most contended hot spots of the algorithm. However, unlike prior attempts with FAA, our queue is both lock-free and linearizable.
We propose a general approach, SCQ, for bounded queues. This approach can easily be extended to support unbounded FIFO queues which can store an arbitrary number of elements. SCQ is portable across virtually all existing architectures and flexible enough for a wide variety of uses. We measure the performance of our algorithm on the x86-64 and PowerPC architectures. Our evaluation validates that our queue has exceptional memory efficiency compared to other algorithms and its performance is often comparable to, or exceeding that of state-of-the-art scalable algorithms.
non-blocking  datastructure 
august 2019
The Promise and Limitations of Gossip Protocols
What are the uses to which gossip is particularly well-matched, and what are its limitations? What alternatives are there to gossip-based solutions, and when would we be better-off using a non-gossip protocol? When, in effect, is gossip the technology of choice?
gossip 
august 2019
Why do some developers at strong companies like Google consider Agile development to be nonsense?
This style of short-term planning, direct customer contact, and continuous iteration is well suited to software with a simple core and lots of customer visible features that are incrementally useful. It is not so well suited to software which has a very simple interface and tons of hidden internal complexity, software which isn’t useful until it’s fairly complete, or leapfrog solutions the customer can’t imagine.
agile  projectmanagement  process 
august 2019
The Haskell Phrasebook
The Haskell Phrasebook is a free quick-start Haskell guide comprised of a sequence of small annotated programs. It provides a cursory overview of selected Haskell features, jumping-off points for further reading, and recommendations to help get you writing programs as soon as possible
haskell 
august 2019
EBtree - Design for a Scheduler and Use (Almost) Everywhere
Andjelko Iharos explores the goals, design and the choices behind the implementations of EBtree, and how they produce a very fast and versatile data storage for many of HAProxys advanced features. EBtree is a different take on the ubiquitous tree data structure, and has been helping HAProxy, a high performance open source software load balancer, to keep up with demands for over a decade.
datastructure  concurrency 
august 2019
Hash table tradeoffs: CPU, memory, and variability
In this post, I describe a tri-factor model which I find more useful in the analysis of hash table algorithms and discuss several state-of-the-art algorithms in the context of this model.
datastructure  memory  performance 
august 2019
A Performance Comparison of Different Backoff Algorithms under Different Rebroadcast Probabilities for MANET's
The backoff algorithm is a part of Media Access Control (MAC) protocol which used to avoid collision in the Mobile Ad hoc Network (MANET). When the nodes in the network try to access the channel, one of these nodes gains access the channel while the other nodes still contend for a time period. Many backoff algorithms have been proposed to improve network performance. One of these algorithms is Fibonacci increment backoff (FIB), FIB algorithm achieves higher throughput than the exponential backoff that is used by the standard IEEE 802.11 when it used in a mobile ad hoc network. The Pessimistic Linear-exponential Backoff (PLEB) is another proposed backoff algorithm which uses a combination of two increment behaviors; Exponential backoff and Linear backoff this scheme merges the advantages of the two increment behaviors. Exponential increments give enough backoff time to enhance the network throughput by reducing the number of transmission failures, and the linear increment reduces the average packet delay. Ad hoc On demand Distance Vector (AODV) routing protocol use a demand-driven route establishment procedure. AODV maintain the route table at each node. This paper uses different backoff algorithms at different values of rebroadcast probability.
backpressure  protocol 
august 2019
cntr: A container debugging tool based on FUSE
Say no to $ apt install vim in containers! cntr is a replacement for docker exec that brings all your developers tools with you. This allows to ship minimal runtime image in production and limit the surface for exploits.
debugging  deployment 
august 2019
Get the most out of the linker map file
In this article, I want to highlight how simple linker map files are and how much they can teach you about the program you are working on.
compiler  memory  debugging 
july 2019
A Survey of Preferences in Virtual Machine Placement
With the rapid developments of virtualization techniques, cloud data centers have had enabled cost effective, flexible, and customizable deployments of applications in virtualized infrastructure. Virtual machine placement problem is a problem of paramount importance to the design of cloud data centers. It aims to assign each virtual machine to a server in the cloud environment. Typically, the problem involves complex relations and multiple design factors as well as local policies that govern the assignment decisions. It also involves different parties including cloud administrators and customers that we need to consider their preferences while opting for a solution. Thus, it is significant to not only return an optimized solution to the underlying placement problem but also a solution that reflects the given preferences of these parties. In this paper, we provide a detailed review on the role of preferences in the current literature of the virtual machine placement problem. We further discuss some challenges and identify possible research opportunities to better incorporate preferences within the problem.
scheduling 
july 2019
netdata
Real-time performance monitoring, done right
monitoring 
july 2019
cbuffer
A circular buffer written in C using Posix calls to create a contiguously mapped memory space.
datastructure  memory 
july 2019
WaitFreeHierarchy
The wait-free hierarchy hrm classifies AsynchronousSharedMemory object types T by consensus number, where a type T has consensus number n if with objects of type T and atomic registers (all initialized to appropriate values) it is possible to solve wait-free consensus (i.e., agreement, validity, wait-free termination) for n processes but not for n 1 processes. The consensus number of any type is at least 1, since 1-process consensus requires no interaction, and may range up to ∞ for particularly powerful objects.
concurrency  non-blocking 
july 2019
Designing an Efficient Replicated Log Store with Consensus Protocol
Highly available and high-performance message logging system is critical building block for various use cases that require global ordering, especially for deterministic distributed transactions. To achieve availability, we maintain multiple replicas that have the same payloads in exactly the same order. This introduces various challenging issues such as consistency between replicas after failure, while minimizing performance degradation. Replicated state machine-based consensus protocols are the most suitable candidates to fulfill those requirements, but double-write problem and different logging granularity make it hard to keep the system efficient. This paper suggests a novel way to build a replicated log store on top of Raft consensus protocol, aiming at providing the same level of consistency as well as fault-tolerance without sacrificing the throughput of the system.
consensus  consistency  replication  storage  database 
july 2019
Co-evolving Tracing and Fault Injection with Box of Pain
Distributed systems are hard to reason about largely because of uncertainty about what may go wrong in a particular execution, and about whether the system will mitigate those faults. Tools that perturb executions can help test whether a system is robust to faults, while tools that observe executions can help better understand their system-wide effects. We present Box of Pain, a tracer and fault injector for unmodified distributed systems that addresses both concerns by interposing at the system call level and dynamically reconstructing the partial order of communication events based on causal relationships. Box of Pain’s lightweight approach to tracing and focus on simulating the effects of partial failures on communication rather than the failures themselves sets it apart from other tracing and fault injection systems. We present evidence of the promise of Box of Pain and its approach to lightweight observation and perturbation of distributed systems.
testing  observability 
july 2019
OneFile
OneFile is a Software Transactional Memory (STM) meant to make it easy to implement lock-free and wait-free data structures
concurrency  non-blocking 
july 2019
QuickJS Javascript Engine
QuickJS is a small and embeddable Javascript engine. It supports the ES2019 specification including modules, asynchronous generators and proxies
compiler  javascript 
july 2019
SHAPE UP
This book is a guide to how we do product development at Basecamp. It’s also a toolbox full of techniques that you can apply in your own way to your own process.
projectmanagement  process 
july 2019
Partisan
Partisan is the design of an alternative runtime system for improved scalability and reduced latency in actor applications
erlang  actors  performance  overlay 
july 2019
« earlier      
actors addressing agile aha algorithm algorithms alm amqp antiscrum antlr ants apache architecture auction authentication authorization availability backpressure base bash bees bio-inspired blackwave bloomfilter book broadcast browser bsd build business byzantine c c++ cache caching causal ccn charts cli clojure cluster cm collaboration compiler compression concurrency confidentiality configuration consensus consistency constraints containers control-theory coordination coroutines cpu cqrs crdt csp css data data-structures database datacenter dataflow datastructure ddd debugging deployment design dht diagram discovery distributed dns documentation dsl ebpf elixir erasure-coding erlang etl event faas failure-detector fault-detection fault-forecasting fault-prevention fault-removal fault-tolerance feedback-control filesystem firebird flame formal-methods fsm fun functional gcs git go gossip graph graphing grid hadoop haskell hg hiring home-it http hypervisor i18n icons identity init integrity io java javascript jquery json jvm lamport language latency latex laws leader-election lean linux lisp load-balancing logging lore lua maintainability managability management markup math media membership memory messaging metrics mime ml modeling monitoring mttr mysql nat-traversal native nc networking non-blocking observability ocaml oo operations outage overlay p2p pandoc parquet parser paxos pdf performance php postgresql presentation probabilistic-algorithm probability process projectmanagement protocol proxy pubsub python queuing quic quotes r rdf reliability replication resilience rest resume rina rpc rpm rsocket rust safety scalability scaling scheduler scheduling scm search self-stabilization serialization service-location shell simulation smartnic sql statistics storage stream svg systems systems-architecture tcp testing time time-series tla+ toplogy topology transactions tuplespace typeface types typography udp uml unicode unikernel unix usability versioning vim virtualization visualization web webdesign windows workflow writing wxpython xforms xml xmonad xslt zeroconf zmq zookeeper

Copy this bookmark:



description:


tags: