mpm + memory   95

Hash table tradeoffs: CPU, memory, and variability
In this post, I describe a tri-factor model which I find more useful in the analysis of hash table algorithms and discuss several state-of-the-art algorithms in the context of this model.
datastructure  memory  performance 
8 days ago by mpm
Get the most out of the linker map file
In this article, I want to highlight how simple linker map files are and how much they can teach you about the program you are working on.
compiler  memory  debugging 
21 days ago by mpm
cbuffer
A circular buffer written in C using POSIX calls to create a contiguously mapped memory space.
datastructure  memory 
22 days ago by mpm
Arachne: Core-Aware Thread Management
Arachne is a new user-level implementation of threads that provides both low latency and high throughput for applications with extremely short-lived threads (only a few microseconds). Arachne is core-aware: each application determines how many cores it needs, based on its load; it always knows exactly which cores it has been allocated, and it controls the placement of its threads on those cores. A central core arbiter allocates cores between applications. Adding Arachne to memcached improved SLO-compliant throughput by 37%, reduced tail latency by more than 10x, and allowed memcached to coexist with background applications with almost no performance impact. Adding Arachne to the RAMCloud storage system increased its write throughput by more than 2.5x. The Arachne threading library is optimized to minimize cache misses; it can initiate a new user thread on a different core (with load balancing) in 320 ns. Arachne is implemented entirely at user level on Linux; no kernel modifications are needed.
concurrency  memory 
6 weeks ago by mpm
Black-box Concurrent Data Structures for NUMA Architectures
High-performance servers are non-uniform memory access (NUMA) machines. To fully leverage these machines, programmers need efficient concurrent data structures that are aware of the NUMA performance artifacts. We propose Node Replication (NR), a black-box approach to obtaining such data structures. NR takes an arbitrary sequential data structure and automatically transforms it into a NUMA-aware concurrent data structure satisfying linearizability. Using NR requires no expertise in concurrent data structure design, and the result is free of concurrency bugs. NR draws ideas from two disciplines: shared-memory algorithms and distributed systems. Briefly, NR implements a NUMA-aware shared log, and then uses the log to replicate data structures consistently across NUMA nodes. NR is best suited for contended data structures, where it can outperform lock-free algorithms by 3.1x, and lock-based solutions by 30x. To show the benefits of NR to a real application, we apply NR to the data structures of Redis, an in-memory storage system. The result outperforms other methods by up to 14x. The cost of NR is additional memory for its log and replicas.
concurrency  datastructure  memory 
6 weeks ago by mpm
mimalloc
mimalloc (pronounced "me-malloc") is a general purpose allocator with excellent performance characteristics. Initially developed by Daan Leijen for the run-time systems of the Koka and Lean languages.

It is a drop-in replacement for malloc and can be used in other programs without code changes
memory 
6 weeks ago by mpm
snmalloc
snmalloc is a research allocator. Its key design features are: memory that is freed by the same thread that allocated it does not require any synchronising operations; freeing memory in a different thread from the one that allocated it takes no locks, instead using a novel message-passing scheme to return the memory to the original allocator, where it is recycled; and the allocator uses large ranges of pages to reduce the amount of meta-data required.
memory  messaging 
7 weeks ago by mpm
Design of a Modern Cache
Caching is a common approach for improving performance, yet most implementations use strictly classical techniques. In this article we will explore the modern methods used by Caffeine, an open-source Java caching library, that yield high hit rates and excellent concurrency.
caching  concurrency  memory  scalability 
7 weeks ago by mpm
mstat
A fine-grained, cgroup-based tool for profiling the memory usage of a process tree over time
linux  memory 
8 weeks ago by mpm
I/O Is Faster Than the CPU – Let’s Partition Resources and Eliminate (Most) OS Abstractions
We therefore propose a structure for an OS called parakernel, which eliminates most OS abstractions and provides interfaces for applications to leverage the full potential of the underlying hardware. The parakernel facilitates application-level parallelism by securely partitioning the resources and multiplexing only those resources that are not partitioned.
memory  performance  virtualization 
may 2019 by mpm
Mesh: Compacting Memory Management for C/C++ Applications
Programs written in C/C++ can suffer from serious memory fragmentation, leading to low utilization of memory, degraded performance, and application failure due to memory exhaustion. This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications. Mesh combines novel randomized algorithms with widely-supported virtual memory operations to provably reduce fragmentation, breaking the classical Robson bounds with high probability. Mesh generally matches the runtime performance of state-of-the-art memory allocators while reducing memory consumption; in particular, it reduces the memory consumption of Firefox by 16% and Redis by 39%.
memory  c++ 
march 2019 by mpm
hardened_malloc
This is a security-focused general purpose memory allocator providing the malloc API along with various extensions. It provides substantial hardening against heap corruption vulnerabilities. The security-focused design also leads to much less metadata overhead and memory waste from fragmentation than a more traditional allocator design. It aims to provide decent overall performance with a focus on long-term performance and memory usage rather than allocator micro-benchmarks. It has relatively fine-grained locking and will offer good scalability once arenas are implemented.
memory  integrity 
february 2019 by mpm
MSG_ZEROCOPY
The MSG_ZEROCOPY flag enables copy avoidance for socket send calls. The feature is currently implemented for TCP sockets.
linux  networking  performance  memory 
january 2019 by mpm
Go Memory Management
In this post we will explore Go memory management
go  memory 
october 2018 by mpm
Memory Management Reference
Welcome to the Memory Management Reference! This is a resource for programmers and computer scientists interested in memory management and garbage collection.
memory 
july 2018 by mpm
Open sourcing oomd, a new approach to handling OOMs
We have developed oomd, a faster, more reliable solution to common out-of-memory (OOM) situations, which works in userspace rather than kernelspace. We designed oomd with two key features: pre-OOM hooks and a custom plugin system. Pre-OOM hooks offer visibility into an OOM before the workload is threatened. The plugin system allows us to specify custom policies that can handle each workload running on a host.
linux  memory 
july 2018 by mpm
Stamp-it: A more Thread-efficient, Concurrent Memory Reclamation Scheme in the C++ Memory Model
We present Stamp-it, a new, concurrent, lock-less memory reclamation scheme with amortized, constant-time (thread-count independent) reclamation overhead. Stamp-it has been implemented and proved correct in the C++ memory model using as weak memory-consistency assumptions as possible. We have likewise (re)implemented six other comparable reclamation schemes. We give a detailed performance comparison, showing that Stamp-it performs favorably compared with most of these other schemes (sometimes better, and at least as good) while being able to reclaim free memory nodes earlier.
c++  memory  concurrency 
july 2018 by mpm
Optimizing Linux Memory Management for Low-latency / High-throughput Databases
Details of how the Linux kernel manages virtual memory on NUMA (Non Uniform Memory Access) systems. In a nutshell, certain Linux optimizations for NUMA had severe side effects which directly impacted our latencies
memory  linux 
april 2018 by mpm
Allocation Efficiency in High-Performance Go Services
In this post we’ll cover common patterns that lead to inefficiency and production surprises related to memory allocation as well as practical ways of blunting or eliminating these issues. We’ll focus on the key mechanics of the allocator that provide developers a way to get a handle on their memory usage.
go  memory 
april 2018 by mpm
Adventures with Memory Barriers and Seastar on Linux
What does mprotect() have to do with injecting memory barriers on other cores?
memory  c++ 
february 2018 by mpm
Benchmarking and Analysis of Software Data Planes
This technical paper introduces the main concepts underpinning a systematic methodology for benchmarking and analysis of compute-native network SW data plane performance running on modern compute server HW
networking  performance  memory 
february 2018 by mpm
Main Memory Database Systems
Below are two resources that describe the landscape of modern main-memory database systems. The first is a survey/book from Foundations and Trends in Databases, and the second is a slide deck from a VLDB 2016 tutorial. The slides roughly match the content found in the survey. Feel free to contact me with any comments/errors/questions.
database  memory 
january 2018 by mpm
How To Measure the Working Set Size on Linux
The Working Set Size (WSS) is how much memory an application needs to keep working. Your app may have populated 100 Gbytes of main memory, but only uses 50 Mbytes each second to do its job. That's the working set size. It is used for capacity planning and scalability analysis.
memory  linux  scalability 
january 2018 by mpm
An Adaptive Packed-Memory Array
The packed-memory array (PMA) is a data structure that maintains a dynamic set of N elements in sorted order in a Θ(N)-sized array. The idea is to intersperse Θ(N) empty spaces or gaps among the elements so that only a small number of elements need to be shifted around on an insert or delete. Because the elements are stored physically in sorted order in memory or on disk, the PMA can be used to support extremely efficient range queries. Specifically, the cost to scan L consecutive elements is O(1+L/B) memory transfers.
datastructure  memory 
january 2018 by mpm
Reclaiming memory for lock-free data structures: there has to be a better way
Memory reclamation for lock-based data structures is typically easy. However, it is a significant challenge for lock-free data structures. Automatic techniques such as garbage collection are inefficient or use locks, and non-automatic techniques either have high overhead, or do not work for many data structures. For example, subtle problems can arise when hazard pointers, one of the most common non-automatic techniques, are applied to many lock-free data structures. Epoch based reclamation (EBR), which is by far the most efficient non-automatic technique, allows the number of unreclaimed objects to grow without bound, because one crashed process can prevent all other processes from reclaiming memory. We develop a more efficient, distributed variant of EBR that solves this problem. It is based on signaling, which is provided by many operating systems, such as Linux and UNIX. Our new scheme takes O(1) amortized steps per high-level operation on the data structure and O(1) steps in the worst case each time an object is removed from the data structure. At any point, O(mn²) objects are waiting to be freed, where n is the number of processes and m is a small constant for most data structures. Experiments show that our scheme has very low overhead: on average 10%, and at worst 28%, for a balanced binary search tree over many thread counts, operation mixes and contention levels. Our scheme also outperforms a highly tuned implementation of hazard pointers by an average of 75%. Typically, memory reclamation is tightly woven into lock-free data structure code. To improve modularity and facilitate the comparison of different memory reclamation schemes, we also introduce a highly flexible abstraction. It allows a programmer to easily interchange schemes for reclamation, object pooling, allocation and deallocation with virtually no overhead, by changing a single line of code.
non-blocking  memory 
december 2017 by mpm
diskhash
A simple disk-based hash table (i.e., persistent hash table).

It is a hashtable implemented on memory-mapped disk, so that it can be loaded with a single mmap() system call and used in memory directly (being as fast as an in-memory hashtable once it is loaded from disk).
storage  database  memory  datastructure  c++  python 
july 2017 by mpm
Memory Barriers: a Hardware View for Software Hackers
memory barriers are a necessary evil that is required to enable good performance and scalability, an evil that stems from the fact that CPUs are orders of magnitude faster than are both the interconnects between them and the memory they are attempting to access.
memory  concurrency 
july 2017 by mpm
dangsan
DangSan instruments programs written in C or C++ to invalidate pointers whenever a block of memory is freed, preventing dangling pointers. As a result, whenever such a dangling pointer is dereferenced, it refers to unmapped memory and results in a crash. As a consequence, attackers can no longer exploit dangling pointers.
c++  memory  availability 
april 2017 by mpm
rpmalloc
Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C
memory 
march 2017 by mpm
Using JDK 9 Memory Order Modes
This guide is mainly intended for expert programmers familiar with Java concurrency, but unfamiliar with the memory order modes available in JDK 9 provided by VarHandles. Mostly, it focuses on how to think about modes when developing parallel software
java  concurrency  memory 
march 2017 by mpm
x86 Paging Tutorial
Paging makes it easier to compile and run two programs at the same time on a single computer
memory 
march 2017 by mpm
Memory Error Detection Using GCC
GCC 7, in particular, contains a number of enhancements that help detect several new kinds of programming errors in this area. This article provides a brief overview of these new features
c++  memory 
february 2017 by mpm
C2C
At a high level, “perf c2c” will show you:
* The cachelines where false sharing was detected.
* The readers and writers to those cachelines, and the offsets where those accesses occurred.
linux  performance  memory 
february 2017 by mpm
Large Pages May Be Harmful on NUMA Systems
On NUMA systems the memory is spread across several physical nodes; using large pages may contribute to the imbalance in the distribution of memory controller requests and reduced locality of accesses, both of which can drive up memory latencies
linux  performance  memory 
february 2017 by mpm
Gallery of Processor Cache Effects
In this blog post, I will use code samples to illustrate various aspects of how caches work, and what is the impact on the performance of real-world programs
memory 
october 2016 by mpm
Memory management in C programs
There are several techniques available for memory management in C. In this blog post, I'd like to look at them and discuss their advantages and disadvantages
memory 
july 2016 by mpm
MallocInternals
The GNU C library's (glibc's) malloc library contains a handful of functions that manage allocated memory in the application's address space. The glibc malloc is derived from ptmalloc (pthreads malloc), which is derived from dlmalloc (Doug Lea malloc). This malloc is a "heap" style malloc, which means that chunks of various sizes exist within a larger region of memory (a "heap") as opposed to, for example, an implementation that uses bitmaps and arrays, or regions of same-sized blocks, etc.
memory 
july 2016 by mpm
Close Encounters of The Java Memory Model Kind
In this post, we will try to follow up on particular misunderstandings about Java Memory Model, hopefully on the practical examples
java  memory  concurrency 
june 2016 by mpm
ltalloc
LightweighT Almost Lock-Less Oriented for C++ programs memory allocator
c++  memory  performance 
may 2016 by mpm
Fast Non-intrusive Memory Reclamation for Highly-Concurrent Data Structures
This paper proposes three novel ways to alleviate the costs of the memory barriers associated with hazard pointers and related techniques. These new proposals are backward-compatible with existing code that uses hazard pointers. They move the cost of memory management from the principal code path to the infrequent memory reclamation procedure, significantly reducing or eliminating memory barriers executed on the principal code path.
memory  performance  non-blocking 
may 2016 by mpm
Heap Layers
Heap Layers provides a flexible, template-based infrastructure for composing high-performance memory allocators out of C++ "layers". Heap Layers makes it easy to write high-quality custom and general-purpose memory allocators
c++  memory 
march 2016 by mpm
log-malloc2
log-malloc2 is pre-loadable library tracking all memory allocations of a program. It produces simple text trace output, that makes it easy to find leaks and also identify their origin
memory 
march 2016 by mpm
Controlling Queue Delay
The solution for persistently full buffers, AQM (active queue management), has been known for two decades but has not been widely deployed because of implementation difficulties and general misunderstanding about Internet packet loss and queue dynamics. Unmanaged buffers are more critical today since buffer sizes are larger, delay-sensitive applications are more prevalent, and large (streaming) downloads are common. The continued existence of extreme delays at the Internet's edge can impact its usefulness.
networking  performance  memory 
january 2016 by mpm
Portable Hardware Locality (hwloc)
The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs. It primarily aims at helping applications with gathering information about modern computing hardware so as to exploit it accordingly and efficiently.
memory 
january 2016 by mpm
Array Layouts for Comparison-Based Searching
We attempt to determine the best order and search algorithm to store n comparable data items in an array, A, of length n so that we can, for any query value, x, quickly find the smallest value in A that is greater than or equal to x. In particular, we consider the important case where there are many such queries to the same array, A, which resides entirely in RAM. In addition to the obvious sorted order/binary search combination we consider the Eytzinger (BFS) layout normally used for heaps, an implicit B-tree layout that generalizes the Eytzinger layout, and the van Emde Boas layout commonly used in the cache-oblivious algorithms literature.
After extensive testing and tuning on a wide variety of modern hardware, we arrive at the conclusion that, for small values of n, sorted order, combined with a good implementation of binary search, is best. For larger values of n, we arrive at the surprising conclusion that the Eytzinger layout is usually the fastest. The latter conclusion is unexpected and goes counter to earlier experimental work by Brodal, Fagerberg, and Jacob (SODA 2003), who concluded that both the B-tree and van Emde Boas layouts were faster than the Eytzinger layout for large values of n.
memory  performance 
january 2016 by mpm
Split-Ordered Lists: Lock-Free Extensible Hash Tables
We present the first lock-free implementation of an extensible hash table running on current architectures. Our algorithm provides concurrent insert, delete, and find operations with an expected O(1) cost. It consists of very simple code, easily implementable using only load, store, and compare-and-swap operations. The new mathematical structure at the core of our algorithm is recursive split-ordering, a way of ordering elements in a linked list so that they can be repeatedly “split” using a single compare-and-swap operation. Metaphorically speaking, our algorithm differs from prior known algorithms in that extensibility is derived by “moving the buckets among the items” rather than “the items among the buckets.” Though lock-free algorithms are expected to work best in multiprogrammed environments, empirical tests we conducted on a large shared memory multiprocessor show that even in non-multiprogrammed environments, the new algorithm performs as well as the most efficient known lock-based resizable hash-table algorithm, and in high load cases it significantly outperforms it.
datastructure  c++  memory  non-blocking 
december 2015 by mpm
SmoothieMap
SmoothieMap is a java.util.Map implementation with worst write (put(k, v)) operation latencies more than 100 times smaller than in ordinary hash table implementations like java.util.HashMap
java  datastructure  memory 
november 2015 by mpm
scalloc
The problem of concurrent memory allocation is to find the right balance between temporal and spatial performance and scalability across a large range of workloads. Our contributions to address this problem are: uniform treatment of small and big objects through the idea of virtual spans, efficiently and effectively reclaiming unused memory through fast and scalable global data structures, and constant-time (modulo synchronization) allocation and deallocation operations that trade off memory reuse and spatial locality without being subject to false sharing
memory  concurrency  performance 
october 2015 by mpm
Portable umem
This is a port of the Solaris umem memory allocator to other popular operating systems, such as Linux, Windows and BSDish systems (including Darwin/OSX).
memory  linux 
october 2015 by mpm
Fast, Multicore-Scalable, Low-Fragmentation Memory Allocation through Large Virtual Memory and Global Data Structures
We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a concurrent allocator that generally performs and scales in our experiments better than other allocators while using less memory, and is still competitive otherwise. The main ideas behind the design of scalloc are: uniform treatment of small and big objects through so-called virtual spans, efficiently and effectively reclaiming free memory through fast and scalable global data structures, and constant-time (modulo synchronization) allocation and deallocation operations that trade off memory reuse and spatial locality without being subject to false sharing
memory 
september 2015 by mpm
SuperMalloc
A Super Fast Multithreaded malloc() for 64-bit Machines
memory  concurrency 
september 2015 by mpm
scope_stack_alloc
A scoped stack allocator
c++  memory 
july 2015 by mpm
CppMem: Interactive C/C++ memory model
Visualize C++ memory_order constraints
c++  concurrency  memory 
july 2015 by mpm
Linear-log Bucketing: Fast, Versatile, Simple
What do memory allocation, histograms, and event scheduling have in common? They all benefit from rounding values to predetermined buckets, and the same bucketing strategy combines acceptable precision with reasonable space usage for a wide range of values
memory  algorithm 
june 2015 by mpm
Comparative Performance of Memory Reclamation Strategies for Lock-free and Concurrently-readable Data Structures
We compare the costs of three memory reclamation strategies: quiescent-state-based reclamation, epoch-based reclamation, and safe memory reclamation. Our experiments show that changing the workload or execution environment can change which of these schemes is the most efficient.
memory  concurrency  non-blocking 
march 2015 by mpm
stack_alloc
Sometimes you need a container that is almost always going to hold just a few elements, but it must be prepared for the "large" use case as well. It is often advantageous to have the container allocate off of the local stack up to a given size, in order to avoid a dynamic allocation for the common case.
memory  c++ 
march 2015 by mpm
pahole
pahole shows data structure layouts encoded in debugging information formats, DWARF and CTF being supported. This is useful for, among other things: optimizing important data structures by reducing their size, figuring out which field sits at a given offset from the start of a data structure, investigating ABI changes and more generally understanding a new codebase you have to work with
datastructure  memory 
february 2015 by mpm
slab
Offheap Java Tuples that look like POJOs with guaranteed memory alignment
java  memory 
february 2015 by mpm
A quick tutorial on implementing and debugging malloc, free, calloc, and realloc
Let’s write a malloc and see how it works with existing programs
memory 
december 2014 by mpm
Portable Hardware Locality
The Portable Hardware Locality (hwloc) software package provides a portable abstraction (across OS, versions, architectures, ...) of the hierarchical topology of modern architectures, including NUMA memory nodes, sockets, shared caches, cores and simultaneous multithreading. It also gathers various system attributes such as cache and memory information as well as the locality of I/O devices such as network interfaces, InfiniBand HCAs or GPUs. It primarily aims at helping applications with gathering information about modern computing hardware so as to exploit it accordingly and efficiently
memory 
december 2014 by mpm
Heaptrack
A heap memory profiler for Linux
memory  c++ 
december 2014 by mpm
plalloc: A simple stateful allocator for node based containers
The problem I’m trying to solve is that node based containers have bad cache behavior because they allocate all over the place. So I wrote a small allocator which gives out memory from a contiguous block. It speeds up std::map and std::unordered_map
c++  memory 
november 2014 by mpm
Nah Lock: A Lock-Free Memory Allocator
The first allocator had poor scaling on par with libc, but I learned enough from it to write a second lockfree allocator that scales approximately linearly up to 30 cores. It scales sublinearly but slightly better than tcmalloc up to 64 cores
memory  performance  non-blocking 
november 2014 by mpm
Atomic/GCCMM/AtomicSync
This is the area most people find confusing when looking at the memory model. Atomic variables are primarily used to synchronize shared memory accesses between threads. Typically one thread creates data, then stores to an atomic. Other threads read from this atomic, and when the expected value is seen, the data the other thread was creating is going to be complete and visible in this thread. The different memory model modes are used to indicate how strong this data-sharing bond is between threads. Knowledgeable programmers can utilize the weaker models to make more efficient software.
memory  concurrency  c++ 
november 2014 by mpm
Relaxed-Memory Concurrency
Multiprocessors are now pervasive and concurrent programming is becoming mainstream, but typical multiprocessors (x86, Sparc, Power, ARM, Itanium) and programming languages (C, C++, Java) do not provide the sequentially consistent shared memory that has been assumed by most work on semantics and verification. Instead, they have subtle relaxed (or weak) memory models, exposing behaviour that arises from hardware and compiler optimisations to the programmer. Moreover, these memory models have usua...
concurrency  c++  memory 
november 2014 by mpm
A visitor’s guide to allocators
Allocators are perhaps one of the most pervasive, yet least loved parts of the C++ standard library. This guide attempts to explain what allocators are and how library authors are supposed to use them. On this brief journey through one of the wilder realms of C++ we will meet allocator traits, pointer traits, fancy pointers and cast operations that will shake even the strongest minds
c++  memory 
august 2014 by mpm
Atomic operations and contention
Today, let’s talk about some of the primitives necessary to build useful systems on top of a coherent cache, and how they work
concurrency  memory 
august 2014 by mpm
Java Memory Model Pragmatics
The Java Memory Model is the most complicated part of Java spec that must be understood by at least library and runtime developers. Unfortunately, it is worded in such a way that it takes a few senior guys to decipher it for each other. Most developers, of course, are not using JMM rules as stated, and instead make a few constructions out of its rules, or worse, blindly copy the constructions from senior developers without understanding the limits of their applicability
java  memory 
july 2014 by mpm
Sharing memory robustly in message-passing systems
Emulators that translate algorithms from the shared-memory model to two different message-passing models are presented. Both are achieved by implementing a wait-free, atomic, single-writer multi-reader register in unreliable, asynchronous networks. The two message-passing models considered are a complete network with processor failures and an arbitrary network with dynamic link failures.
messaging  memory 
june 2014 by mpm
MICA: A Holistic Approach to Fast In-Memory Key-Value Storage
MICA is a scalable in-memory key-value store that handles 65.6 to 76.9 million key-value operations per second using a single general-purpose multi-core system. MICA is over 4–13.5x faster than current state-of-the-art systems, while providing consistently high throughput over a variety of mixed read and write workloads. MICA takes a holistic approach that encompasses all aspects of request handling, including parallel data access, network request handling, and data structure design, but makes unconventional choices in each of the three domains. First, MICA optimizes for multi-core architectures by enabling parallel access to partitioned data. Second, for efficient parallel data access, MICA maps client requests directly to specific CPU cores at the server NIC level by using client-supplied information and adopts a light-weight networking stack that bypasses the kernel. Finally, MICA’s new data structures—circular logs, lossy concurrent hash indexes, and bulk chaining—handle both read- and write-intensive workloads at low overhead
memory  performance 
april 2014 by mpm
The Bw-Tree: A B-tree for New Hardware Platforms
The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B-tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage.
datastructure  memory  storage 
april 2014 by mpm
Memory Pool System
The Memory Pool System is a flexible and adaptable memory manager. Among its many advantages are an incremental garbage collector with very low pause times, and an extremely robust implementation
memory 
march 2014 by mpm
The Lost Art of C Structure Packing
This page is about a technique for reducing the memory footprint of C programs - manually repacking C structure declarations for reduced size.
memory 
january 2014 by mpm
jol
jol (Java Object Layout) is the tiny toolbox to analyze object layout schemes in JVMs. These tools are using Unsafe heavily to deduce the actual object layout and footprint. This makes the tools much more accurate than others relying on heap dumps, specification assumptions, etc.
java  jvm  memory 
december 2013 by mpm
nedtries
So what if I told you that for the very common case of a pointer sized key lookup that the standard assumptions are wrong? What if there was an algorithm which provides one of the big advantages of ordered indexation which is close fit finds, except it has nearly O(1) complexity rather than O(log N) and is therefore 100% faster? What if, in fact, this algorithm is a good 20% faster than the typical O(1) hash table implementation for medium sized collections and is no slower even at 10,000 items?
datastructure  memory 
october 2013 by mpm
Optimizing Linux Memory Management for Low-latency / High-throughput Databases
The first part of the document provides the relevant background information: an outline of how GraphDB manages its data, the symptoms of our problem, and how the Linux Virtual Memory Management (VMM) subsystem works. In the second part of the document, we will detail the methodology, observations and conclusions from our experiments in getting to the root cause of the problem. We end with a summary of the lessons we have learned.
database  linux  performance  memory 
october 2013 by mpm
Apache DirectMemory
Apache DirectMemory is a off-heap cache for the Java Virtual Machine
java  jvm  memory  caching 
september 2013 by mpm
java-object-layout
These are the very tiny tools to analyze object layout schemes in JVMs. These tools are using Unsafe heavily to deduce the *actual* object layout and footprint. This makes the tools much more accurate than others relying on heap dumps, specification assumptions, etc.
java  memory 
july 2013 by mpm