jm + numa   4

How to receive a million packets per second on Linux

To sum up, if you want a perfect performance you need to:
Ensure traffic is distributed evenly across many RX queues and SO_REUSEPORT processes. In practice, the load usually is well distributed as long as there are a large number of connections (or flows).
You need to have enough spare CPU capacity to actually pick up the packets from the kernel.
To make the things harder, both RX queues and receiver processes should be on a single NUMA node.
linux  networking  performance  cloudflare  packets  numa  so_reuseport  sockets  udp 
june 2015 by jm
likwid
'Lightweight performance tools'.
Likwid stands for 'Like I knew what I am doing'. This project contributes easy to use command line tools for Linux to support programmers in developing high performance multi-threaded programs. It contains the following tools:

likwid-topology: Show the thread and cache topology
likwid-perfctr: Measure hardware performance counters on Intel and AMD processors
likwid-features: Show and Toggle hardware prefetch control bits on Intel Core 2 processors
likwid-pin: Pin your threaded application without touching your code (supports pthreads, Intel OpenMP and gcc OpenMP)
likwid-bench: Benchmarking framework allowing rapid prototyping of threaded assembly kernels
likwid-mpirun: Script enabling simple and flexible pinning of MPI and MPI/threaded hybrid applications
likwid-perfscope: Frontend for likwid-perfctr timeline mode. Allows live plotting of performance metrics.
likwid-powermeter: Tool for accessing RAPL counters and query Turbo mode steps on Intel processor.
likwid-memsweeper: Tool to cleanup ccNUMA memory domains.


No kernel patching required. (via kellabyte)
via:kellabyte  linux  performance  testing  perf  likwid  threading  multithreading  multicore  mpi  numa 
january 2014 by jm
Response to "Optimizing Linux Memory Management..."
A follow up to the LinkedIn VM-tuning blog post at http://engineering.linkedin.com/performance/optimizing-linux-memory-management-low-latency-high-throughput-databases --
Do not read in to this article too much, especially for trying to understand how the Linux VM or the kernel works.  The authors misread the "global spinlock on the zone" source code and the interpretation in the article is dead wrong.
linux  tuning  vm  kernel  linkedin  memory  numa 
october 2013 by jm
The MySQL “swap insanity” problem and the effects of the NUMA architecture
very interesting; modern multicore x86 architectures use a NUMA memory architecture, which can cause a dip into swap, even when there appears to be plenty of free RAM available
linux  memory  mysql  optimization  performance  swap  tuning  vm  numa  swap-insanity  swapping  from delicious
september 2010 by jm

Copy this bookmark:



description:


tags: