jm + linux   187

Linux Load Averages: Solving the Mystery
Nice bit of OS archaeology by Brendan Gregg.
In 1993, a Linux engineer found a nonintuitive case with load averages, and with a three-line patch changed them forever from "CPU load averages" to what one might call "system load averages." His change included tasks in the uninterruptible state, so that load averages reflected demand for disk resources and not just CPUs. These system load averages count the number of threads working and waiting to work, and are summarized as a triplet of exponentially-damped moving sum averages that use 1, 5, and 15 minutes as constants in an equation. This triplet of numbers lets you see if load is increasing or decreasing, and their greatest value may be for relative comparisons with themselves.
load  monitoring  linux  unix  performance  ops  brendan-gregg  history  cpu 
5 weeks ago by jm
Amazon Web Services Elastic Compute Cloud (EC2) Rescue for Linux is a python-based tool that allows for the automatic diagnosis of common problems found on EC2 Linux instances.

Most of the modules appear to be log-greppers looking for common kernel issues.
ec2  aws  kernel  linux  ec2rl  ops 
9 weeks ago by jm
NVD - CVE-2016-10229
udp.c in the Linux kernel before 4.5 allows remote attackers to execute arbitrary code via UDP traffic that triggers an unsafe second checksum calculation during execution of a recv system call with the MSG_PEEK flag.
udp  security  cve  linux  msg_peek  exploits 
april 2017 by jm
Ubuntu on AWS gets serious performance boost with AWS-tuned kernel
interesting -- faster boots, CPU throttling resolved on t2.micros, other nice stuff
aws  ubuntu  ec2  kernel  linux  ops 
april 2017 by jm
How to stop Ubuntu Xenial (16.04) from randomly killing your big processes
Unfortunately, a bug was recently introduced into the allocator which made it sometimes not try hard enough to free kernel cache memory before giving up and invoking the OOM killer. In practice, this means that at random times, the OOM killer would strike at big processes when the kernel tries to allocate, say, 16 kilobytes of memory for a new process’s thread stack — even when there are many gigabytes of memory in reclaimable kernel caches!
oom-killer  ooms  linux  ops  16.04 
march 2017 by jm
a decent SBC which apparently has enough power to drive Plex transcoding
plex  linux  sbc  embedded  odroid  xu4 
february 2017 by jm
[net-next,14/14] tcp_bbr: add BBR congestion control
This commit implements a new TCP congestion control algorithm: BBR
(Bottleneck Bandwidth and RTT). A detailed description of BBR will be
published in ACM Queue, Vol. 14 No. 5, September-October 2016, as
"BBR: Congestion-Based Congestion Control".

BBR has significantly increased throughput and reduced latency for
connections on Google's internal backbone networks and and
YouTube Web servers.

BBR requires only changes on the sender side, not in the network or
the receiver side. Thus it can be incrementally deployed on today's
Internet, or in datacenters. [....]

Signed-off-by: Van Jacobson <>
google  linux  tcp  ip  congestion-control  bufferbloat  patches  algorithms  rtt  bandwidth  youtube  via:bradfitz 
september 2016 by jm
Ratas - A hierarchical timer wheel
excellent explanation and benchmarks of a timer wheel implementation
timer-wheels  timing-wheels  algorithms  c  linux  timers  data-structures 
august 2016 by jm
USE Method: Linux Performance Checklist
Really late in bookmarking this, but has some up-to-date sample commandlines for sar, mpstat and iostat on linux
linux  sar  iostat  mpstat  cli  ops  sysadmin  performance  tuning  use  metrics 
june 2016 by jm
[RFE] add a way to run in a new systemd scope automatically · Issue #428 · tmux/tmux
omgwtfbbq. 1: User reports that their gnome session leaks processes; 2: systemd modifies default session behaviour to kill all processes, including screen/tmux; 3: _everyone_ complains because they break 30 years of UNIX process semantics, then 4: they request that tmux/screen hack their shit to workaround their brokenness. Get fucked, systemd. This is the kind of shit that would finally drive me to BSDland
systemd  horror  linux  fail  unix  gnome  tmux  bugs  omgwtfbbq 
may 2016 by jm
#825394 - systemd kill background processes after user logs out - Debian Bug report logs
Systemd breaks UNIX behaviour which has been standard practice for 30 years:
It is now indeed the case that any background processes that were still
running are killed automatically when the user logs out of a session,
whether it was a desktop session, a VT session, or when you SSHed into a
machine. Now you can no longer expect a long running background processes to
continue after logging out. I believe this breaks the expectations of
many users. For example, you can no longer start a screen or tmux
session, log out, and expect to come back to it.
systemd  ops  debian  linux  fail  background  cli  commandline 
may 2016 by jm
command line utility that performs an HTML element selection on HTML content passed to the stdin. Using css selectors that everybody knows. Since input comes from stdin and output is sent to stdout, it can easily be used inside traditional UNIX pipelines to extract content from webpages and html files. tq provides extra formating options such as json-encoding or newlines squashing, so it can play nicely with everyones favourite command line tooling.
tq  linux  unix  cli  command-line  html  parsing  css  tools 
may 2016 by jm
raboof/nethogs: Linux 'net top' tool
NetHogs is a small 'net top' tool. Instead of breaking the traffic down per protocol or per subnet, like most tools do, it groups bandwidth by process.
nethogs  cli  networking  performance  measurement  ops  linux  top 
may 2016 by jm
Linux kernel bug delivers corrupt TCP/IP data to Mesos, Kubernetes, Docker containers — Vijay Pandurangan
Bug in the "veth" driver skips TCP checksums. Reminder: app-level checksums are important
checksums  tcp  veth  ethernet  drivers  linux  kernel  bugs  docker 
april 2016 by jm
The revenge of the listening sockets
More adventures in debugging the Linux kernel:
You can't have a very large number of bound TCP sockets and we learned that the hard way. We learned a bit about the Linux networking stack: the fact that LHTABLE is fixed size and is hashed by destination port only. Once again we showed a couple of powerful of System Tap scripts.
ops  linux  networking  tcp  network  lhtable  kernel 
april 2016 by jm
'Devastating' bug pops secure doors at airports, hospitals
"A command injection vulnerability exists in this function due to a lack of any sanitisation on the user-supplied input that is fed to the system() call," Lawshae says.

security  iot  funny  fail  linux  unix  backticks  system  udp  hid  vertx  edge 
april 2016 by jm
Dynamic tracing tools for Linux, a la dtrace, ktrace, etc. Built using BPF, using kernel features in the 4.x kernel series, requiring at least version 4.1 of the kernel
linux  tracing  bpf  dynamic  ops 
april 2016 by jm
The Nyquist theorem and limitations of sampling profilers today, with glimpses of tracing tools from the future
Awesome post from Dan Luu with data from Google:
The cause [of some mystery widespread 250ms hangs] was kernel throttling of the CPU for processes that went beyond their usage quota. To enforce the quota, the kernel puts all of the relevant threads to sleep until the next multiple of a quarter second. When the quarter-second hand of the clock rolls around, it wakes up all the threads, and if those threads are still using too much CPU, the threads get put back to sleep for another quarter second. The phase change out of this mode happens when, by happenstance, there aren’t too many requests in a quarter second interval and the kernel stops throttling the threads. After finding the cause, an engineer found that this was happening on 25% of disk servers at Google, for an average of half an hour a day, with periods of high latency as long as 23 hours. This had been happening for three years. Dick Sites says that fixing this bug paid for his salary for a decade. This is another bug where traditional sampling profilers would have had a hard time. The key insight was that the slowdowns were correlated and machine wide, which isn’t something you can see in a profile.
debugging  performance  visualization  instrumentation  metrics  dan-luu  latency  google  dick-sites  linux  scheduler  throttling  kernel  hangs 
february 2016 by jm
interactive menu selection for the UNIX command line
cli  linux  unix  grep  menus  selection  ui  interactive  terminal 
february 2016 by jm
Files Are Hard
This is basically terrifying. A catalog of race conditions and reliability horrors around the POSIX filesystem abstraction in Linux -- it's a wonder anything works.

'Where’s this documented? Oh, in some mailing list post 6-8 years ago (which makes it 12-14 years from today). The fs devs whose posts I’ve read are quite polite compared to LKML’s reputation, and they generously spend a lot of time responding to basic questions, but it’s hard for outsiders to troll [sic] through a decade and a half of mailing list postings to figure out which ones are still valid and which ones have been obsoleted! I don’t mean to pick on filesystem devs. In their OSDI 2014 talk, the authors of the paper we’re discussing noted that when they reported bugs they’d found, developers would often respond “POSIX doesn’t let filesystems do that”, without being able to point to any specific POSIX documentation to support their statement. If you’ve followed Kyle Kingsbury’s Jepsen work, this may sound familiar, except devs respond with “filesystems don’t do that” instead of “networks don’t do that”.I think this is understandable, given how much misinformation is out there. Not being a filesystem dev myself, I’d be a bit surprised if I don’t have at least one bug in this post.'
filesystems  linux  unix  files  operating-systems  posix  fsync  osdi  papers  reliability 
december 2015 by jm
Low-latency journalling file write latency on Linux
great research from LMAX: xfs/ext4 are the best choices, and they explain why in detail, referring to the code
linux  xfs  ext3  ext4  filesystems  lmax  performance  latency  journalling  ops 
december 2015 by jm
Structural and semantic deficiencies in the systemd architecture for real-world service management, a technical treatise
Despite its overarching abstractions, it is semantically non-uniform and its complicated transaction and job scheduling heuristics ordered around a dependently networked object system create pathological failure cases with little debugging context that would otherwise not necessarily occur on systems with less layers of indirection. The use of bus APIs complicate communication with the service manager and lead to duplication of the object model for little gain. Further, the unit file options often carry implicit state or are not sufficiently expressive. There is an imbalance with regards to features of an eager service manager and that of a lazy loading service manager, having rusty edge cases of both with non-generic, manager-specific facilities. The approach to logging and the circularly dependent architecture seem to imply that lots of prior art has been ignored or understudied.
analysis  systemd  linux  unix  ops  init  critiques  software  logging 
november 2015 by jm
Amazon ECS CLI Tutorial - Amazon EC2 Container Service
super-basic ECS tutorial, using a docker-compose.yml to create a new ECS-managed service fleet
ecs  cli  linux  aws  ec2  hosting  docker  tutorials 
october 2015 by jm
C++ high-performance app framework; 'currently focused on high-throughput, low-latency I/O intensive applications.'

Scylla (Cassandra-compatible NoSQL store) is written in this.
c++  opensource  performance  framework  scylla  seastar  latency  linux  shared-nothing  multicore 
september 2015 by jm
Open source security team has had enough of embedded-systems vendors taking the piss with licensing:
This announcement is our public statement that we've had enough. Companies in the embedded industry not playing by the same rules as every other company using our software violates users' rights, misleads users and developers, and harms our ability to continue our work. Though I've only gone into depth in this announcement on the latest trademark violation against us, our experience with two GPL violations over the previous year have caused an incredible amount of frustration. These concerns are echoed by the complaints of many others about the treatment of the GPL by the embedded Linux industry in particular over many years.

With that in mind, today's announcement is concerned with the future availability of our stable series of patches. We decided that it is unfair to our sponsors that the above mentioned unlawful players can get away with their activity. Therefore, two weeks from now, we will cease the public dissemination of the stable series and will make it available to sponsors only. The test series, unfit in our view for production use, will however continue to be available to the public to avoid impact to the Gentoo Hardened and Arch Linux communities. If this does not resolve the issue, despite strong indications that it will have a large impact, we may need to resort to a policy similar to Red Hat's, described here or eventually stop the stable series entirely as it will be an unsustainable development model.
culture  gpl  linux  opensource  security  grsecurity  via:nelson  gentoo  arch-linux  gnu 
august 2015 by jm
Why Docker is Not Yet Succeeding Widely in Production
Spot-on points which Docker needs to address. It's still production-ready, and _should_ be used there, it just has significant rough edges...
docker  containers  devops  deployment  releases  linux  ops 
july 2015 by jm
Benchmarking GitHub Enterprise - GitHub Engineering
Walkthrough of debugging connection timeouts in a load test. Nice graphs (using matplotlib)
github  listen-backlog  tcp  debugging  timeouts  load-testing  benchmarking  testing  ops  linux 
july 2015 by jm
Patrick Shuff - Building A Billion User Load Balancer - SCALE 13x - YouTube
'Want to learn how Facebook scales their load balancing infrastructure to support more than 1.3 billion users? We will be revealing the technologies and methods we use to route and balance Facebook's traffic. The Traffic team at Facebook has built several systems for managing and balancing our site traffic, including both a DNS load balancer and a software load balancer capable of handling several protocols. This talk will focus on these technologies and how they have helped improve user performance, manage capacity, and increase reliability.'

Can't find the standalone slides, unfortunately.
facebook  video  talks  lbs  load-balancing  http  https  scalability  scale  linux 
june 2015 by jm
How to receive a million packets per second on Linux

To sum up, if you want a perfect performance you need to:
Ensure traffic is distributed evenly across many RX queues and SO_REUSEPORT processes. In practice, the load usually is well distributed as long as there are a large number of connections (or flows).
You need to have enough spare CPU capacity to actually pick up the packets from the kernel.
To make the things harder, both RX queues and receiver processes should be on a single NUMA node.
linux  networking  performance  cloudflare  packets  numa  so_reuseport  sockets  udp 
june 2015 by jm
Why I dislike systemd
Good post, and hard to disagree.
One of the "features" of systemd is that it allows you to boot a system without needing a shell at all. This seems like such a senseless manoeuvre that I can't help but think of it as a knee-jerk reaction to the perception of Too Much Shell in sysv init scripts.
In exactly which universe is it reasonable to assume that you have a running D-Bus service (or kdbus) and a filesystem containing unit files, all the binaries they refer to, all the libraries they link against, and all the configuration files any of them reference, but that you lack that most ubiquitous of UNIX binaries, /bin/sh?
history  linux  unix  systemd  bsd  system-v  init  ops  dbus 
june 2015 by jm
Five different ways to handle leap seconds with NTP
Without switching to chronyd, ntpd -x sounds not too suboptimal:
With ntpd, the kernel backward step is used by default. With ntpd versions before 4.2.6, or 4.2.6 and later patched for this bug, the -x option (added to /etc/sysconfig/ntpd) can be used to disable the kernel leap second correction and ignore the leap second as far as the local clock is concerned. The one-second error gained after the leap second will be measured and corrected later by slewing in normal operation using NTP servers which already corrected their local clocks.

It's all pretty messy though :(
ntpd  ntp  chronyd  clocks  time  synchronization  via:fanf  linux  leap-seconds 
june 2015 by jm
Linux futex_wait() bug
major bug in kernel versions 3.14 - 3.18 on Haswell hardware
haswell  linux  futex_wait  futexes  kernel  bugs  hang 
may 2015 by jm
Intel speeds up etcd throughput using ADR Xeon-only hardware feature
To reduce the latency impact of storing to disk, Weaver’s team looked to buffering as a means to absorb the writes and sync them to disk periodically, rather than for each entry. Tradeoffs? They knew memory buffers would help, but there would be potential difficulties with smaller clusters if they violated the stable storage requirement.

Instead, they turned to Intel’s silicon architects about features available in the Xeon line. After describing the core problem, they found out this had been solved in other areas with ADR. After some work to prove out a Linux OS supported use for this, they were confident they had a best-of-both-worlds angle. And it worked. As Weaver detailed in his CoreOS Fest discussion, the response time proved stable. ADR can grab a section of memory, persist it to disk and power it back. It can return entries back to disk and restore back to the buffer. ADR provides the ability to make small (<100MB) segments of memory “stable” enough for Raft log entries. It means it does not need battery-backed memory. It can be orchestrated using Linux or Windows OS libraries. ADR allows the capability to define target memory and determine where to recover. It can also be exposed directly into libs for runtimes like Golang. And it uses silicon features that are accessible on current Intel servers.
kubernetes  coreos  adr  performance  intel  raft  etcd  hardware  linux  persistence  disk  storage  xeon 
may 2015 by jm
Red Hat on rkt vs Docker
This is like watching a train-wreck in slow motion on Groundhog Day. We, in the broader Linux and open source community, have been down this path multiple times over the past fifteen years, specifically with package formats. While there needs to be room for experimentation, having two incompatible specs driven by two startups trying to differentiate and in direct competition is *not* a good thing. It would be better for the community and for everyone who depends on our collective efforts if CoreOS and Docker collaborated on a standardized common spec, image format, and distribution protocol. To this end, we at Red Hat will continue to contribute to both initiatives with the goal of driving convergence.
rkt  docker  appc  coreos  red-hat  dpkg  rpm  linux  packaging  collaboration  open-source 
may 2015 by jm
The Discovery of Apache ZooKeeper's Poison Packet - PagerDuty
Excellent deep dive into a production issue. Root causes: crappy error handling code in Zookeeper; lack of bounds checking in ZK; and a nasty kernel bug.
zookeeper  bugs  error-handling  bounds-checking  oom  poison-packets  pagerduty  packets  tcpdump  xen  aes  linux  kernel 
may 2015 by jm
Static code analysis for shell scripts (via Tony Finch)
bash  cli  sh  linux  shell  coding  static-analysis  lint 
april 2015 by jm
'a secret management and distribution service [from Square] that is now available for everyone. Keywhiz helps us with infrastructure secrets, including TLS certificates and keys, GPG keyrings, symmetric keys, database credentials, API tokens, and SSH keys for external services — and even some non-secrets like TLS trust stores. Automation with Keywhiz allows us to seamlessly distribute and generate the necessary secrets for our services, which provides a consistent and secure environment, and ultimately helps us ship faster. [...]

Keywhiz has been extremely useful to Square. It’s supported both widespread internal use of cryptography and a dynamic microservice architecture. Initially, Keywhiz use decoupled many amalgamations of configuration from secret content, which made secrets more secure and configuration more accessible. Over time, improvements have led to engineers not even realizing Keywhiz is there. It just works. Please check it out.'
square  security  ops  keys  pki  key-distribution  key-rotation  fuse  linux  deployment  secrets  keywhiz 
april 2015 by jm
Yelp Product & Engineering Blog | True Zero Downtime HAProxy Reloads
Using tc and qdisc to delay SYNs while haproxy restarts. Definitely feels like on-host NAT between 2 haproxy processes would be cleaner and easier though!
linux  networking  hacks  yelp  haproxy  uptime  reliability  tcp  tc  qdisc  ops 
april 2015 by jm
Introducing Vector: Netflix's On-Host Performance Monitoring Tool
It gives pinpoint real-time performance metric visibility to engineers working on specific hosts -- basically sending back system-level performance data to their browser, where a client-side renderer turns it into a usable dashboard. Essentially the idea is to replace having to ssh onto instances, run "top", systat, iostat, and so on.
vector  netflix  performance  monitoring  sysstat  top  iostat  netstat  metrics  ops  dashboards  real-time  linux 
april 2015 by jm
Gil Tene's "usual suspects" to reduce system-level hiccups/latency jitters in a Linux system
Based on empirical evidence (across many tens of sites thus far) and note-comparing with others, I use a list of "usual suspects" that I blame whenever they are not set to my liking and system-level hiccups are detected. Getting these settings right from the start often saves a bunch of playing around (and no, there is no "priority" to this - you should set them all right before looking for more advice...).
performance  latency  hiccups  gil-tene  tuning  mechanical-sympathy  hyperthreading  linux  ops 
april 2015 by jm
Transparent huge pages implicated in Redis OOM
A nasty real-world prod error scenario worsened by THPs:
jemalloc(3) extensively uses madvise(2) to notify the operating system that it's done with a range of memory which it had previously malloc'ed. The page size on this machine is 2MB because transparent huge pages are in use. As such, a lot of the memory which is being marked with madvise(..., MADV_DONTNEED) is within substantially smaller ranges than 2MB. This means that the operating system never was able to evict pages which had ranges marked as MADV_DONTNEED because the entire page has to be unneeded to allow a page to be reused. Despite initially looking like a leak, the operating system itself was unable to free memory because of madvise(2) and transparent huge pages. This led to sustained memory pressure on the machine and redis-server eventually getting OOM killed.
oom-killer  oom  linux  ops  thp  jemalloc  huge-pages  madvise  redis  memory 
march 2015 by jm
Using Named Pipes and Process Substitution in Bioinformatics
Wow. I've been using bash for nigh on 14 years and I didn't know about process substitution. Nifty trick
bash  linux  pipes  shell  unix  via:igrigorik  cli  named-pipes  process-substitution 
march 2015 by jm
A project to reduce systemd to a base initd, process supervisor and transactional dependency system, while minimizing intrusiveness and isolationism. Basically, it’s systemd with the superfluous stuff cut out, a (relatively) coherent idea of what it wants to be, support for non-glibc platforms and an approach that aims to minimize complicated design. uselessd is still in its early stages and it is not recommended for regular use or system integration.

This may be the best option to evade the horrors of systemd.
init  linux  systemd  unix  ops  uselessd 
march 2015 by jm
Ubuntu To Officially Switch To systemd Next Monday - Slashdot
Jesus. This is going to be the biggest shitfest in the history of Linux...
linux  slashdot  ubuntu  systemd  init  unix  ops 
march 2015 by jm
What Color Is Your Xen?
What a mess.
What's faster: PV, HVM, HVM with PV drivers, PVHVM, or PVH? Cloud computing providers using Xen can offer different virtualization "modes", based on paravirtualization (PV), hardware virtual machine (HVM), or a hybrid of them. As a customer, you may be required to choose one of these. So, which one?
ec2  linux  performance  aws  ops  pv  hvm  xen  virtualization 
february 2015 by jm
Nice wrapper for 'tc' and 'netem', for network latency/packet loss emulation
networking  testing  linux  tc  netem  latency  packet-loss  iptables 
january 2015 by jm
How TCP backlog works in Linux
good description of the process
ip  linux  tcp  networking  backlog  ops 
january 2015 by jm
ODROID-C1 - Multicore credit card computer
Pretty amazing specs for a 33 quid SBC.
Amlogic ARM® Cortex®-A5(ARMv7) 1.5Ghz quad core CPUs 

* Mali™-450 MP2 GPU (OpenGL ES 2.0/1.1 enabled for Linux and Android)

* 1Gbyte DDR3 SDRAM

* Gigabit Ethernet

* 40pin GPIOs

* eMMC4.5 HS200 Flash Storage slot / UHS-1 SDR50 MicroSD Card slot

* USB 2.0 Host x 4, USB OTG x 1,

* Infrared(IR) Receiver

* Uses Ubuntu 14.04 or Android KitKat operating systems

Includes HDMI out. (via Conor O'Neill)
via:conoro  uk  sbc  hacking  linux  hardware  odroid  gadgets 
january 2015 by jm
Nice trick -- wrap servers with a libc wrapper to intercept bind(2) and accept(2) calls, so that transparent restarts becode possible
linux  ops  servers  uptime  restarting  libc  bind  accept  sockets 
january 2015 by jm
Hack workaround to get JVM thread priorities working on Linux
As used in Cassandra ( )!
if you just set the "ThreadPriorityPolicy" to something else than the legal values 0 or 1, [...] a slight logic bug in Sun's JVM code kicks in, and thus sets the policy to be as if running with root - thus you get exactly what one desire. The operating system, Linux, won't allow priorities to be heightened above "Normal" (negative nice value), and thus just ignores those requests (setting it to normal instead, nice value 0) - but it lets through the requests to set it lower (setting the nice value to some positive value).
cassandra  thread-priorities  threads  java  jvm  linux  nice  hacks 
january 2015 by jm
Sportsfriends on Steam
omg, Die Gute Fabrik's game collection featuring the AMAZING Johann Sebastian Joust -- now available on Mac, Linux and (missing JSJ) Windows. Time to buy an assload of Move controllers!
jsj  johann-sebastian-joust  games  fun  die-gute-fabrik  sportsfriends  gaming  linux  mac 
december 2014 by jm
Two recent systemd crashes
Hey look, PID 1 segfaulting! I haven't seen that happen since we managed to corrupt /bin/sh on Ultrix in 1992. Nice work Fedora
fedora  reliability  unix  linux  systemd  ops  bugs 
december 2014 by jm
"git for operating system binaries".

OSTree is a tool for managing bootable, immutable, versioned filesystem trees. It is not a package system; nor is it a tool for managing full disk images. Instead, it sits between those levels, offering a blend of the advantages (and disadvantages) of both.

You can use any build system you like to place content into it on a build server, then export an OSTree repository via static HTTP. On each client system, "ostree admin upgrade" can incrementally replicate that content, creating a new root for the next reboot. This provides fully atomic upgrades. Any changes made to /etc are propagated forwards, and all local state in /var is shared.

A key goal of the project is to complement existing package systems like RPM and Debian packages, and help further their evolution. In particular for example, RPM-OSTree (linked below) has as a goal a hybrid tree/package model, where you replicate a base tree via OSTree, and then add packages on top.
os  gnome  git  linux  immutable  deployment  packaging  via:fanf 
december 2014 by jm
Announcing Snappy Ubuntu
Awesome! I was completely unaware this was coming down the pipeline.
A new, transactionally updated Ubuntu for the cloud. Ubuntu Core is a new rendition of Ubuntu for the cloud with transactional updates. Ubuntu Core is a minimal server image with the same libraries as today’s Ubuntu, but applications are provided through a simpler mechanism. The snappy approach is faster, more reliable, and lets us provide stronger security guarantees for apps and users — that’s why we call them “snappy” applications.

Snappy apps and Ubuntu Core itself can be upgraded atomically and rolled back if needed — a bulletproof approach to systems management that is perfect for container deployments. It’s called “transactional” or “image-based” systems management, and we’re delighted to make it available on every Ubuntu certified cloud.
ubuntu  linux  packaging  snappy  ubuntu-core  transactional-updates  apt  docker  ops 
december 2014 by jm
CoreOS is building a container runtime, Rocket
Whoa, trouble at mill in Dockerland!
When Docker was first introduced to us in early 2013, the idea of a “standard container” was striking and immediately attractive: a simple component, a composable unit, that could be used in a variety of systems. The Docker repository included a manifesto of what a standard container should be. This was a rally cry to the industry, and we quickly followed. Brandon Philips, co-founder/CTO of CoreOS, became a top Docker contributor, and now serves on the Docker governance board. CoreOS is one of the most widely used platforms for Docker containers, and ships releases to the community hours after they happen upstream. We thought Docker would become a simple unit that we can all agree on.

Unfortunately, a simple re-usable component is not how things are playing out. Docker now is building tools for launching cloud servers, systems for clustering, and a wide range of functions: building images, running images, uploading, downloading, and eventually even overlay networking, all compiled into one monolithic binary running primarily as root on your server. The standard container manifesto was removed. We should stop talking about Docker containers, and start talking about the Docker Platform. It is not becoming the simple composable building block we had envisioned.
coreos  docker  linux  containers  open-source  politics  rocket 
december 2014 by jm
/dev/full - Wikipedia, the free encyclopedia
This is handy!

'In Linux, /dev/full or the always full device[1][2] is a special file that always returns the error code ENOSPC (meaning "No space left on device") on writing, and provides an infinite number of null characters to any process that reads from it (similar to /dev/zero). This device is usually used when testing the behaviour of a program when it encounters a "disk full" error.'
dev  /dev/full  filesystems  devices  linux  testing  enospc  error-handling 
november 2014 by jm
A curated list of Docker resources.
linux  sysadmin  docker  ops  devops  containers  hosting 
november 2014 by jm
Linus Torvalds and others on Linux's systemd
ZDNet's Steven J. Vaughan-Nichols on the systemd mess (via Kragen)
via:kragen  systemd  linux  ubuntu  gnome  init  ops 
october 2014 by jm
a simple, lightweight HTTP server for storing and distributing custom Debian packages around your organisation. It is designed to make it as easy as possible to use Debian packages for code deployments and to ease other system administration tasks.
debian  apt  sysadmin  linux  ops  packaging 
october 2014 by jm
"Linux Containers And The Future Cloud" [slides]
by Rami Rosen -- extremely detailed presentation into the state of Linux containers, LXC, Docker, namespaces, cgroups, and checkpoint/restore in userspace (via lusis)
lsx  docker  criu  namespaces  cgroups  linux  via:lusis  ops  containers  rami-rosen  presentations 
october 2014 by jm
'a system for allowing servers with encrypted root file systems to reboot unattended and/or remotely.' (via Tony Finch)
via:fanf  mandos  encryption  security  server  ops  sysadmin  linux 
october 2014 by jm
$0.99 VPS hosting
this is nuts. 99 cents per month for a super-cheap host -- I'm sure there's a use case for this (via Elliot)
via:elliot  cheap  hosting  ssd  vps  linux  atlantic  1-dollar 
october 2014 by jm
The End of Linux
'Linux is becoming the thing that we adopted Linux to get away from.'

Great post on the horrible complexity of systemd. It reminds me of nothing more than mid-90s AIX, which I had the displeasure of opsing for a while -- the Linux distros have taken a very wrong turn here.
linux  unix  complexity  compatibility  ops  rant  systemd  bloat  aix 
september 2014 by jm
oss-sec: Re: CVE-2014-6271: remote code execution through bash
this is truly heinous. Given that any CGI which invokes popen()/system() on a Linux system where /bin/sh is a link to bash is vulnerable, there will be a lot of vulnerable services out there (via Elliot)
via:elliottucker  cgi  security  bash  sh  exploits  linux  popen  unix 
september 2014 by jm
get page cache statistics for files.
A common question when tuning databases and other IO-intensive applications is, "is Linux caching my data or not?" pcstat gets that information for you using the mincore(2) syscall. I wrote this is so that Apache Cassandra users can see if ssTables are being cached.
linux  page-cache  caching  go  performance  cassandra  ops  mincore  fincore 
september 2014 by jm
A nice curl/wget replacement which supports multi-TCP-connection downloads of HTTP/FTP resources. packaged for most Linux variants and OSX via brew
axel  curl  wget  via:johnke  downloading  tcp  http  ftp  ubuntu  debian  unix  linux 
september 2014 by jm
The State of ZFS on Linux
Linux users familiar with other filesystems or ZFS users from other platforms will often ask whether ZFS on Linux (ZoL) is “stable”. The short answer is yes, depending on your definition of stable. The term stable itself is somewhat ambiguous.

Oh dear. that's not a good start. Good reference page, though
zfs  linux  filesystems  ops  solaris 
september 2014 by jm
Chris Baus: TCP_CORK: More than you ever wanted to know
Even with buffered streams the application must be able to instruct the OS to forward all pending data when the stream has been flushed for optimal performance. The application does not know where packet boundaries reside, hence buffer flushes might not align on packet boundaries. TCP_CORK can pack data more effectively, because it has direct access to the TCP/IP layer. [..]

If you do use an application buffering and streaming mechanism (as does Apache), I highly recommend applying the TCP_NODELAY socket option which disables Nagle's algorithm. All calls to write() will then result in immediate transfer of data.
networking  tcp  via:nmaurer  performance  ip  tcp_cork  linux  syscalls  writev  tcp_nodelay  nagle  packets 
september 2014 by jm
Nix: The Purely Functional Package Manager
'a powerful package manager for Linux and other Unix systems that makes package management reliable and reproducible. It provides atomic upgrades and rollbacks, side-by-side installation of multiple versions of a package, multi-user package management and easy setup of build environments. '

Basically, this is a third-party open source reimplementation of Amazon's (excellent) internal packaging system, using symlinks to versioned package directories to ensure atomicity and the ability to roll back. This is definitely the *right* way to build packages -- I know what tool I'll be pushing for, next time this question comes up.

See also for a Linux distro built on Nix.
ops  linux  devops  unix  packaging  distros  nix  nixos  atomic  upgrades  rollback  versioning 
september 2014 by jm
Revisiting How We Put Together Linux Systems
Building a running OS out of layered btrfs filesystems. This sounds awesome.
Instantiating a new system or OS container (which is exactly the same in this scheme) just consists of creating a new appropriately named root sub-volume. Completely naturally you can share one vendor OS copy in one specific version with a multitude of container instances.

Everything is double-buffered (or actually, n-fold-buffered), because usr, runtime, framework, app sub-volumes can exist in multiple versions. Of course, by default the execution logic should always pick the newest release of each sub-volume, but it is up to the user keep multiple versions around, and possibly execute older versions, if he desires to do so. In fact, like on ChromeOS this could even be handled automatically: if a system fails to boot with a newer snapshot, the boot loader can automatically revert back to an older version of the OS.

(via Tony Finch)
via:fanf  linux  docker  btrfs  filesystems  unionfs  copy-on-write  os  hacking  unix 
september 2014 by jm
« earlier      
per page:    204080120160

related tags

/dev/full  1-dollar  9.10  16.04  501c3  acacia-technologies  accept  accessibility  adr  aes  airflow  aix  alestic  algorithms  alt-sysadmin-recovery  amazing  amazon  analysis  analytics  animation  apache  apis  appc  appengine  apt  arch-linux  architecture  asr  assembly  asteroids  async  atlantic  atomic  audio  aufs  automation  aws  axel  b+trees  b-trees  background  backlog  backticks  backup  backups  backwards-compatibility  bandwidth  bash  bcm2835  beanstalk  benchmarking  bind  bittorrent  bloat  blockdev  boobs  booting  bounds-checking  boxee  bpf  brendan-gregg  broadcast  broadcom  browser  bsd  btrfs  buffer-cache  bufferbloat  bug-reports  bugs  build  c  c++  c10k  cache  caching  calibre  call-graph  canonical  capture  cassandra  cgi  cgroups  charts  cheap  checklists  checksums  chef  chronyd  circus  cli  clocks  cloud  cloudflare  clustering  clusters  code  code-reviews  coding  collaboration  command-line  commandline  commands  compartmentalisation  compatibility  complexity  concurrency  configuration  congestion-control  constants  containerization  containers  controllers  cool  copy-on-write  coreos  coreutils  cpu  crash-only-software  crime  critiques  criu  cron  css  culture  cups  curl  cve  daemons  dan-luu  dashboards  data-structures  datacenters  dataviz  dbus  debian  debugging  defrag  deploy  deployment  design  desktop  detach  dev  devices  devops  diagnosis  dick-sites  die-gute-fabrik  disk  disks  distros  diy  django  dns  docker  downloading  dpkg  drivers  dropbox  dsl  dstat  dumb-init  dup  duplicity  duply  dvd  dynamic  e-coli  east-texas  ebs  ec2  ec2rl  ecs  edge  egos  email  embedded  embedded-linux  emulation  encryption  enospc  epoll  error-handling  etcd  ethernet  exploits  ext3  ext4  fabrice-bellard  facebook  fail  false-positives  fcron  fedora  ffmpeg  file  files  filesystems  FIN  fincore  firefox  firmware  flexget  floating-point  flock  floss  forking  foss  framework  free-software  freebsd  fs  fsync  ftp  ftrace  fun  funny  fuse  futexes  futex_wait  gadgets  games  gaming  gdb  genetics  genome  gentoo  geolocation  gil-tene  git  github  gltail  gnome  gnu  go  god  google  google-chrome  gossip  gpl  gpu  graphics  graphing  graphite  graphs  grep  grsecurity  hacking  hacks  hang  hangs  haproxy  hardware  haswell  hauppauge  hax  hdmi  hiccups  hid  history  horror  hosting  howto  html  htpc  http  https  huge-pages  hvm  hyperthreading  images  immutable  inept  infrastructure  init  inotify  input  instagram  instrumentation  intel  interactive  interface  io  ioprofile  iostat  iot  ioutil  ip  iptables  ireland  isps  jails  jaunty  java  javascript  jemalloc  job  johann-sebastian-joust  journalling  joystick  jsj  judges  jvm  kde  kernel  key-distribution  key-rotation  keys  keywhiz  kubernetes  lambda  languages  laptop  latency  law  lbs  leap-seconds  legal  lhtable  libc  licensing  likwid  linkedin  linker  linode  lint  linus-torvalds  linux  linuxjournal  listen-backlog  lmax  load  load-balancing  load-testing  lock  locking  locks  logfiles  logging  logs  lojack  lsof  lsx  lsyncd  luks  lwn  lxc  mac  madvise  magic-numbers  mailinator  mandos  mapping  maths  mdm  measurement  mechanical-sympathy  media-center  memory  menus  mesos  metrics  microsoft  mincore  mining  mirroring  mmap  mongrel  mongrel2  monitoring  moreutils  mozilla  mpi  mpstat  msg_peek  multicore  multithreading  mysql  mythtv  nagle  named-pipes  namespaces  nannies  netbsd  netdata  netem  netflix  nethogs  netstat  nettop  netty  network  networking  nginx  nice  nifty  nio  nix  nixos  node.js  nonprofits  novell  nsa  ntp  ntpd  numa  odroid  offline  oh-dear  omgwtfbbq  one-time-passwords  oom  oom-killer  ooms  open-source  openbsd  openelec  opengl  opensource  operating-systems  opie  ops  optimization  os  osdi  oss  osx  otp  overlayfs  oversight  packaging  packet-loss  packets  padraig-brady  page-cache  pagerduty  papers  parallel  parallelism  parsing  patches  patenting  patents  paul-tyma  pc  pedals  pee  peek  percona  perf  performance  perl  persistence  pipes  pki  planets  plex  poison-packets  poke  politics  poll  popen  port-forwarding  posix  presentations  prey  printf  printing  proc  process  process-substitution  processes  procfs  profiling  progress  progress-bar  protocols  provisioning  proxies  ptrace  pv  pvr  pvrusb2  python  qdisc  quality  racing  rackspace  raft  raid  rami-rosen  rant  raspberry-pi  real-time  recognition  record  recovery  red-hat  redhat  redis  reference  release-notes  releases  reliability  remapping  remote  replay  resolvers  restarting  rkt  rocket  rollback  root-cause  router  routers  routing  rpm  rr  rss  RST  rsync  rtt  ruby  runit  rusty-russell  s3  s3ql  sar  sbc  scalability  scale  scaling  scheduler  screensavers  scripting  scripts  scylla  sdd  seastar  secrets  security  selection  selectors  server  servers  service-discovery  setuid  sh  shared-libraries  shared-nothing  shell  shellcode  shutdown  signalling  signals  skey  slashdot  slew  slow-start  snappy  snapshots  sniffer  snooping  sockets  software  solaris  SO_LINGER  so_reuseport  space  spamassassin  spamd  spanning  speech  speech-recognition  sponge  sportsfriends  square  ss  ssd  ssh  sshd  ssl  static-analysis  statistics  stats  steering-wheel  stepping  storage  strace  sundtek  sup  supervisord  surveillance  svctm  swap  swap-insanity  swapping  swpats  sync  synchronization  sysadmin  syscalls  sysctl  sysctls  sysdig  sysstat  system  system-v  systemd  tail  tails  talks  tax  tc  tcp  tcp-ip  tcpdump  tcprstat  tcp_cork  tcp_nodelay  tdd  terminal  terrorism  testing  theft  thp  thread-priorities  threading  threads  throttling  time  time-synchronization  time-wait  timeouts  timer-wheels  timers  timing-wheels  tips  tmux  tomato  tools  top  tor  toread  torrents  tq  trace  tracedump  tracing  tracking  transactional-updates  transcriptional-regulatory-network  transparent-huge-pages  triangulation  troubleshooting  ts  tun  tuner  tuners  tuning  tunneling  tunnelling  turing-complete  tutorials  tv  ubuntu  ubuntu-core  udev  udp  ui  uk  uniloc-usa  unionfs  unity  unix  upgrades  upstart  uptime  urban-airship  us  usb  use  uselessd  uspto  vagrant  valgrind  vector  versioning  vertx  veth  via:adulau  via:bos  via:bradfitz  via:chughes  via:conoro  via:dnorrie  via:donncha  via:elliot  via:elliottucker  via:fanf  via:hackernews  via:hn  via:igrigorik  via:jacob  via:johnke  via:jzawodny  via:kellabyte  via:kevin-lyda  via:kragen  via:lusis  via:markkenny  via:nelson  via:nmaurer  via:obfuscurity  via:oisin  via:petermblair  via:pixelbeat  via:popey  via:reddit  via:scanlan  video  vipe  virtualenv  virtualization  visualization  vm  vms  vmtouch  vpn  vps  web  web-apps  wget  wifi  windows  wireless  writev  x11  x86  xargs  xbmc  xbox360  xboxdrv  xen  xeon  xfs  xkeyscore  xscreensaver  xu4  yelp  yorba-foundation  youtube  zed-shaw  zedshaw  zfs  zookeeper  zrun 

Copy this bookmark: