jm + networking   152

consistent hashing with bounded loads
'an algorithm that combined consistent hashing with an upper limit on any one server’s load, relative to the average load of the whole pool.'

Lovely blog post from Vimeo's eng blog on a new variation on consistent hashing -- incorporating a concept of overload-avoidance -- and adding it to HAProxy and using it in production in Vimeo. All sounds pretty nifty! (via Toby DiPasquale)
via:codeslinger  algorithms  networking  performance  haproxy  consistent-hashing  load-balancing  lbs  vimeo  overload  load 
5 weeks ago by jm
How the coffee-machine took down a factories control room : talesfromtechsupport
A coffee machine was plugged into both a secure network and also connected to the main wifi network, and became a vector for malware to take down the factory's control room. Security is hard
coffee-machines  fail  security  networking  wifi 
7 weeks ago by jm
RIPE Atlas Probes
Interesting! We discussed similar ideas in $prevjob, good to see one hitting production globally.
RIPE Atlas probes form the backbone of the RIPE Atlas infrastructure. Volunteers all over the world host these small hardware devices that actively measure Internet connectivity through ping, traceroute, DNS, SSL/TLS, NTP and HTTP measurements. This data is collected and aggregated by the RIPE NCC, which makes the data publicly available. Network operators, engineers, researchers and even home users have used this data for a wide range of purposes, from investigating network outages to DNS anycasting to testing IPv6 connectivity.

Anyone can apply to host a RIPE Atlas probe. If your application is successful (based on your location), we will ship you a probe free of charge. Hosts simply need to plug their probe into their home (or other) network.

Probes are USB-powered and are connected to an Ethernet port on the host’s router or switch. They then automatically and continuously perform active measurements about the Internet’s connectivity, and this data is sent to the RIPE NCC, where it is aggregated and made publicly available. We also use this data to create several Internet maps and data visualisations. [....]

The hardware of the first and second generation probes is a Lantronix XPort Pro module with custom powering and housing built around it. The third generation probe is a modified TP-Link wireless router (model TL-MR 3020) with a small USB thumb drive in it, but this probe does not support WiFi.

(via irldexter)
via:irldexter  ripe  ncc  probing  active-monitoring  networking  ping  traceroute  dns  testing  http  ipv6  anycast  hardware  devices  isps 
12 weeks ago by jm
V2V and the challenge of cooperating technology
A great deal of effort and attention has gone into a mobile data technology that you may not be aware of. This is "Vehicle to Vehicle" (V2V) communication designed so that cars can send data to other cars. There is special spectrum allocated at 5.9ghz, and a protocol named DSRC, derived from wifi, exists for communications from car-to-car and also between cars and roadside transmitters in the infrastructure, known as V2I.

This effort has been going on for some time, but those involved have had trouble finding a compelling application which users would pay for. Unable to find one, advocates hope that various national governments will mandate V2V radios in cars in the coming years for safety reasons. In December 2016, the U.S. Dept. of Transportation proposed just such a mandate. [....] "Connected Autonomous Vehicles -- Pick 2."
cars  self-driving  autonomous-vehicles  v2v  wireless  connectivity  networking  security 
may 2017 by jm
Instead of containerization, give me strong config & deployment primitives
Reasonable list of things Docker does badly at the moment, and a call to fix them. I still think Docker/rkt are a solid approach, if not 100% there yet though
docker  containers  complaining  whinge  networking  swarm  deployment  architecture  build  packaging 
april 2017 by jm
'FREE WiFi Site Survey Software for MAC OS X & Windows'.
Sadly reviews from pals are that it is 'shite' :(
osx  wifi  network  survey  netspot  networking  ops  dataviz  wireless 
april 2017 by jm
Spotify’s Love/Hate Relationship with DNS
omg somebody at Spotify really really loves DNS. They even store a DHT hash ring in it. whyyyyyyyyyyy
spotify  networking  architecture  dht  insane  scary  dns  unbound  ops 
april 2017 by jm
Spammergate: The Fall of an Empire
Featuring this interesting reactive-block evasion tactic:
In that screenshot, a RCM co-conspirator describes a technique in which the spammer seeks to open as many connections as possible between themselves and a Gmail server. This is done by purposefully configuring your own machine to send response packets extremely slowly, and in a fragmented manner, while constantly requesting more connections.
Then, when the Gmail server is almost ready to give up and drop all connections, the spammer suddenly sends as many emails as possible through the pile of connection tunnels. The receiving side is then overwhelmed with data and will quickly block the sender, but not before processing a large load of emails.

(via Tony Finch)
via:fanf  spam  antispam  gmail  blocklists  packets  tcp  networking 
march 2017 by jm
4 Wi-Fi Tips from Former Apple Wi-Fi Engineer
Good tips: use the same SSID for all radios; deal with congestion with more APs using less power; don't use "Wide" channels on 2.4Ghz; and place antennae perpendicular to each other.
wifi  2.4ghz  5ghz  networking  hardware  macs  apple  tips 
december 2016 by jm
Testing Docker multi-host network performance - Percona Database Performance Blog
wow, Docker Swarm looks like a turkey right now if performance is important. Only "host" gives reasonably perf numbers
docker  networking  performance  ops  benchmarks  testing  swarm  overlay  calico  weave  bridge 
november 2016 by jm
'Jupiter rising: A decade of Clos topologies and centralized control in Google’s datacenter networks'
Love the 'decade of' dig at FB and Amazon -- 'we were doing it first' ;)

Great details on how Google have built out and improved their DC networking. Includes a hint that they now use DCTCP (datacenter-optimized TCP congestion control) on their internal hosts....
datacenter  google  presentation  networks  networking  via:irldexter  ops  sre  clos-networks  fabrics  switching  history  datacenters 
october 2016 by jm
The History of the Irish Internet
This site is a companion effort to the techarchives website, except it is less well-researched, and is primarily a personal view of the development of the Internet in Ireland by your humble author, Niall Murphy.
niallm  internet  ireland  history  networking  heanet  ieunet 
june 2016 by jm
About to leave UPC due to (lack of) port forwarding -
Virgin Media/UPC seem to have silently deployed an IPv6 "carrier-grade NAT" setup called "DS-Lite" -- ie. all customers now get just a routable IPv6 address, and share a small pool of IPv4 NATs. This breaks a multitude of useful services, including UDP IPSec VPNs it seems
udp  vpns  isps  virgin-media  virgin  ireland  ds-lite  ipv6  tunnelling  networking  nat  ipv4 
may 2016 by jm
Open Whisper Systems >> Blog >> Reflections: The ecosystem is moving
Very interesting post on federation vs centralization for new services:
One of the controversial things we did with Signal early on was to build it as an unfederated service. Nothing about any of the protocols we've developed requires centralization; it's entirely possible to build a federated Signal Protocol based messenger, but I no longer believe that it is possible to build a competitive federated messenger at all.
development  encryption  communication  network-effects  federation  signal  ip  protocols  networking  smtp  platforms 
may 2016 by jm
raboof/nethogs: Linux 'net top' tool
NetHogs is a small 'net top' tool. Instead of breaking the traffic down per protocol or per subnet, like most tools do, it groups bandwidth by process.
nethogs  cli  networking  performance  measurement  ops  linux  top 
may 2016 by jm
Amazon S3 Transfer Acceleration
The AWS edge network has points of presence in more than 50 locations. Today, it is used to distribute content via Amazon CloudFront and to provide rapid responses to DNS queries made to Amazon Route 53. With today’s announcement, the edge network also helps to accelerate data transfers in to and out of Amazon S3. It will be of particular benefit to you if you are transferring data across or between continents, have a fast Internet connection, use large objects, or have a lot of content to upload.

You can think of the edge network as a bridge between your upload point (your desktop or your on-premises data center) and the target bucket. After you enable this feature for a bucket (by checking a checkbox in the AWS Management Console), you simply change the bucket’s endpoint to the form No other configuration changes are necessary! After you do this, your TCP connections will be routed to the best AWS edge location based on latency.  Transfer Acceleration will then send your uploads back to S3 over the AWS-managed backbone network using optimized network protocols, persistent connections from edge to origin, fully-open send and receive windows, and so forth.
aws  s3  networking  infrastructure  ops  internet  cdn 
april 2016 by jm
Google Cloud Status
Ouch, multi-region outage:
At 14:50 Pacific Time on April 11th, our engineers removed an unused GCE IP block from our network configuration, and instructed Google’s automated systems to propagate the new configuration across our network. By itself, this sort of change was harmless and had been performed previously without incident. However, on this occasion our network configuration management software detected an inconsistency in the newly supplied configuration. The inconsistency was triggered by a timing quirk in the IP block removal - the IP block had been removed from one configuration file, but this change had not yet propagated to a second configuration file also used in network configuration management. In attempting to resolve this inconsistency the network management software is designed to ‘fail safe’ and revert to its current configuration rather than proceeding with the new configuration. However, in this instance a previously-unseen software bug was triggered, and instead of retaining the previous known good configuration, the management software instead removed all GCE IP blocks from the new configuration and began to push this new, incomplete configuration to the network.

One of our core principles at Google is ‘defense in depth’, and Google’s networking systems have a number of safeguards to prevent them from propagating incorrect or invalid configurations in the event of an upstream failure or bug. These safeguards include a canary step where the configuration is deployed at a single site and that site is verified to still be working correctly, and a progressive rollout which makes changes to only a fraction of sites at a time, so that a novel failure can be caught at an early stage before it becomes widespread. In this event, the canary step correctly identified that the new configuration was unsafe. Crucially however, a second software bug in the management software did not propagate the canary step’s conclusion back to the push process, and thus the push system concluded that the new configuration was valid and began its progressive rollout.
multi-region  outages  google  ops  postmortems  gce  cloud  ip  networking  cascading-failures  bugs 
april 2016 by jm
The revenge of the listening sockets
More adventures in debugging the Linux kernel:
You can't have a very large number of bound TCP sockets and we learned that the hard way. We learned a bit about the Linux networking stack: the fact that LHTABLE is fixed size and is hashed by destination port only. Once again we showed a couple of powerful of System Tap scripts.
ops  linux  networking  tcp  network  lhtable  kernel 
april 2016 by jm
Charity Majors - AWS networking, VPC, environments and you
'VPC is the future and it is awesome, and unless you have some VERY SPECIFIC AND CONVINCING reasons to do otherwise, you should be spinning up a VPC per environment with orchestration and prob doing it from CI on every code commit, almost like it’s just like, you know, code.'
networking  ops  vpc  aws  environments  stacks  terraform 
march 2016 by jm
This is Why People Fear the ‘Internet of Things’
Ugh. This is a security nightmare. Nice work Foscam...
Imagine buying an internet-enabled surveillance camera, network attached storage device, or home automation gizmo, only to find that it secretly and constantly phones home to a vast peer-to-peer (P2P) network run by the Chinese manufacturer of the hardware. Now imagine that the geek gear you bought doesn’t actually let you block this P2P communication without some serious networking expertise or hardware surgery that few users would attempt. This is the nightmare “Internet of Things” (IoT) scenario for any system administrator: The IP cameras that you bought to secure your physical space suddenly turn into a vast cloud network designed to share your pictures and videos far and wide. The best part? It’s all plug-and-play, no configuration necessary!
foscam  cameras  iot  security  networking  p2p 
february 2016 by jm
Seesaw: scalable and robust load balancing from Google
After evaluating a number of platforms, including existing open source projects, we were unable to find one that met all of our needs and decided to set about developing a robust and scalable load balancing platform. The requirements were not exactly complex - we needed the ability to handle traffic for unicast and anycast VIPs, perform load balancing with NAT and DSR (also known as DR), and perform adequate health checks against the backends. Above all we wanted a platform that allowed for ease of management, including automated deployment of configuration changes.

One of the two existing platforms was built upon Linux LVS, which provided the necessary load balancing at the network level. This was known to work successfully and we opted to retain this for the new platform. Several design decisions were made early on in the project — the first of these was to use the Go programming language, since it provided an incredibly powerful way to implement concurrency (goroutines and channels), along with easy interprocess communication (net/rpc). The second was to implement a modular multi-process architecture. The third was to simply abort and terminate a process if we ended up in an unknown state, which would ideally allow for failover and/or self-recovery.
seesaw  load-balancers  google  load-balancing  vips  anycast  nat  lbs  go  ops  networking 
january 2016 by jm
About Microservices, Containers and their Underestimated Impact on Network Performance
shock horror, Docker-SDN layers have terrible performance. Still pretty lousy perf impacts from basic Docker containerization, presumably without "--net=host" (which is apparently vital)
docker  performance  network  containers  sdn  ops  networking  microservices 
january 2016 by jm
VPC NAT gateways : transactional uniqueness at scale
colmmacc introducing the VPC NAT gateway product from AWS, in a guest post on James Hamilton's blog no less!:
you can think of it as a new “even bigger” [NAT] box, but under the hood NAT gateways are different. The connections are managed by a fault-tolerant co-operation of devices in the VPC network fabric. Each new connection is assigned a port in a robust and transactional way, while also being replicated across an extensible set of multiple devices. In other words: the NAT gateway is internally horizontally scalable and resilient.
amazon  ec2  nat  networking  aws  colmmacc 
january 2016 by jm
ImperialViolet - Juniper: recording some Twitter conversations
Adam Langley on the Juniper VPN-snooping security hole:
... if it wasn't the NSA who did this, we have a case where a US gov­ern­ment back­door ef­fort (Dual-EC) laid the ground­work for some­one else to at­tack US in­ter­ests. Cer­tainly this at­tack would be a lot eas­ier given the pres­ence of a back­door-friendly RNG al­ready in place. And I've not even dis­cussed the SSH back­door. [...]
primes  ecc  security  juniper  holes  exploits  dual-ec-drbg  vpn  networking  crypto  prngs 
december 2015 by jm
How both TCP and Ethernet checksums fail
At Twitter, a team had a unusual failure where corrupt data ended up in memcache. The root cause appears to have been a switch that was corrupting packets. Most packets were being dropped and the throughput was much lower than normal, but some were still making it through. The hypothesis is that occasionally the corrupt packets had valid TCP and Ethernet checksums. One "lucky" packet stored corrupt data in memcache. Even after the switch was replaced, the errors continued until the cache was cleared.

YA occurrence of this bug. When it happens, it tends to _really_ screw things up, because it's so rare -- we had monitoring for this in Amazon, and when it occurred, it overwhelmingly occurred due to host-level kernel/libc/RAM issues rather than stuff in the network. Amazon design principles were to add app-level checksumming throughout, which of course catches the lot.
networking  tcp  ip  twitter  ethernet  checksums  packets  memcached 
october 2015 by jm
uses the techniques invented by the authors of Paris-traceroute to enumerate the paths of ECMP flow-based load balancing, but introduces a new technique for NAT detection.

handy. written by AWS SDE Andrea Barberio!
internet  tracing  traceroute  networking  ecmp  nat  ip 
october 2015 by jm
a specialized packet sniffer designed for displaying and logging HTTP traffic. It is not intended to perform analysis itself, but to capture, parse, and log the traffic for later analysis. It can be run in real-time displaying the traffic as it is parsed, or as a daemon process that logs to an output file. It is written to be as lightweight and flexible as possible, so that it can be easily adaptable to different applications.

via Eoin Brazil
via:eoinbrazil  httpry  http  networking  tools  ops  testing  tcpdump  tracing 
september 2015 by jm
Nelson recommends Ubiquiti
'The key thing about Ubiquiti gear is the high quality radios and antennas. It just seems much more reliable than most consumer WiFi gear. Their airOS firmware is good too, it’s a bit complicated to set up but very capable and flexible. And in addition to normal 802.11n or 802.11ac they also have an optional proprietary TDMA protocol called airMax that’s designed for serving several long haul links from a single basestation. They’re mostly marketing to business customers but the equipment is sold retail and well documented for ordinary nerds to figure out.'
ubiquiti  wifi  wireless  802.11  via:nelson  ethernet  networking  prosumer  hardware  wan 
september 2015 by jm
httpbin(1): HTTP Client Testing Service
Testing an HTTP Library can become difficult sometimes. RequestBin is fantastic for testing POST requests, but doesn't let you control the response. This exists to cover all kinds of HTTP scenarios. Additional endpoints are being considered.
http  httpbin  networking  testing  web  coding  hacks 
september 2015 by jm
The Netflix Test Video
Netflix' official test video -- contains various scenarios which exercise frequent tricky edge cases in video compression and playback; A/V sync, shades of black, running water, etc.
networking  netflix  streaming  video  compression  tests 
august 2015 by jm
Apple now biases towards IPv6 with a 25ms delay on connections
Interestingly, they claim that IPv6 tends to be more reliable and has lower latency now:
Based on our testing, this makes our Happy Eyeballs implementation go from roughly 50/50 IPv4/IPv6 in iOS 8 and Yosemite to ~99% IPv6 in iOS 9 and El Capitan betas. While our previous implementation from four years ago was designed to select the connection with lowest latency no matter what, we agree that the Internet has changed since then and reports indicate that biasing towards IPv6 is now beneficial for our customers: IPv6 is now mainstream instead of being an exception, there are less broken IPv6 tunnels, IPv4 carrier-grade NATs are increasing in numbers, and throughput may even be better on average over IPv6.
apple  ipv6  ip  tcp  networking  internet  happy-eyeballs  ios  osx 
july 2015 by jm
Hystrix-style Circuit Breakers and Bulkheads for Ruby/Rails, from Shopify
circuit-breaker  bulkhead  patterns  architecture  microservices  shopify  rails  ruby  networking  reliability  fallback  fail-fast 
june 2015 by jm
How to receive a million packets per second on Linux

To sum up, if you want a perfect performance you need to:
Ensure traffic is distributed evenly across many RX queues and SO_REUSEPORT processes. In practice, the load usually is well distributed as long as there are a large number of connections (or flows).
You need to have enough spare CPU capacity to actually pick up the packets from the kernel.
To make the things harder, both RX queues and receiver processes should be on a single NUMA node.
linux  networking  performance  cloudflare  packets  numa  so_reuseport  sockets  udp 
june 2015 by jm
Google Cloud Platform Blog: A look inside Google’s Data Center Networks
We used three key principles in designing our datacenter networks:
We arrange our network around a Clos topology, a network configuration where a collection of smaller (cheaper) switches are arranged to provide the properties of a much larger logical switch.
We use a centralized software control stack to manage thousands of switches within the data center, making them effectively act as one large fabric.
We build our own software and hardware using silicon from vendors, relying less on standard Internet protocols and more on custom protocols tailored to the data center.
clos-networks  google  data-centers  networking  sdn  gcp  ops 
june 2015 by jm
Please stop calling databases CP or AP
In his excellent blog post [...] Jeff Hodges recommends that you use the CAP theorem to critique systems. A lot of people have taken that advice to heart, describing their systems as “CP” (consistent but not available under network partitions), “AP” (available but not consistent under network partitions), or sometimes “CA” (meaning “I still haven’t read Coda’s post from almost 5 years ago”).

I agree with all of Jeff’s other points, but with regard to the CAP theorem, I must disagree. The CAP theorem is too simplistic and too widely misunderstood to be of much use for characterizing systems. Therefore I ask that we retire all references to the CAP theorem, stop talking about the CAP theorem, and put the poor thing to rest. Instead, we should use more precise terminology to reason about our trade-offs.
cap  databases  storage  distcomp  ca  ap  cp  zookeeper  consistency  reliability  networking 
may 2015 by jm
David P. Reed on the history of UDP
'UDP was actually “designed” in 30 minutes on a blackboard when we decided pull the original TCP protocol apart into TCP and IP, and created UDP on top of IP as an alternative for multiplexing and demultiplexing IP datagrams inside a host among the various host processes or tasks. But it was a placeholder that enabled all the non-virtual-circuit protocols since then to be invented, including encapsulation, RTP, DNS, …, without having to negotiate for permission either to define a new protocol or to extend TCP by adding “features”.'
udp  ip  tcp  networking  internet  dpr  history  protocols 
april 2015 by jm
Google Online Security Blog: A Javascript-based DDoS Attack [the Greatfire DDoS] as seen by Safe Browsing
We hope this report helps to round out the overall facts known about this attack. It also demonstrates that collectively there is a lot of visibility into what happens on the web. At the HTTP level seen by Safe Browsing, we cannot confidently attribute this attack to anyone. However, it makes it clear that hiding such attacks from detailed analysis after the fact is difficult.

Had the entire web already moved to encrypted traffic via TLS, such an injection attack would not have been possible. This provides further motivation for transitioning the web to encrypted and integrity-protected communication. Unfortunately, defending against such an attack is not easy for website operators. In this case, the attack Javascript requests web resources sequentially and slowing down responses might have helped with reducing the overall attack traffic. Another hope is that the external visibility of this attack will serve as a deterrent in the future.

Via Nelson.
google  security  via:nelson  ddos  javascript  tls  ssl  safe-browsing  networking  china  greatfire 
april 2015 by jm
Kubernetes compared to Borg
'Here are four Kubernetes features that came from our experiences with Borg.'
google  ops  kubernetes  borg  containers  docker  networking 
april 2015 by jm
Boeing 787 Dreamliner jets, as well as Airbus A350 and A380 aircraft, have Wi-Fi passenger networks that use the same network as the avionics systems of the planes

What the fucking fuck. Air-gap or gtfo
air-gap  security  planes  boeing  a380  a350  dreamliner  networking  firewalls  avionics 
april 2015 by jm
Yelp Product & Engineering Blog | True Zero Downtime HAProxy Reloads
Using tc and qdisc to delay SYNs while haproxy restarts. Definitely feels like on-host NAT between 2 haproxy processes would be cleaner and easier though!
linux  networking  hacks  yelp  haproxy  uptime  reliability  tcp  tc  qdisc  ops 
april 2015 by jm
How I doubled my Internet speed with OpenWRT
File under "silly network hacks":
Comcast has an initiative called Xfinity WiFi. When you rent a cable modem/router combo from Comcast (as one of my nearby neighbors apparently does), in addition to broadcasting your own WiFi network, it is kind enough to also broadcast “xfinitywifi,” a second “hotspot” network metered separately from your own.

By using his Buffalo WZR-HP-AG300H router's extra radio, he can load-balance across both his own paid-for connection, and the XFinity WiFi free one. ;)
comcast  diy  networking  openwrt  routing  home-network  hacks  xfinity-wifi  buffalo 
march 2015 by jm
You Cannot Have Exactly-Once Delivery
Cut out and keep:
Within the context of a distributed system, you cannot have exactly-once message delivery. Web browser and server? Distributed. Server and database? Distributed. Server and message queue? Distributed. You cannot have exactly-once delivery semantics in any of these situations.
distributed  distcomp  exactly-once-delivery  networking  outages  network-partitions  byzantine-generals  reference 
march 2015 by jm
Exponential Backoff And Jitter
Great go-to explainer blog post for this key distributed-systems reliability concept, from the always-solid Marc Brooker
marc-brooker  distsys  networking  backoff  exponential  jitter  retrying  retries  reliability  occ 
march 2015 by jm
2015-02-19 GCE outage
40 minutes of multi-zone network outage for majority of instances.

'The internal software system which programs GCE’s virtual network for VM
egress traffic stopped issuing updated routing information. The cause of
this interruption is still under active investigation. Cached route
information provided a defense in depth against missing updates, but GCE VM
egress traffic started to be dropped as the cached routes expired.'

I wonder if Google Pimms fired the alarms for this ;)
google  outages  gce  networking  routing  pimms  multi-az  cloud 
february 2015 by jm
NA Server Roadmap Update: PoPs, Peering, and the North Bridge
League of Legends has set up private network links to a variety of major US ISPs to avoid internet weather (via Nelson)
via:nelson  peering  games  networks  internet  ops  networking 
january 2015 by jm
Nice wrapper for 'tc' and 'netem', for network latency/packet loss emulation
networking  testing  linux  tc  netem  latency  packet-loss  iptables 
january 2015 by jm
How TCP backlog works in Linux
good description of the process
ip  linux  tcp  networking  backlog  ops 
january 2015 by jm
Use sshuttle to Keep Safe on Insecure Wi-Fi
I keep forgetting about sshuttle. It's by far the easiest way to get a cheapo IP-over-SSH VPN working with an OSX client, particularly since it's in homebrew
ssh  vpn  sshuttle  tunnelling  security  ip  wifi  networking  osx  homebrew 
december 2014 by jm
Solving the Mystery of Link Imbalance: A Metastable Failure State at Scale | Engineering Blog | Facebook Code
Excellent real-world war story from Facebook -- a long-running mystery bug was eventually revealed to be a combination of edge-case behaviours across all the layers of the networking stack, from L2 link aggregation at the agg-router level, up to the L7 behaviour of the MySQL client connection pool.
Facebook collocates many of a user’s nodes and edges in the social graph. That means that when somebody logs in after a while and their data isn’t in the cache, we might suddenly perform 50 or 100 database queries to a single database to load their data. This starts a race among those queries. The queries that go over a congested link will lose the race reliably, even if only by a few milliseconds. That loss makes them the most recently used when they are put back in the pool. The effect is that during a query burst we stack the deck against ourselves, putting all of the congested connections at the top of the deck.
architecture  debugging  devops  facebook  layer-7  mysql  connection-pooling  aggregation  networking  tcp-stack 
november 2014 by jm
TCP incast
a catastrophic TCP throughput collapse that occurs as the number of storage servers sending data to a client increases past the ability of an Ethernet switch to buffer packets. In a clustered file system, for example, a client application requests a data block striped across several storage servers, issuing the next data block request only when all servers have responded with their portion (Figure 1). This synchronized request workload can result in packets overfilling the buffers on the client's port on the switch, resulting in many losses. Under severe packet loss, TCP can experience a timeout that lasts a minimum of 200ms, determined by the TCP minimum retransmission timeout (RTOmin).
incast  networking  performance  tcp  bandwidth  buffering  switch  ethernet  capacity 
november 2014 by jm
Facebook's datacenter fabric
FB goes public with its take on the Clos network-based datacenter network architecture
networking  scaling  facebook  clos-networks  fabrics  datacenters  network-architecture 
november 2014 by jm
Eircom have run out of network capacity
This is due in part to huge growth in the data volumes and data traffic that is transported over our network, which has exceeded our forecasted growth. We are making a number of improvements to our international connectivity which will add significant capacity and this work will be completed in the next two or three weeks.

Guess this is what happens when Amazon poach your IP network engineers. doh!

More seriously though, if you're marketing eFibre heavily, shouldn't you be investing in the upstream capacity to go with it?
eircom  fail  internet  capacity  forecasting  networking 
november 2014 by jm
Is Docker ready for production? Feedbacks of a 2 weeks hands on
I have to agree with this assessment -- there are a lot of loose ends still for production use of Docker in a SOA stack environment:
From my point of view, Docker is probably the best thing I’ve seen in ages to automate a build. It allows to pre build and reuse shared dependencies, ensuring they’re up to date and reducing your build time. It avoids you to either pollute your Jenkins environment or boot a costly and slow Virtualbox virtual machine using Vagrant. But I don’t feel like it’s production ready in a complex environment, because it adds too much complexity. And I’m not even sure that’s what it was designed for.
docker  complexity  devops  ops  production  deployment  soa  web-services  provisioning  networking  logging 
october 2014 by jm
Chris Baus: TCP_CORK: More than you ever wanted to know
Even with buffered streams the application must be able to instruct the OS to forward all pending data when the stream has been flushed for optimal performance. The application does not know where packet boundaries reside, hence buffer flushes might not align on packet boundaries. TCP_CORK can pack data more effectively, because it has direct access to the TCP/IP layer. [..]

If you do use an application buffering and streaming mechanism (as does Apache), I highly recommend applying the TCP_NODELAY socket option which disables Nagle's algorithm. All calls to write() will then result in immediate transfer of data.
networking  tcp  via:nmaurer  performance  ip  tcp_cork  linux  syscalls  writev  tcp_nodelay  nagle  packets 
september 2014 by jm
AWS Speed Test: What are the Fastest EC2 and S3 Regions?
My god, this test is awful -- this is how NOT to test networked infrastructure. (1) testing from a single EC2 instance in each region; (2) uploading to a single test bucket for each test; (3) results don't include min/max or percentiles, just an averaged measurement for each test. FAIL
fail  testing  networking  performance  ec2  aws  s3  internet 
august 2014 by jm
The Network is Reliable - ACM Queue
Peter Bailis and Kyle Kingsbury accumulate a comprehensive, informal survey of real-world network failures observed in production. I remember that April 2011 EBS outage...
ec2  aws  networking  outages  partitions  jepsen  pbailis  aphyr  acm-queue  acm  survey  ops 
july 2014 by jm
a client side IPC library that is battle-tested in cloud. It provides the following features:

Load balancing;
Fault tolerance;
Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model;
Caching and batching.

I like the integration of Eureka and Hystrix in particular, although I would really like to read more about Eureka's approach to availability during network partitions and CAP. has some interesting discussion on the topic. It actually sounds like the Eureka approach is more correct than using ZK: 'Eureka is available. ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessary consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.'

See also which corroborates this:

I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed that the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away” dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads). [...]

To me this seems like a concern, as network partitions should be considered an event that should be survived. In this case (with this specific configuration of zookeeper) no new clients in that availability zone would be able to register themselves with consumers within the same availability zone. Adding more zookeeper instances to the ensemble wouldn’t help considering a balanced deployment as in this case the availability would always be majority (66%) and non-majority (33%).
netflix  ribbon  availability  libraries  java  hystrix  eureka  aws  ec2  load-balancing  networking  http  tcp  architecture  clients  ipc 
july 2014 by jm
Facebook introduce “Wedge” and “FBOSS"
a new top-of-rack network switch, code-named “Wedge,” and a new Linux-based operating system for that switch, code-named “FBOSS.” These projects break down the hardware and software components of the network stack even further, to provide a new level of visibility, automation, and control in the operation of the network. By combining the hardware and software modules together in new ways, “Wedge” and “FBOSS” depart from current networking design paradigms to leverage our experience in operating hundreds of thousands of servers in our data centers. In other words, our goal with these projects was to make our network look, feel, and operate more like the OCP servers we've already deployed, both in terms of hardware and software.

Sayonara, Cisco, and good riddance.
cisco  juniper  wedge  fboss  facebook  tor  switches  racks  networking  datacenter  routers 
june 2014 by jm
Shutterbits replacing hardware load balancers with local BGP daemons and anycast
Interesting approach. Potentially risky, though -- heavy use of anycast on a large-scale datacenter network could increase the scale of the OSPF graph, which scales exponentially. This can have major side effects on OSPF reconvergence time, which creates an interesting class of network outage in the event of OSPF flapping.

Having said that, an active/passive failover LB pair will already announce a single anycast virtual IP anyway, so, assuming there are a similar number of anycast IPs in the end, it may not have any negative side effects.

There's also the inherent limitation noted in the second-to-last paragraph; 'It comes down to what your hardware router can handle for ECMP. I know a Juniper MX240 can handle 16 next-hops, and have heard rumors that a software update will bump this to 64, but again this is something to keep in mind'. Taking a leaf from the LB design, and using BGP to load-balance across a smaller set of haproxy instances, would seem like a good approach to scale up.
scalability  networking  performance  load-balancing  bgp  exabgp  ospf  anycast  routing  datacenters  scaling  vips  juniper  haproxy  shutterstock 
may 2014 by jm
a single application IP packet sniffer that captures all TCP and UDP packets of a single Linux process. It consists of the following elements:

* ptrace monitor - tracks bind(), connect() and sendto() syscalls and extracts local port numbers that the traced application uses;
* pcap sniffer - using information from the previous module, it captures IP packets on an AF_PACKET socket (with an appropriate BPF filter attached);
* garbage collector - periodically reads /proc/net/{tcp,udp} files in order to detect the sockets that the application no longer uses.

As the output, tracedump generates a PCAP file with SLL-encapsulated IP packets - readable by eg. Wireshark. This file can be later used for detailed analysis of the networking operations made by the application. For instance, it might be useful for IP traffic classification systems.
debugging  networking  linux  strace  ptrace  tracedump  tracing  tcp  udp  sniffer  ip  tcpdump 
may 2014 by jm
'Monitoring and detecting causes of failures of network paths', US patent 8,661,295 (B1)
The first software patent in my name -- couldn't avoid it forever :(
Systems and methods are provided for monitoring and detecting causes of failures of network paths. The system collects performance information from a plurality of nodes and links in a network, aggregates the collected performance information across paths in the network, processes the aggregated performance information for detecting failures on the paths, analyzes each of the detected failures to determine at least one root cause, and initiates a remedial workflow for the at least one root cause determined. In some aspects, processing the aggregated information may include performing a statistical regression analysis or otherwise solving a set of equations for the performance indications on each of a plurality of paths. In another aspect, the system may also include an interface which makes available for display one or more of the network topology, the collected and aggregated performance information, and indications of the detected failures in the topology.

The patent describes an early version of Pimms, the network failure detection and remediation system we built for Amazon.
amazon  pimms  swpats  patents  networking  ospf  autoremediation  outage-detection 
may 2014 by jm
"Replicated abstract data types: Building blocks for collaborative applications"
cited at as 'one of my favorite papers on CRDTs and provides practical pseudocode for learning how to implement CRDTs yourself', in a discussion on cemerick's "Distributed Systems and the End of the API":
distcomp  networking  distributed  crdts  algorithms  text  data-structures  cap 
may 2014 by jm
Observations of an Internet Middleman
That leaves the remaining six [consumer ISPs peering with Level3] with congestion on almost all of the interconnect ports between us. Congestion that is permanent, has been in place for well over a year and where our peer refuses to augment capacity. They are deliberately harming the service they deliver to their paying customers. They are not allowing us to fulfil the requests their customers make for content. Five of those congested peers are in the United States and one is in Europe. There are none in any other part of the world. All six are large Broadband consumer networks with a dominant or exclusive market share in their local market. In countries or markets where consumers have multiple Broadband choices (like the UK) there are no congested peers.

Amazing that L3 are happy to publish this -- that's where big monopoly ISPs have led their industry.
net-neutrality  networking  internet  level3  congestion  isps  us-politics 
may 2014 by jm
Flood IO Offering Network Emulation
Performance-testing-as-a-service company Flood.IO now offering emulation of various crappy end-user networks: GSM, DSL, GPRS, 3G, 4G etc. Great idea.  performance  networking  internet  load-testing  testing  jmeter  gatling  tests  gsm  3g  mobile  simulation 
april 2014 by jm
Uplink Latency of WiFi and 4G Networks
It's high. Wifi in particular shows high variability and long latency tails
wifi  3g  4g  mobile  networking  internet  latency  tcp 
april 2014 by jm
« earlier      
per page:    204080120160

related tags

2.4ghz  2fa  3g  4g  5ghz  802.11  802.11b  802.11g  802.11n  a350  a380  ack  acm  acm-queue  active-monitoring  addresses  addressing  aggregation  air-gap  air-gaps  alb  algorithms  amazon  antispam  anycast  ap  aphyr  apple  apps  architecture  arpanet  async  authorization  automation  autonomous-vehicles  autoremediation  availability  avionics  aws  az  backdoors  backlog  backoff  bandwidth  bbs  bdd  benchmarks  bert  bgp  billing  bit-flips  blocking  blocklists  bobtail  boeing  book  borg  bridge  bridging  browser  bruce-schneier  buffalo  bufferbloat  buffering  buffers  bugs  build  bulkhead  byzantine-generals  ca  caches  caching  calico  cam  cameras  cap  capacity  cars  cascading-failures  cdn  censorship  chat  checkip  checklists  checksums  china  circuit-breaker  cisco  claire-perry  cli  clients  clos-networks  cloud  cloudflare  codel  coding  coffee-machines  cogent  colmmacc  colons  comcast  command-line  communication  community  complaining  complexity  compression  conferences  congestion  connection-pooling  connectivity  consistency  consistent-hashing  containers  contractors  corruption  cosmic-rays  cp  crdts  cross-region  crypto  cs  css  curl  d-link  data-centers  data-retention  data-structures  databases  datacenter  datacenters  dataviz  ddos  debugging  deployment  design  design-patterns  development  devices  devops  dht  distcomp  distributed  distributed-systems  distsys  diy  dns  docker  dpi  dpr  dreamliner  drm  ds-lite  dual-ec-drbg  ebooks  ec2  ecc  ecmp  eircom  elb  embedded  emulation  encryption  enstratius  environments  epoll  eric-brandwine  erlang  ethernet  eureka  exabgp  exactly-once-delivery  exploits  exponential  exponential-backoff  fabrics  facebook  fail  fail-fast  failover  failure  fallback  false-positives  fat-tree  fault-tolerance  fbi  fboss  federation  filemap  files  filesystems  filtering  FIN  firewall  firewalls  forecasting  foscam  fpgas  fq_codel  fraud  freebsd  funny  games  gaming  gatling  gc  gc-scout  gce  gchq  gcp  gfs  github  gmail  go  google  greatfire  grid  gsm  hackernews  hacks  hairy  han  handshake  hangs  happy-eyeballs  haproxy  hardware  heanet  heroku  hft  history  hmm  holes  home  home-network  homebrew  homeplug  honeypot  hotels  hsdpa  html5  http  http2  httpbin  httpry  https  hulu  hungary  hvac  hystrix  ieunet  ilya-grigorik  incast  infrastructure  insane  instrumentation  inter-region  internet  io  ios  ios7  iot  ip  ipc  ipsec  iptables  ipv4  ipv6  irc  ireland  isps  james-hamilton  java  javascript  jay-kreps  jeff-dean  jepsen  jim-gettys  jitter  jmeter  jobs  js  jsconf  juniper  kafka  kernel  kubernetes  latency  layer-7  lbs  least-conns  level3  lhtable  libraries  linkedin  linux  load  load-balancers  load-balancing  load-testing  logging  long-tail  low-latency  lte  lua  luajit  lvm  machine-learning  macs  malware  map-reduce  marc-brooker  markets  measurement  measurements  memcached  mesh  mesh-networks  michael-nygard  microservices  mirc  mit  mmorpgs  mmos  mobile  mongrel2  monitoring  mp3  mplayer  mptcp  multi-az  multi-region  music  mysql  nagle  nat  ncc  net-neutrality  netbsd  netcat  netem  netflix  nethogs  netspot  netty  network  network-architecture  network-effects  network-neutrality  network-partitions  networking  networks  niallm  nlb  nsa  numa  occ  open-rights-group  open-source  openbsd  opendns  opensource  openwrt  ops  orcas-island  organisation  ospf  osx  outage-detection  outages  overblocking  overlay  overload  p2p  packaging  packet  packet-loss  packets  papers  parallel  partition  partitions  patents  patterns  payment  pbailis  pci  pdf  peering  percentiles  percona  performance  perl  phones  pimms  ping  planes  planex  platforms  plcs  politics  poll  porn  port-forwarding  postmortems  power  powerline-networking  presentation  presentations  primes  privacy  prngs  probing  production  prosumer  protocol  protocols  provisioning  proxies  proxying  ptrace  puppet  pwnat  python  qcon  qdisc  qualcomm  queueing  racks  rails  rap-genius  rdns  redis  reference  reliability  remycc  replay  replication  rest  retransmit  retries  retrying  reversing  riak  ribbon  ripe  round-robin  routers  routing  rpc  RST  rsync  rtt  ruby  s3  sack  safe-browsing  samy-kamkar  scalability  scaling  scary  scp  sdn  security  seesaw  self-driving  serialization  server-side  servers  shell  shopify  shutdown  shutterstock  signal  simulation  skype  slas  slides  smb  smtp  snabb-switch  snapshots  sniffer  snooping  snowden  soa  sockets  software  SO_LINGER  so_reuseport  spam  speed  spotify  sre  ss  sse  ssh  sshuttle  ssl  stacks  staging  star-wars  state  statistics  stock-markets  stock-trading  storage  strace  streaming  stud  stun  surveillance  survey  swarm  switch  switches  switching  swpats  sysadmin  syscalls  sysctls  systems  tactical-chat  talk-talk  talks  tax  tc  tcp  tcp-ip  tcp-stack  tcpdump  tcprstat  tcp_cork  tcp_nodelay  tee  telcos  telecom  telnet  terraform  testing  tests  text  think-of-the-children  threadpools  thrift  time-wait  timeouts  tips  tls  tools  top  tor  tor-bridges  tracedump  traceroute  tracing  traversal  tun  tuning  tunneling  tunnelling  turkey  tutorials  tv  twilio  twitter  ubiquiti  udp  uk  unbound  unit-tests  unix  upc  uptime  us-military  us-politics  usability  v2v  verizon  via-dehora  via:adulau  via:bela  via:codeslinger  via:dehora  via:eoin-brazil  via:eoinbrazil  via:fanf  via:hn  via:ip  via:irldexter  via:jg  via:jk  via:joe-feise  via:johnharrington  via:kragen  via:lusis  via:nelson  via:nmaurer  via:scanlan  via:waxy  video  vimeo  vips  virgin  virgin-media  virtualization  visualization  vlans  vpc  vpn  vpns  vrfs  wa  wan  war  weave  web  web-services  webrtc  websockets  wedge  whats-my-ip  whinge  wifi  wii  windows  wireless  wlan  wordpress  wow  writev  xen  xfinity-wifi  xhr  yelp  youtube  zedshaw  zookeeper 

Copy this bookmark: