jm + speed   25

Gil Tene on benchmarking
'I would strongly encourage you to avoid repeating the mistakes of testing methodologies that focus entirely on max achievable throughput and then report some (usually bogus) latency stats at those max throughput modes. The techempower numbers are a classic example of this in play, and while they do provide some basis for comparing a small aspect of behavior (what I call the "how fast can this thing drive off a cliff" comparison, or "pedal to the metal" testing), those results are not very useful for comparing load carrying capacities for anything that actually needs to maintain some form of responsiveness SLA or latency spectrum requirements.'

Some excellent advice here on how to measure and represent stack performance.

Also: 'DON'T use or report standard deviation for latency. Ever. Except if you mean it as a joke.'
performance  benchmarking  testing  speed  gil-tene  latency  measurement  hdrhistogram  load-testing  load 
april 2016 by jm
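To make the Gil Tene advice above concrete, here is a minimal sketch of reporting a latency spectrum as percentiles rather than mean and standard deviation. The sample data, the `percentile` helper and the nearest-rank method are all assumptions chosen for illustration; a real measurement would more likely use a histogram library such as HdrHistogram.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical latencies in milliseconds: 980 fast requests and 20 long stalls.
latencies_ms = [1.0] * 980 + [500.0] * 20

summary = {
    "p50": percentile(latencies_ms, 50),
    "p99": percentile(latencies_ms, 99),
    "p99.9": percentile(latencies_ms, 99.9),
    "max": max(latencies_ms),
}
mean = statistics.mean(latencies_ms)
stdev = statistics.pstdev(latencies_ms)

print(summary)
print(f"mean={mean:.2f}ms stdev={stdev:.2f}ms")
```

With this made-up data the median is 1 ms and the 99th percentile is 500 ms, while "mean plus or minus one standard deviation" describes a range that dips below zero and never reaches the 500 ms stalls. Latency distributions are rarely bell-shaped, which is presumably why the percentile spectrum is the more honest summary.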
ustwo Reimagines the In-Car Cluster
Designers behind the cult mobile game, Monument Valley, take on the legacy-bound in-car UI
ux  ui  cars  driving  safety  ustwo  monument-valley  speed 
september 2015 by jm
A causal profiler for C++.
Causal profiling is a novel technique to measure optimization potential. This measurement matches developers' assumptions about profilers: that optimizing highly-ranked code will have the greatest impact on performance. Causal profiling measures optimization potential for serial, parallel, and asynchronous programs without instrumentation or special handling for library calls and concurrency primitives. Instead, a causal profiler uses performance experiments to predict the effect of optimizations. This allows the profiler to establish causality: "optimizing function X will have effect Y," exactly the measurement developers had assumed they were getting all along.

I can see this being a good technique to stochastically discover race conditions and concurrency bugs, too.
optimization  c++  performance  coding  profiling  speed  causal-profilers 
december 2014 by jm
A nice Lua/C++ implementation of Aho-Corasick for fast string matching against multiple patterns (via JGC). This uses an interesting technique to get better performance by compacting the data structure into a single buffer, to avoid following pointers all over RAM and busting the cache.
optimization  speed  performance  aho-corasick  tries  string-matching  strings  algorithms  lua  c++  via:jgc 
august 2014 by jm
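The single-buffer idea in the Aho-Corasick entry above can be sketched roughly as follows. This is not the linked Lua/C++ implementation or its actual memory layout: it is a plain byte-oriented Aho-Corasick automaton whose transition table lives in one contiguous `array`, so matching is one indexed lookup per input byte rather than a walk through linked nodes. The function names and the 256-wide row layout are assumptions, and patterns are assumed to be non-empty `bytes`.

```python
from array import array
from collections import deque

ALPHABET = 256                              # one slot per possible byte value
EMPTY_ROW = array("i", [-1]) * ALPHABET     # -1 marks "no transition yet"

def build_automaton(patterns):
    """Return (transitions, outputs) for a list of non-empty bytes patterns.

    transitions[state * ALPHABET + byte] is the next state;
    outputs[state] lists every pattern that ends when that state is reached.
    """
    transitions = array("i", EMPTY_ROW)     # row 0 is the root state
    outputs = [[]]
    for pattern in patterns:                # phase 1: insert patterns as a trie
        state = 0
        for byte in pattern:
            slot = state * ALPHABET + byte
            if transitions[slot] == -1:
                transitions[slot] = len(outputs)
                transitions.extend(EMPTY_ROW)
                outputs.append([])
            state = transitions[slot]
        outputs[state].append(pattern)

    fail = [0] * len(outputs)               # phase 2: failure links, breadth-first
    queue = deque()
    for byte in range(ALPHABET):
        nxt = transitions[byte]
        if nxt == -1:
            transitions[byte] = 0           # missing root edges loop back to the root
        else:
            queue.append(nxt)
    while queue:
        state = queue.popleft()
        outputs[state] = outputs[state] + outputs[fail[state]]
        for byte in range(ALPHABET):
            slot = state * ALPHABET + byte
            fallback = transitions[fail[state] * ALPHABET + byte]
            nxt = transitions[slot]
            if nxt == -1:
                transitions[slot] = fallback    # precompute the fallback edge
            else:
                fail[nxt] = fallback
                queue.append(nxt)
    return transitions, outputs

def find_all(text, transitions, outputs):
    """Yield (start_index, pattern) for every match in text (a bytes object)."""
    state = 0
    for i, byte in enumerate(text):
        state = transitions[state * ALPHABET + byte]
        for pattern in outputs[state]:
            yield i - len(pattern) + 1, pattern

table, outs = build_automaton([b"he", b"she", b"his", b"hers"])
matches = sorted(find_all(b"ushers", table, outs))
```

A dense 256-entry row per state trades memory for simplicity; a production implementation would likely pack rows far more tightly, but the cache-friendliness argument is the same.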
Moving Big Data into the Cloud with Tsunami UDP - AWS Big Data Blog
Pretty serious speedup. 81 MB/sec with Tsunami UDP, compared to 9 MB/sec with plain old scp. Probably kills internet performance for everyone else though!
tsunami-udp  udp  scp  copying  transfers  internet  long-distance  performance  speed 
august 2014 by jm
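For a sense of scale on the Tsunami UDP figures above, a quick back-of-the-envelope calculation. The two rates come from the entry; the 1 TB payload (taken as 1,000,000 MB) is a made-up example size.

```python
TSUNAMI_MB_PER_SEC = 81.0   # rate quoted in the entry
SCP_MB_PER_SEC = 9.0        # rate quoted in the entry

def transfer_hours(size_mb, rate_mb_per_sec):
    """Hours needed to move size_mb megabytes at a sustained rate."""
    return size_mb / rate_mb_per_sec / 3600.0

speedup = TSUNAMI_MB_PER_SEC / SCP_MB_PER_SEC
payload_mb = 1_000_000      # hypothetical 1 TB dataset

print(f"speedup: {speedup:.0f}x")
print(f"tsunami: {transfer_hours(payload_mb, TSUNAMI_MB_PER_SEC):.1f} h")
print(f"scp:     {transfer_hours(payload_mb, SCP_MB_PER_SEC):.1f} h")
```

At those sustained rates the hypothetical terabyte takes a bit over three hours instead of more than a day.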
Google Fonts recently switched to using Zopfli
Google Fonts recently switched to using new Zopfli compression algorithm:  the fonts are ~6% smaller on average, and in some cases up to 15% smaller! [...]
What's Zopfli? It's an algorithm that was developed by the compression team at Google that delivers ~3-8% bytesize improvement when compared to gzip with maximum compression. This byte savings comes at a cost of much higher encoding cost, but the good news is, fonts are static files and decompression speed is exactly the same. Google Fonts pays the compression cost once and every client gets the benefit of smaller download. If you’re curious to learn more about Zopfli:
zopfli  compression  gzip  fonts  google  speed  optimization 
january 2014 by jm
LatencyUtils by giltene
The LatencyUtils package includes useful utilities for tracking latencies. Especially in common in-process recording scenarios, which can exhibit significant coordinated omission sensitivity without proper handling.
gil-tene  metrics  java  measurement  coordinated-omission  latency  speed  service-metrics  open-source 
november 2013 by jm
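The coordinated omission problem mentioned in the LatencyUtils entry above can be illustrated with a small sketch. This is not the LatencyUtils API. It is a simplified version of the expected-interval correction idea (similar in spirit to what HdrHistogram offers): when one recorded latency is much longer than the interval at which requests were supposed to be issued, the requests that should have been sent during the stall are back-filled with the latencies they would have seen. The function name and sample data are assumptions.

```python
def correct_for_coordinated_omission(latencies, expected_interval):
    """Back-fill samples that a stalled measurement loop failed to record.

    latencies: recorded latencies, in the same unit as expected_interval.
    expected_interval: how often a request should have been issued.
    """
    corrected = []
    for value in latencies:
        corrected.append(value)
        # Each request that should have started during the stall would have
        # waited one interval less than the one before it.
        missing = value - expected_interval
        while missing >= expected_interval:
            corrected.append(missing)
            missing -= expected_interval
    return corrected

# Hypothetical run: one request every 10 ms, with a single 100 ms stall.
recorded_ms = [1, 1, 100, 1]
corrected_ms = correct_for_coordinated_omission(recorded_ms, 10)
```

In the raw data only one sample in four looks slow; after correction the stall shows up as ten slow samples out of thirteen, which is closer to what clients arriving on schedule would have experienced.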
Groundbreaking Results for High Performance Trading with FPGA and x86 Technologies
The enhancement in performance was achieved by providing a fast-path where trades are executed directly by the FPGA under the control of trigger rules processed by the x86 based functions. The latency is reduced further by two additional techniques in the FPGA – inline parsing and pre-emption. As market data enters the switch, the Ethernet frame is parsed serially as bits arrive, allowing partial information to be extracted and matched before the whole frame has been received. Then, instead of waiting until the end of a potential triggering input packet, pre-emption is used to start sending the overhead part of a response which contains the Ethernet, IP, TCP and FIX headers. This allows completion of an outgoing order almost immediately after the end of the triggering market feed packet.

Insane stuff. (Via Martin Thompson)
via:martin-thompson  insane  speed  low-latency  fpga  fast-path  trading  stock-markets  performance  optimization  ethernet 
october 2013 by jm
SSL/TLS overhead
'The TLS handshake has multiple variations, but let’s pick the most common one – anonymous client and authenticated server (the connections browsers use most of the time).' Works out to 4 packets, in addition to the TCP handshake's 3, and about 6.5k bytes on average.
network  tls  ssl  performance  latency  speed  networking  internet  security  packets  tcp  handshake 
june 2013 by jm
Big Memory, Part 4
good microbenchmarking of a bunch of Java collections; Trove, fastutil, PCJ, mahout-collections, hppc
java  collections  benchmarks  performance  speed  coding  data-structures  optimization 
june 2013 by jm
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; provides also big (64-bit) arrays, sets and lists, and fast, practical I/O classes for binary and text files. It is free software distributed under the Apache License 2.0. It requires Java 6 or newer.

used by Facebook (along with Apache Giraph, Netty, Unsafe) to speed up "weekend Hive jobs" to "coffee breaks".
via:highscalability  facebook  giraph  optimization  java  speed  fastutil  collections  data-structures 
june 2013 by jm
Netflix ISP Speed Index for Ireland
Via Mulley. Magnet doing well, with UPC coming second; UPC have dropped a fair bit in the past month. Would love to see it broken down by region...
upc  ireland  isps  speed  bandwidth  netflix  broadband  magnet  eircom 
april 2013 by jm
joshua's blog: overclocking the lecture
Joshua's old tip on watching videos at 2x speed using Perian
quicktime  video  hacks  mac  speed  lectures  presentations  learning 
april 2013 by jm
Jeff Dean's list of "Numbers Everyone Should Know"
from a 2007 Google all-hands, the list of typical latency timings ranging from an L1 cache reference (0.5 nanoseconds) to a CA->NL->CA IP round trip (150 milliseconds).
performance  latencies  google  jeff-dean  timing  caches  speed  network  zippy  disks  via:kellabyte 
march 2013 by jm
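The two endpoints quoted in the Jeff Dean entry above span a huge range, which a couple of lines of arithmetic make explicit. Only the two figures from the entry are used; the variable names are illustrative.

```python
L1_CACHE_REFERENCE_NS = 0.5            # from the entry
CA_NL_CA_ROUND_TRIP_NS = 150 * 10**6   # 150 milliseconds expressed in nanoseconds

# How many L1 cache references fit into one transatlantic round trip.
ratio = CA_NL_CA_ROUND_TRIP_NS / L1_CACHE_REFERENCE_NS
print(f"{ratio:,.0f} L1 references per round trip")
```

That is a factor of 300 million between the fastest and slowest items on the list.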
Unlike other tools intended to solve the JVM startup problem (e.g. Nailgun, Cake), Drip does not use a persistent JVM. There are many pitfalls to using a persistent JVM, which we discovered while working on the Cake build tool for Clojure. The main problem is that the state of the persistent JVM gets dirty over time, producing strange errors and requiring liberal use of cake kill whenever any error is encountered, just in case dirty state is the cause.

Instead of going down this road, Drip uses a different strategy. It keeps a fresh JVM spun up in reserve with the correct classpath and other JVM options so you can quickly connect and use it when needed, then throw it away. Drip hashes the JVM options and stores information about how to connect to the JVM in a directory with the hash value as its name.

(via HN)
java  command-line  tools  startup  speed 
november 2012 by jm
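The lookup scheme described in the Drip entry above (hash the JVM options, keep connection details in a directory named after the hash) might look something like this sketch. Everything concrete here is an assumption for illustration: the SHA-1 hash, the `connection.json` file name, the JSON format and the `port` field are hypothetical, not Drip's actual on-disk layout.

```python
import hashlib
import json
import os
import tempfile

def options_hash(jvm_args):
    """Stable hash of the full JVM argument list (classpath, flags, and so on)."""
    joined = "\0".join(jvm_args).encode("utf-8")
    return hashlib.sha1(joined).hexdigest()

def register_spare_jvm(base_dir, jvm_args, connection_info):
    """Record how to reach a pre-started JVM launched with exactly these options."""
    jvm_dir = os.path.join(base_dir, options_hash(jvm_args))
    os.makedirs(jvm_dir, exist_ok=True)
    with open(os.path.join(jvm_dir, "connection.json"), "w") as handle:
        json.dump(connection_info, handle)
    return jvm_dir

def find_spare_jvm(base_dir, jvm_args):
    """Return connection info for a matching spare JVM, or None if there is none."""
    path = os.path.join(base_dir, options_hash(jvm_args), "connection.json")
    if not os.path.exists(path):
        return None
    with open(path) as handle:
        return json.load(handle)

with tempfile.TemporaryDirectory() as base:
    args = ["-Xmx512m", "-cp", "app.jar"]
    register_spare_jvm(base, args, {"port": 12345})
    hit = find_spare_jvm(base, args)                            # same options: found
    miss = find_spare_jvm(base, ["-Xmx1g", "-cp", "app.jar"])   # different options: None
```

Keying on the hash means a spare JVM is only reused when the classpath and options match exactly, which fits the entry's point about avoiding dirty state from a shared persistent JVM.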
AnandTech - The Intel SSD DC S3700: Intel's 3rd Generation Controller Analyzed
Interesting trend; Intel moved from a btree to an array-based data structure for their logical-block address indirection map, in order to reduce worst-case latencies (via Martin Thompson)
latency  intel  via:martin-thompson  optimization  speed  p99  data-structures  arrays  btrees  ssd  hardware 
november 2012 by jm
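As a rough illustration of why a flat array helps worst-case latency in the Intel SSD entry above: a lookup in an array indexed by logical block address is a single indexed read, so the slowest lookup costs the same as the typical one, whereas a tree lookup depends on depth and node layout. This sketch is not Intel's actual design; the class name, sentinel value and table size are assumptions.

```python
from array import array

class FlatIndirectionMap:
    """Logical-to-physical block map stored as one contiguous array."""

    UNMAPPED = 0xFFFFFFFF   # hypothetical sentinel for "no physical block assigned"

    def __init__(self, num_logical_blocks):
        # One fixed-size slot per logical block address (LBA).
        self._table = array("L", [self.UNMAPPED]) * num_logical_blocks

    def map_block(self, lba, physical_block):
        self._table[lba] = physical_block

    def lookup(self, lba):
        """Single indexed read: no tree traversal, no rebalancing."""
        physical_block = self._table[lba]
        return None if physical_block == self.UNMAPPED else physical_block

indirection = FlatIndirectionMap(num_logical_blocks=1024)
indirection.map_block(7, 90210)
```

The cost is memory: the table needs a slot for every logical block whether or not it is mapped.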
How to make a security geek feel very old: #Factorisation, #DKIM and @DrZacharyHarris
“A 384-bit key I can factor on my laptop in 24 hours. The 512-bit keys I can factor in about 72 hours using Amazon Web Services for $75. And I did do a number of those. Then there are the 768-bit keys. Those are not factorable by a normal person like me with my resources alone. But the government of Iran probably could, or a large group with sufficient computing resources could pull it off.”

Remember when we thought 512-bit keys would be enough? how time flies!

Of course, John Aycock raised this problem back in 2007, although he assumed it'd take a 100,000-host botnet to crack them (in 153 minutes).
factorisation  moores-law  cpu  speed  dkim  domain-keys  512-bit  cracking  security  via:alec-muffet 
october 2012 by jm
Through speed of traffic on San Francisco area streets vs. popularity with Flickr and Twitter users
"slower streets" generate more photos/tweets than "faster streets", with a peak around 9 mph
data  san-francisco  photos  flickr  twitter  speed  driving 
september 2011 by jm
snappy - A fast compressor/decompressor
'On a single core of a Core i7 processor in 64-bit mode, it compresses at about 250 MB/sec or more and decompresses at about 500 MB/sec or more. (These numbers are for the slowest inputs in our benchmark suite; others are much faster.) In our tests, Snappy usually is faster than algorithms in the same class (e.g. LZO, LZF, FastLZ, QuickLZ, etc.) while achieving comparable compression ratios.'  Apache-licensed, from Google
snappy  google  compression  speed  from delicious
march 2011 by jm
Overclocking SSL
techie details from Adam Langley on how Google's been improving TLS/SSL, with lots of good tips. they switched in January to HTTPS for all Gmail users by default, without any additional machines or hardware
certificates  encryption  google  https  latency  speed  ssl  tcp  tls  web  performance  from delicious
july 2010 by jm
"Source Code Optimisation", Felix von Leitner, Linux Kongress 2009 [PDF]
Good presentation on C compiler optimization, via Cal Henderson. 'People often write less readable code because they think it will produce faster code. Unfortunately, in most cases, the code will not be faster.' I particularly like 'Fancy-Schmancy Algorithms': 'If you have 10-100 elements, use a list, not a red-black tree; Fancy data structures help on paper, but rarely in reality. (More space overhead in the data structure, less L2 cache left for actual data.)'
via:iamcal  compilers  c  c++  optimization  coding  assembly  speed  from delicious
november 2009 by jm

