jm + optimization   35

'Addressing the rebalancing problem in bike-sharing systems' [paper]
Many of the bike-sharing systems introduced around the world in the past 15 years have the same problem: Riders tend to take some routes and not others. As a result, the bikes tend to collect in a few places, which is a drag for users and a costly problem for the operators, who "rebalance" the system using trucks that take bikes from full stations to empty ones. Now, scientists are coming up with special algorithms to improve this process. One of them, developed by scientists at the Vienna University of Technology and the Austrian Institute of Technology, is now being tested in Vienna's bike-sharing system; another, developed at Cornell University, is already in use in New York City.


Timely -- here's what Dublin Bikes looked like this morning: https://twitter.com/jmason/status/503828246086295552

(via Andrew Caines)
cycling  bike-sharing  borisbikes  dublinbikes  rebalancing  fleet  availability  optimization  maths  papers  toread  algorithms 
2 days ago by jm
'TCP And The Lower Bound of Web Performance' [pdf, slides]
John Rauser, Velocity, June 2010. Good data on real-world web perf based on the limitations which TCP and the speed of light impose
tcp  speed-of-light  performance  web  optimization  john-rauser 
22 days ago by jm
"Pitfalls of Object Oriented Programming", SCEE R&D
Good presentation discussing "data-oriented programming" -- the concept of optimizing memory access speed by laying out large data in a columnar format in RAM, rather than naively in the default layout that OOP design suggests
columnar  ram  memory  optimization  coding  c++  oop  data-oriented-programming  data  cache  performance 
6 weeks ago by jm
Exceptional Performance
Good benchmark data on the performance of JVM exceptions
java  jvm  exceptions  benchmarking  performance  optimization  coding 
may 2014 by jm
Druid | How We Scaled HyperLogLog: Three Real-World Optimizations
3 optimizations Druid.io have made to the HLL algorithm to scale it up for production use in Metamarkets: compacting registers (fixes a bug with unions of multiple HLLs); a sparse storage format (to optimize space); faster lookups using a lookup table.
druid.io  metamarkets  scaling  hyperloglog  hll  algorithms  performance  optimization  counting  estimation 
april 2014 by jm
Garbage Collection Optimization for High-Throughput and Low-Latency Java Applications
LinkedIn talk about the GC opts they used to optimize the Feed. good detail
performance  optimization  linkedin  java  jvm  gc  tuning 
april 2014 by jm
Netty Best Practices
'a.k.a. Faster == Better'. Slides from a talk given at Facebook on 19th March 2014 by Norman Maurer
netty  java  performance  optimization  facebook  slides  presentations 
april 2014 by jm
[#1259] Add optimized queue for SCMP pattern and use it in NIO and nativ... · 6efac61 · netty/netty
Interesting -- Netty has imported an optimized ASL2-licensed MPSC queue implementation from Akka (presumably for performance raisins)
performance  optimization  open-source  mpsc  queues  data-structures  netty  akka  java 
march 2014 by jm
Impact of large primitive arrays (BLOBS) on JVM Garbage Collection
some nice graphs and data on CMS performance, with/without -XX:ParGCCardsPerStrideChunk
cms  java  jvm  performance  optimization  tuning  off-heap-storage  memory 
march 2014 by jm
Branchless hex-to-decimal conversion hack
via @simonebordet, on the mechanical-sympathy list: ((c & 0x1F) + ((c >> 6) * 0x19) – 0x10)
hacks  one-liners  coding  performance  optimization  hex  conversion  numbers  ascii 
january 2014 by jm
Google Fonts recently switched to using Zopfli
Google Fonts recently switched to using new Zopfli compression algorithm:  the fonts are ~6% smaller on average, and in some cases up to 15% smaller! [...]
What's Zopfli? It's an algorithm that was developed by the compression team at Google that delivers ~3~8% bytesize improvement when compared to gzip with maximum compression. This byte savings comes at a cost of much higher encoding cost, but the good news is, fonts are static files and decompression speed is exactly the same. Google Fonts pays the compression cost once and every clients gets the benefit of smaller download. If you’re curious to learn more about Zopfli: http://bit.ly/Y8DEL4
zopfli  compression  gzip  fonts  google  speed  optimization 
january 2014 by jm
Netflix: Your Linux AMI: optimization and performance [slides]
a fantastic bunch of low-level kernel tweaks and tunables which Netflix have found useful in production to maximise productivity of their fleet. Interesting use of SCHED_BATCH process scheduler class for batch processes, in particular. Also, great docs on their experience with perf and SystemTap. Perf really looks like a tool I need to get to grips with...
netflix  aws  tuning  ami  perf  systemtap  tunables  sched_batch  batch  hadoop  optimization  performance 
december 2013 by jm
stereopsis : graphics : radix tricks
some nice super-optimized Radix Sort code which handles floating point values. See also http://codercorner.com/RadixSortRevisited.htm for more info on the histogramming/counter concept
sorting  programming  coding  algorithms  radix-sort  optimization  floating-point 
december 2013 by jm
Asynchronous logging versus Memory Mapped Files
Interesting article around using mmap'd files from Java using RandomAccessFile.getChannel().map(), which allows them to be accessed directly as a ByteBuffer. together with Atomic variable lazySet() operations, this provides pretty excellent performance results on low-latency writes to disk. See also: http://psy-lob-saw.blogspot.ie/2012/12/atomiclazyset-is-performance-win-for.html
atomic  lazyset  putordered  jmm  java  synchronization  randomaccessfile  bytebuffers  performance  optimization  memory  disk  queues 
november 2013 by jm
SPSC revisited part III - FastFlow + Sparse Data
holy moly. This is some heavily-optimized mechanical-sympathy Java code. By using a sparse data structure, cache-aligned fields, and wait-free low-level CAS concurrency primitives via sun.misc.Unsafe, a single-producer/single-consumer queue implementation goes pretty damn fast compared to the current state of the art
nitsanw  optimization  concurrency  java  jvm  cas  spsc  queues  data-structures  algorithms 
october 2013 by jm
Groundbreaking Results for High Performance Trading with FPGA and x86 Technologies
The enhancement in performance was achieved by providing a fast-path where trades are executed directly by the FPGA under the control of trigger rules processed by the x86 based functions. The latency is reduced further by two additional techniques in the FPGA – inline parsing and pre-emption. As market data enters the switch, the Ethernet frame is parsed serially as bits arrive, allowing partial information to be extracted and matched before the whole frame has been received. Then, instead of waiting until the end of a potential triggering input packet, pre-emption is used to start sending the overhead part of a response which contains the Ethernet, IP, TCP and FIX headers. This allows completion of an outgoing order almost immediately after the end of the triggering market feed packet.


Insane stuff. (Via Martin Thompson)
via:martin-thompson  insane  speed  low-latency  fpga  fast-path  trading  stock-markets  performance  optimization  ethernet 
october 2013 by jm
Probabalistic Scraping of Plain Text Tables
a nifty hack.
Recently I have been banging my head trying to import a ton of OCR acquired data expressed in tabular form. I think I have come up with a neat approach using probabilistic reasoning combined with mixed integer programming. The method is pretty robust to all sorts of real world issues. In particular, the method leverages topological understanding of tables, encodes it declaratively into a mixed integer/linear program, and integrates weak probabilistic signals to classify the whole table in one go (at sub second speeds). This method can be used for any kind of classification where you have strong logical constraints but noisy data.


(via proggit)
scraping  tables  ocr  probabilistic  linear-programming  optimization  machine-learning  via:proggit 
september 2013 by jm
Java Garbage Collection Distilled
Martin Thompson lays it out:
Serial, Parallel, Concurrent, CMS, G1, Young Gen, New Gen, Old Gen, Perm Gen, Eden, Tenured, Survivor Spaces, Safepoints, and the hundreds of JVM start-up flags. Does this all baffle you when trying to tune the garbage collector while trying to get the required throughput and latency from your Java application? If it does then don’t worry, you are not alone. Documentation describing garbage collection feels like man pages for an aircraft. Every knob and dial is detailed and explained but nowhere can you find a guide on how to fly. This article will attempt to explain the tradeoffs when choosing and tuning garbage collection algorithms for a particular workload.
gc  java  garbage-collection  coding  cms  g1  jvm  optimization 
june 2013 by jm
Big Memory, Part 4
good microbenchmarking of a bunch of Java collections; Trove, fastutil, PCJ, mahout-collections, hppc
java  collections  benchmarks  performance  speed  coding  data-structures  optimization 
june 2013 by jm
fastutil
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; provides also big (64-bit) arrays, sets and lists, and fast, practical I/O classes for binary and text files. It is free software distributed under the Apache License 2.0. It requires Java 6 or newer.


used by Facebook (along with Apache Giraph, Netty, Unsafe) to speed up "weekend Hive jobs" to "coffee breaks". http://www.slideshare.net/nitayj/2013-0603-berlin-buzzwords
via:highscalability  facebook  giraph  optimization  java  speed  fastutil  collections  data-structures 
june 2013 by jm
Measuring & Optimizing I/O Performance
Another good writeup on iostat and EBS, from Ilya Grigorik
io  optimization  sysadmin  performance  iostat  ebs  aws  ops 
may 2013 by jm
Jetty-9 goes fast with Mechanical Sympathy
This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: "False sharing" on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build. look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% less Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.
jetty  java  mechanical-sympathy  optimization  coding  tries 
february 2013 by jm
Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com
Using new Intel Core i7 instructions to speed up string manipulation. Fascinating stuff. SSE ftw
sse  optimization  simd  assembly  intel  i7  intel-core  strstr  strings  string-matching  strchr  strlen  coding 
january 2013 by jm
AnandTech - The Intel SSD DC S3700: Intel's 3rd Generation Controller Analyzed
Interesting trend; Intel moved from a btree to an array-based data structure for their logical-block address indirection map, in order to reduce worst-case latencies (via Martin Thompson)
latency  intel  via:martin-thompson  optimization  speed  p99  data-structures  arrays  btrees  ssd  hardware 
november 2012 by jm
PCRE Performance Project
Excellent stuff. Using "sljit", a stackless platform-independent JIT compiler, this compiles Perl-compatible regular expressions to machine code on ARM, x86, MIPS and PowerPC platforms, resulting in 'similar matching speed to DFA based engines (like re2) on common patterns' with Perl compatibility. 'This work has been released as part of PCRE 8.20 and above. Now (PCRE 8.31), nearly all PCRE features are supported including UTF-8/16 and partial matching.'
pcre  regexps  regex  performance  optimization  jit  compilation  dfa  re2  via:akohli 
september 2012 by jm
Avoiding Hash Lookups in a Ruby Implementation
'If I were to sum up the past 6 years I've spent optimizing JRuby it would be with the following phrase: Get Rid Of Hash Lookups.'

This has been a particular theme of some recent optimization hacks I've been working on. Hashes may be O(1) to read, on average, but that doesn't necessarily mean they're the right tool for performance...

(via Declan McGrath)
via:declanmcgrath  hash  optimization  ruby  performance  jruby  hashing  data-structures  big-o  optimisation 
september 2012 by jm
Expensive lessons in Python performance tuning
some good advice for large-scale Python performance: prun and guppy for profiling, namedtuples for memory efficiency, and picloud for trivial EC2-based scale-out. (via Nelson)
picloud  prun  guppy  namedtuples  python  optimization  performance  tuning  profiling 
july 2012 by jm
Linux Profiling tools and techniques
great tips for system-level and app-level profiling on Linux from Padraig
profiling  optimization  linux  cache  valgrind 
april 2011 by jm
Blosc
A high-performance compressor optimized for binary data -- 'designed to transmit data to the processor cache faster than a traditional, non-compressed, direct memory fetch via memcpy()' (via Bill de hOra)
via:dehora  compression  memcpy  caching  l1  software  memory  optimization  performance  python  pytables  from delicious
october 2010 by jm
The MySQL “swap insanity” problem and the effects of the NUMA architecture
very interesting; modern multicore x86 architectures use a NUMA memory architecture, which can cause a dip into swap, even when there appears to be plenty of free RAM available
linux  memory  mysql  optimization  performance  swap  tuning  vm  numa  swap-insanity  swapping  from delicious
september 2010 by jm
spamtune.pl
'a Perl script that generates an OpenOffice.org spreadsheet which loads up SpamAssassin configuration and known spam and ham messages. Once loaded, you can tweak individual SpamAssassin scores in the spreadsheet itself and see their effect on spam/ham classification in real-time. The script also shows you the number of false positives and negatives for a set of scores in real-time.' by Raj Mathur <raju at linux-delhi.org>
spamtune  spamassassin  rules  scores  optimization  tweaking  openoffice  from delicious
april 2010 by jm
"Source Code Optimisation", Felix von Leitner, Linux Kongress 2009 [PDF]
Good presentation on C compiler optimization, via Cal Henderson. 'People often write less readable code because they think it will produce faster code. Unfortunately, in most cases, the code will not be faster.' I particularly like 'Fancy-Schmancy Algorithms': 'If you have 10-100 elements, use a list, not a red-black tree; Fancy data structures help on paper, but rarely in reality. (More space overhead in the data structure, less L2 cache left for actual data.)'
via:iamcal  compilers  c  c++  optimization  coding  assembly  speed  from delicious
november 2009 by jm

related tags

akka  algorithms  ami  arrays  ascii  assembly  atomic  autoscaling  availability  aws  batch  benchmarking  benchmarks  big-o  bike-sharing  borisbikes  btrees  bytebuffers  c  c++  cache  caching  cas  cms  coding  collections  columnar  compilation  compilers  compression  concurrency  conversion  counting  cycling  data  data-oriented-programming  data-structures  dfa  disk  druid.io  dublinbikes  ebs  estimation  ethernet  exceptions  facebook  fast-path  fastutil  fleet  floating-point  fonts  fpga  g1  g1gc  garbage-collection  gc  giraph  google  guppy  gzip  hacks  hadoop  hardware  hash  hashing  hex  hll  hyperloglog  i7  insane  intel  intel-core  io  iostat  java  jetty  jit  jmm  john-rauser  jruby  jvm  l1  latency  lazyset  linear-programming  linkedin  linux  low-latency  machine-learning  maths  mechanical-sympathy  memcpy  memory  metamarkets  mpsc  mysql  namedtuples  netflix  netty  nitsanw  numa  numbers  ocr  off-heap-storage  one-liners  oop  open-source  openoffice  ops  optimisation  optimization  p99  papers  pcre  perf  performance  picloud  power  presentations  probabilistic  profiling  programming  prun  putordered  pytables  python  queues  radix-sort  ram  randomaccessfile  re2  rebalancing  regex  regexps  ruby  rules  scaling  sched_batch  scores  scraping  simd  slides  software  sort  sorting  sorting-networks  spamassassin  spamtune  speed  speed-of-light  spsc  ssd  sse  stack-overflow  stock-markets  strchr  string-matching  strings  strlen  strstr  swap  swap-insanity  swapping  synchronization  sysadmin  systemtap  tables  tcp  toread  trading  tries  tunables  tuning  tweaking  valgrind  via:akohli  via:declanmcgrath  via:dehora  via:eoinbrazil  via:highscalability  via:iamcal  via:martin-thompson  via:proggit  vm  web  zopfli 

Copy this bookmark:



description:


tags: