Is Prefix Of String In Table? A Journey Into SIMD String Processing.


16 bookmarks. First posted by abolibibelot may 2018.


Really good description of SIMD string matching with really good figures.
may 2018 by guidoism
TL;DR

Wrote some C and assembly code that uses SIMD instructions to perform prefix matching of strings. The C code was between 4-7x faster than the baseline implementation for prefix matching. The assembly code was 9-12x faster than the baseline specifically for the negative match case (determining that an incoming string definitely does not prefix match any of our known strings). The fastest negative match could be done in around 6 CPU cycles, which is pretty quick. (Integer division, for example, takes about 90 cycles.)

Overview

Goal: given a string, determine if it prefix-matches a set of known strings as fast as possible. That is, in a set of known strings, do any of them prefix match the incoming search string?

A reference implementation was written in C as a baseline, which simply looped through an array of strings, comparing each one, byte-by-byte, looking for a prefix match. Prefix match performance ranged from 28 CPU cycles to 130, and negative match performance was around 74 cycles.

A SIMD-friendly C structure called STRING_TABLE was derived. It is optimized for up to 16 strings, ideally of length less than or equal 16 characters. The table is created from the set of known strings up-front; it is sorted by length, ascending, and a unique character (with regards to other characters at the same byte offset) is then extracted, along with its index. A 16 byte character array, STRING_SLOT, is used to capture the unique characters. A 16 element array of unsigned characters, SLOT_INDEX, is used to capture the index. Similarly, lengths are stored in the same fashion via SLOT_LENGTHS. Finally, a 16 element array of STRING_SLOTs is used to capture up to the first 16 bytes of each string in the set.
algorithm  grep  optimization  cs 
may 2018 by euler
And after 230 hours of work, I'm pleased to finally announce my first article: Is Prefix Of…
from twitter_favs
may 2018 by abolibibelot