jm + sensory-networks   2

Paper: Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs
a software-based, large-scale regex matcher designed to match multiple patterns at once (up to tens of thousands) and to ‘stream’ (that is, match patterns across many different ‘stream writes’ without holding on to all the data you’ve ever seen). To my knowledge this makes it unique.

RE2 is software-based but doesn’t scale to large numbers of patterns; nor does it stream (although it could). It occupies a fundamentally different niche to Hyperscan; we compared the performance of RE2::Set (the RE2 multiple pattern interface) to Hyperscan a while back.

Most backtracking matchers (such as libpcre) handle one pattern at a time and are inherently incapable of streaming, because they need to backtrack into arbitrary amounts of old input.
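
To make the streaming point concrete, here is a minimal Python sketch of an automaton-style matcher that carries only a small state value from one ‘stream write’ to the next and never revisits old input. It uses the classic Shift-And (bit-parallel NFA) technique for a single literal pattern; it is not Hyperscan's implementation or API, and the names `StreamMatcher`, `write` and `scan_in_chunks` are hypothetical.

```python
class StreamMatcher:
    """Shift-And matcher for one literal pattern; state survives across writes."""

    def __init__(self, pattern: bytes):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        # masks[b] has bit i set when pattern[i] == b.
        self.masks = {}
        for i, byte in enumerate(pattern):
            self.masks[byte] = self.masks.get(byte, 0) | (1 << i)
        self.accept = 1 << (len(pattern) - 1)
        self.state = 0   # bit i set: pattern[:i + 1] ends at the last byte seen
        self.offset = 0  # total number of bytes consumed so far

    def write(self, chunk: bytes):
        """Consume one chunk; return stream offsets at which a match ends."""
        ends = []
        for byte in chunk:
            # Advance every partial match by one byte and start a new one.
            self.state = ((self.state << 1) | 1) & self.masks.get(byte, 0)
            self.offset += 1
            if self.state & self.accept:
                ends.append(self.offset)
        return ends


def scan_in_chunks(pattern: bytes, chunks):
    """Feed chunks one at a time, keeping only the matcher state between them."""
    matcher = StreamMatcher(pattern)
    ends = []
    for chunk in chunks:
        ends.extend(matcher.write(chunk))
    return ends
```

Because the only thing kept between writes is `state` and `offset`, a pattern split across chunk boundaries is still found, and the result is the same however the input is chunked. A backtracking matcher has no equivalent compact state, which is the limitation described above.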
regex  regular-expressions  algorithms  hyperscan  sensory-networks  regexps  simd  nfa 
20 days ago by jm
a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly-used libpcre library, yet functions as a standalone library with its own API written in C. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions, as well as matching of regular expressions across streams of data. Hyperscan is typically used in a DPI library stack.
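
As a rough illustration of what matching many patterns in a single pass means, the sketch below uses the classical Aho-Corasick automaton for literal strings. This is a swapped-in textbook technique, not the hybrid automata Hyperscan actually uses, and it handles plain strings rather than regular expressions; `build_automaton` and `scan_all` are hypothetical names.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto/fail/output tables for non-empty literal patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> ids of patterns that end at this state
    for pid, pattern in enumerate(patterns):
        if not pattern:
            raise ValueError("patterns must be non-empty")
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pid)

    # Breadth-first pass: shallower states are finished before deeper ones.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def scan_all(automaton, text):
    """Return (pattern_id, end_offset) for every match, reading text once."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pid in out[state]:
            matches.append((pid, pos + 1))
    return matches
```

For example, `scan_all(build_automaton(["he", "she", "his", "hers"]), "ushers")` reports "she" and "he" ending at offset 4 and "hers" ending at offset 6, with each input character examined once regardless of how many patterns were compiled in.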

Hyperscan began in 2008 and evolved as a commercial closed-source product from 2009 to 2015. It was first developed at Sensory Networks Incorporated, and was later acquired and released as open source software by Intel in October 2015.

Hyperscan is under a 3-clause BSD license. We welcome outside contributors.

This is really impressive -- state of the art in parallel regexp matching has improved quite a lot since I was last looking at it.

(via Tony Finch)
via:fanf  regexps  regular-expressions  text  matching  pattern-matching  intel  open-source  bsd  c  dpi  scanning  sensory-networks 
august 2017 by jm
