jm + regex   9

Paper: Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs
a software based, large-scale regex matcher designed to match multiple patterns at once (up to tens of thousands of patterns at once) and to ‘stream’ (that is, match patterns across many different ‘stream writes’ without holding on to all the data you’ve ever seen). To my knowledge this makes it unique.

RE2 is software based but doesn’t scale to large numbers of patterns; nor does it stream (although it could). It occupies a fundamentally different niche to Hyperscan; we compared the performance of RE2::Set (the RE2 multiple pattern interface) to Hyperscan a while back.

Most back-tracking matchers (such as libpcre) are one pattern at a time and are inherently incapable of streaming, due to their requirement to backtrack into arbitrary amounts of old input.
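
To make the "streaming" idea above concrete, here is a minimal sketch in Python. It is not Hyperscan's API or algorithm: the `StreamMatcher` class and its `write` method are hypothetical names, it handles only literal patterns, and it simulates the automaton naively. The point it illustrates is that an automaton-style matcher only has to carry a small set of states between writes, whereas a backtracking matcher would need the old input itself.

```python
class StreamMatcher:
    """Toy multi-pattern matcher for literal strings (hypothetical, for illustration).

    Between writes it keeps only the set of partial matches in progress,
    never the data that has already been consumed.
    """

    def __init__(self, patterns):
        self.patterns = list(patterns)
        if any(not p for p in self.patterns):
            raise ValueError("patterns must be non-empty")
        # Partial matches in progress: (pattern_id, chars_matched_so_far).
        self.active = set()
        # Total number of characters consumed across all writes.
        self.offset = 0

    def write(self, chunk):
        """Consume one chunk; return (pattern_id, end_offset) for matches ending in it."""
        matches = []
        for ch in chunk:
            self.offset += 1
            next_active = set()
            # Any pattern may begin at this character, so seed position 0 for each.
            candidates = self.active | {(pid, 0) for pid in range(len(self.patterns))}
            for pid, pos in candidates:
                pat = self.patterns[pid]
                if pat[pos] == ch:
                    if pos + 1 == len(pat):
                        matches.append((pid, self.offset))
                    else:
                        next_active.add((pid, pos + 1))
            self.active = next_active
        return matches
```

Feeding "xxab" and then "cd" to a matcher built from the patterns "abc" and "bcd" reports both matches during the second write, even though each match straddles the boundary between the two writes.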
regex  regular-expressions  algorithms  hyperscan  sensory-networks  regexps  simd  nfa 
20 days ago by jm
Regexp Disaster
Course notes from Gerald Jay Sussman's "Adventures in Advanced Symbolic Programming" class at MIT. Hard to argue with this:
The syntax of the regular-expression language is awful. There are various incompatable forms of the language and the quotation conventions are baroquen [sic]. Nevertheless, there is a great deal of useful software, for example grep, that uses regular expressions to specify the desired behavior.

Although regular-expression systems are derived from a perfectly good mathematical formalism, the particular choices made by implementers to expand the formalism into useful software systems are often disastrous: the quotation conventions adopted are highly irregular; the egregious misuse of parentheses, both for grouping and for backward reference, is a miracle to behold. In addition, attempts to increase the expressive power and address shortcomings of earlier designs have led to a proliferation of incompatible derivative languages.

(via Rob Pike's Twitter)
regex  regexps  regular-expressions  functional  combinators  gjs  rob-pike  coding  languages 
july 2016 by jm
'a Ruby regular expression editor and tester'. Great for prototyping regexps with a little set of test data, providing a neat permalink for the results
regex  regexp  ruby  tools  coding  web  editors  testing 
july 2016 by jm
a high-performance multiple regex matching library. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.

Via Tony Finch
via:fanf  regexps  regex  dpi  hyperscan  dfa  nfa  hybrid-automata  text-matching  matching  text  strings  streams 
october 2015 by jm
Peter Norvig writes a program to play regex golf with arbitrary lists
In response to XKCD 1313. This is excellent. It's reminiscent of my SpamAssassin SOUGHT-ruleset regexp-discovery algorithm, described in , albeit without the BLAST step intended to maximise pattern length and minimise false positives
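
For anyone who hasn't met regex golf: the goal is a short regex that matches every string in one list ("winners") and none in another ("losers"). The sketch below is a simplified greedy set-cover take on that problem, written for illustration. It is neither Norvig's program nor the SOUGHT algorithm; the function names `candidate_parts` and `regex_golf` are my own, and it assumes the words contain only letters, digits and spaces, so no escaping is needed.

```python
import re


def candidate_parts(winners, max_len=4):
    """Whole-word patterns plus short substrings of each anchored winner."""
    parts = set()
    for w in winners:
        anchored = "^" + w + "$"
        parts.add(anchored)
        for n in range(1, max_len + 1):
            for i in range(len(anchored) - n + 1):
                parts.add(anchored[i:i + n])
    return parts


def regex_golf(winners, losers, max_len=4):
    """Greedily pick parts that cover all winners and match no loser."""
    winners, losers = set(winners), set(losers)
    # Discard any part that matches a loser; what is left is safe to OR together.
    pool = [p for p in candidate_parts(winners, max_len)
            if not any(re.search(p, l) for l in losers)]
    chosen, remaining = [], set(winners)
    while remaining:
        # Prefer the part covering the most uncovered winners, then the shortest.
        best = max(
            pool,
            key=lambda p: (sum(1 for w in remaining if re.search(p, w)), -len(p), p),
            default=None,
        )
        covered = {w for w in remaining if best is not None and re.search(best, w)}
        if not covered:
            raise ValueError("no regex separates winners from losers")
        chosen.append(best)
        remaining -= covered
    return "|".join(chosen)
```

Calling `regex_golf({"washington", "adams", "jefferson"}, {"clinton", "king", "pinckney"})` returns an alternation of short parts. The result is valid by construction but not guaranteed to be the shortest possible.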
python  regex  xkcd  blast  rule-discovery  spamassassin  rules  regexps  regular-expressions  algorithms  peter-norvig 
january 2014 by jm
PCRE Performance Project
Excellent stuff. Using "sljit", a stackless platform-independent JIT compiler, this compiles Perl-compatible regular expressions to machine code on ARM, x86, MIPS and PowerPC platforms, resulting in 'similar matching speed to DFA based engines (like re2) on common patterns' with Perl compatibility. 'This work has been released as part of PCRE 8.20 and above. Now (PCRE 8.31), nearly all PCRE features are supported including UTF-8/16 and partial matching.'
pcre  regexps  regex  performance  optimization  jit  compilation  dfa  re2  via:akohli 
september 2012 by jm
RegExr: Online Regular Expression Testing Tool
a very nice interactive editor in Flash, supporting lots of the usual perlish stuff. via Joe
via:jdrumgoole  regexps  regular-expressions  spamassassin  rule-dev  flash  regex  flex  utilities  from delicious
december 2009 by jm
Structural Regular Expressions
'The current UNIX text processing tools are weakened by the built-in concept of a line. There is a simple notation that can describe the `shape' of files when the typical array-of-lines picture is inadequate. That notation is regular expressions. Using regular expressions to describe the structure in addition to the contents of files has interesting applications, and yields elegant methods for dealing with some problems the current tools handle clumsily. When operations using these expressions are composed, the result is reminiscent of shell pipelines.' Paper by Rob Pike, via adulau. Intriguing.
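
A rough sketch of the idea in Python, loosely modelled on the `x` (extract) and `g` (guard) commands from Pike's paper. The function bodies are my own simplification and not the sregex module's API. The regex decides what a "record" is, and the operations compose like a pipeline rather than being tied to lines.

```python
import re


def x(pattern, pieces):
    """Extract: yield every substring of each piece that matches pattern."""
    for piece in pieces:
        for m in re.finditer(pattern, piece):
            yield m.group(0)


def g(pattern, pieces):
    """Guard: keep only the pieces that contain a match for pattern."""
    return (piece for piece in pieces if re.search(pattern, piece))


# Sample data (made up): multi-line records separated by blank lines.
text = (
    "name: ann\nlang: python\n"
    "\n"
    "name: bob\nlang: c\n"
    "\n"
    "name: cy\nlang: python\n"
)

# The structure is described by a regex: a record is a run of non-empty lines.
records = x(r"(?:.+\n)+", [text])
# Keep records that mention python, then pull the name out of each.
python_records = g(r"lang: python", records)
names = list(x(r"(?<=name: )\w+", python_records))
```

Here `names` ends up as `['ann', 'cy']`. A line-oriented tool would have to stitch the `name:` and `lang:` lines back together before it could make that selection.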
sregex  via:adulau  regexp  rob-pike  regex  library  text  structural  parsing  from delicious
november 2009 by jm
sregex - Structural Regular Expressions
'The sregex module implements Structural Regular Expressions.' Python, Apache-licensed
sregex  python  via:adulau  regexp  robpike  regex  library  text  structural  parsing  from delicious
november 2009 by jm
