regexps   161


Details of the Cloudflare outage on July 2, 2019
Great writeup from jgc. Worth noting some important lessons:

* config changes should be rolled out carefully and gradually, just like code;

* particularly regexps, which are effectively code anyway;

* emergency-use rollback systems need to work, of course;

* having emergency-only systems is a risk, too, since infrequently-used code paths are likely to atrophy and break without anyone noticing (as nsheridan said);

* /.*/ in a regexp is pretty much always bad news, and it would have been worth having a linter catch it before commit.
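
As a rough illustration of that last point, here is a minimal sketch of what such a lint check might look like. It is a naive text scan, not Cloudflare's tooling; the function name `find_dot_star` is made up for this example, and it assumes a simple regex syntax (it does not handle a literal `]` placed first in a character class).

```python
def find_dot_star(pattern: str) -> list[int]:
    """Return the offsets of each unescaped `.*` or `.+` outside a character class.

    Hypothetical helper for illustration only: a naive scan, not a full regex parser.
    """
    hits = []
    in_class = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes.
            i += 2
            continue
        if in_class:
            if ch == "]":
                in_class = False
        elif ch == "[":
            in_class = True
        elif ch == "." and i + 1 < len(pattern) and pattern[i + 1] in "*+":
            hits.append(i)
        i += 1
    return hits


if __name__ == "__main__":
    for candidate in [r".*(?:.*=.*)", r"a\.*b", r"[.*]x"]:
        print(candidate, find_dot_star(candidate))
```

A pre-commit hook could run something like this over each rule and refuse the commit (or demand a reviewer sign-off) when the list is non-empty.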
cloudflare  outages  regex  postmortems  regexps  deployment  rollback  via:jgc  via:jm 
12 weeks ago by ignatz
Paper: Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs
a software based, large-scale regex matcher designed to match multiple patterns at once (up to tens of thousands of patterns at once) and to ‘stream’ (that is, match patterns across many different ‘stream writes’ without holding on to all the data you’ve ever seen). To my knowledge this makes it unique.

RE2 is software based but doesn’t scale to large numbers of patterns; nor does it stream (although it could). It occupies a fundamentally different niche to Hyperscan; we compared the performance of RE2::Set (the RE2 multiple pattern interface) to Hyperscan a while back.

Most back-tracking matchers (such as libpcre) are one pattern at a time and are inherently incapable of streaming, due to their requirement to backtrack into arbitrary amounts of old input.
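
To make the streaming idea concrete, here is a small sketch of an automaton-style matcher that carries only a fixed amount of state between writes. This is not Hyperscan's algorithm or API: it matches a single literal string using the classic KMP failure table, and the class name `StreamingLiteralMatcher` and its `write` method are invented for this example. The point is only that an automaton never needs to look back at old input, whereas a backtracking matcher might.

```python
class StreamingLiteralMatcher:
    """Find a literal needle across chunked input, keeping O(len(needle)) state.

    Hypothetical illustration of streaming matching; not the Hyperscan API.
    """

    def __init__(self, needle: str):
        if not needle:
            raise ValueError("needle must be non-empty")
        self.needle = needle
        self.fail = self._failure_table(needle)
        self.state = 0   # how many needle characters are currently matched
        self.offset = 0  # total characters consumed across all writes

    @staticmethod
    def _failure_table(needle: str) -> list[int]:
        # fail[i] = length of the longest proper prefix of needle[:i+1]
        # that is also a suffix of it (standard KMP preprocessing).
        fail = [0] * len(needle)
        k = 0
        for i in range(1, len(needle)):
            while k and needle[i] != needle[k]:
                k = fail[k - 1]
            if needle[i] == needle[k]:
                k += 1
            fail[i] = k
        return fail

    def write(self, chunk: str) -> list[int]:
        """Consume one chunk; return stream offsets at which a match ends."""
        ends = []
        for ch in chunk:
            while self.state and ch != self.needle[self.state]:
                self.state = self.fail[self.state - 1]
            if ch == self.needle[self.state]:
                self.state += 1
            self.offset += 1
            if self.state == len(self.needle):
                ends.append(self.offset)
                # Fall back so overlapping matches are still found.
                self.state = self.fail[self.state - 1]
        return ends


if __name__ == "__main__":
    m = StreamingLiteralMatcher("abab")
    print(m.write("xxab"))  # the match is still in progress at the chunk boundary
    print(m.write("abab"))  # completed here, without re-reading the first chunk
```

Between writes the matcher holds just two integers plus the precomputed table, which is the property that backtracking engines cannot offer in general.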
regex  regular-expressions  algorithms  hyperscan  sensory-networks  regexps  simd  nfa 
march 2019 by jm
Rudy Giuliani doesn't understand how links work
As waxy noted: 'this might be funny if he wasn't Trump's cybersecurity advisor'.
Twitter allowed someone to invade my text with a disgusting anti-President message. The same thing-period no space-occurred later and it didn’t happen. Don’t tell me they are not committed cardcarrying anti-Trumpers. Time Magazine also may fit that description. FAIRNESS PLEASE


Giuliani composed a tweet with no spaces after full stops, and a broken regexp at Twitter auto-linkified "G-20.In". An internet prankster registered this domain and Giuliani lost his shit in a spectacular display of incompetence.

The best bit? Here's a thread with the original devs: https://twitter.com/hoverbird/status/1070142045140877312 -- 'Hey @tw and @bcherry, remember all the debates we had about the linkifying regex around edge cases like this?'

(via Waxy and pretty much everyone on twitter)
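
For the curious, here is a sketch of how this kind of edge case arises. It is not Twitter's actual linkifying regex; the pattern, the tiny TLD list, and the sample sentences are all made up for illustration. Any matcher that treats "word, dot, known TLD" as a bare domain will happily linkify two sentences run together without a space.

```python
import re

# Hypothetical TLD allow-list; a real linkifier would use a far longer one.
TLDS = ("com", "org", "net", "in")

# Naive bare-domain pattern: a label, a dot, then a known TLD.
BARE_DOMAIN = re.compile(
    r"\b[a-z0-9-]+\.(?:%s)\b" % "|".join(TLDS),
    re.IGNORECASE,
)


def find_links(text: str) -> list[str]:
    """Return every substring the naive pattern would turn into a link."""
    return BARE_DOMAIN.findall(text)


if __name__ == "__main__":
    # Invented sample text: no space after the full stop vs. a normal space.
    print(find_links("heading to the G-20.In July we meet"))
    print(find_links("heading to the G-20. In July we meet"))
```

Since ".in" is a real top-level domain, the first sentence yields a link candidate and the second does not, which matches the behaviour described in the tweet.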
edge-cases  bugs  twitter  regexps  regular-expressions  links  urls  us-politics  trump  rudy-giuliani  security  funny 
december 2018 by jm
Hyperscan
a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly-used libpcre library, yet functions as a standalone library with its own API written in C. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions, as well as matching of regular expressions across streams of data. Hyperscan is typically used in a DPI library stack.

Hyperscan began in 2008, and evolved from a commercial closed-source product 2009-2015. It was first developed at Sensory Networks Incorporated, and later acquired and released as open source software by Intel in October 2015.

Hyperscan is under a 3-clause BSD license. We welcome outside contributors.


This is really impressive -- state of the art in parallel regexp matching has improved quite a lot since I was last looking at it.

(via Tony Finch)
via:fanf  regexps  regular-expressions  text  matching  pattern-matching  intel  open-source  bsd  c  dpi  scanning  sensory-networks 
august 2017 by jm
Regexp Disaster
Course notes from Gerald Jay Sussman's "Adventures in Advanced Symbolic Programming" class at MIT. Hard to argue with this:
The syntax of the regular-expression language is awful. There are various incompatable forms of the language and the quotation conventions are baroquen [sic]. Nevertheless, there is a great deal of useful software, for example grep, that uses regular expressions to specify the desired behavior.

Although regular-expression systems are derived from a perfectly good mathematical formalism, the particular choices made by implementers to expand the formalism into useful software systems are often disastrous: the quotation conventions adopted are highly irregular; the egregious misuse of parentheses, both for grouping and for backward reference, is a miracle to behold. In addition, attempts to increase the expressive power and address shortcomings of earlier designs have led to a proliferation of incompatible derivative languages.


(via Rob Pike's twitter: https://twitter.com/rob_pike/status/755856685923639296)
regex  regexps  regular-expressions  functional  combinators  gjs  rob-pike  coding  languages 
july 2016 by jm
Hyperscan
a high-performance multiple regex matching library. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.


Via Tony Finch
via:fanf  regexps  regex  dpi  hyperscan  dfa  nfa  hybrid-automata  text-matching  matching  text  strings  streams 
october 2015 by jm
Retina
a regex-based, Turing-complete programming language. Its main feature is taking some text via standard input and repeatedly applying regex operations to it (e.g. matching, splitting, and most of all replacing). Under the hood, it uses .NET's regex engine, which means that both the .NET flavour and the ECMAScript flavour are available.


Reminiscent of sed(1); see http://codegolf.stackexchange.com/a/58166 for an example Retina program
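
A rough Python sketch of the general "feed text through a series of regex replace stages" idea follows. This is not Retina syntax; the `run_stages` helper and the stage list are invented for illustration, and Python's `re` flavour differs from .NET's.

```python
import re


def run_stages(text: str, stages: list[tuple[str, str]]) -> str:
    """Apply each (pattern, replacement) stage to the text, in order.

    Hypothetical helper: each stage sees the output of the previous one.
    """
    for pattern, replacement in stages:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    stages = [
        (r"\s+", " "),   # collapse runs of whitespace to a single space
        (r"^ | $", ""),  # then trim one leading/trailing space
    ]
    print(run_stages("  hello   world ", stages))
```

Each stage only ever sees a string and emits a string, which is much the same shape as chaining `s///` commands in sed.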
retina  regexps  regexes  regular-expressions  coding  hacks  dot-net  languages 
september 2015 by jm
Online regex tester and debugger: JavaScript, Python, PHP, and PCRE
Online regex tester, debugger with highlighting for PHP, PCRE, Python and JavaScript.
toolbox  regexps 
may 2015 by amrangaye
RegExr: Learn, Build, & Test RegEx
Regular expression tester with syntax highlighting, contextual help, video tutorial, reference, and searchable community patterns.
toolbox  regexps 
may 2015 by amrangaye
Advanced searching
Org-mode has many powerful built-in search functions. These tools transform hierarchical org files into robust plain text "databases" that can be queried in sophisticated ways. Outline headings in Org-mode not only function as document sections or todo items; each heading can also store an unlimited amount of text and various types of metadata. And, of course, since Org-mode files are plain text, any number of tools (grep, awk, perl, etc.) can be used to filter and manipulate the data they contain.

The goal of this tutorial is to offer an introduction to the built-in commands and syntax for querying Org-mode outlines. While these are explained in various places in the Org-mode manual, this tutorial attempts to provide an overview in one place. It is particularly aimed at those who would like to use Org-mode as a note-taking and reference management tool. Nonetheless, it should prove useful to anyone who needs to locate specific information buried in an ever-growing collection of Org-mode data.
org-mode  emacs  agenda  search  regexps  org  productivity 
march 2015 by deprecated
