jm + grep   7

'Make JSON greppable!'
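To show why this helps with grep, here is a minimal sketch of the idea: flatten a JSON document into one assignment per line, so every value carries its full path. This is an illustration only, not gron's implementation. The `flatten` helper and the sample document are hypothetical, and keys that are not plain identifiers are not quoted specially here.

```python
import json

def flatten(value, path="json"):
    """Yield one 'path = value;' line per node of a parsed JSON value."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key in sorted(value):
            # Simplification: assumes keys are plain identifiers.
            yield from flatten(value[key], f"{path}.{key}")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, item in enumerate(value):
            yield from flatten(item, f"{path}[{index}]")
    else:
        # Scalars are re-encoded as JSON so strings stay quoted.
        yield f"{path} = {json.dumps(value)};"

# Hypothetical sample document.
doc = json.loads('{"name": "jm", "tags": ["grep", "cli"]}')
lines = list(flatten(doc))
print("\n".join(lines))
```

Each output line stands on its own, so a plain `grep tags` over the output finds the matching values together with their paths.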
json  gron  grep  cli  tools  data  hacking  golang 
april 2018 by jm
Building a Regex Search Engine for DNA | Hacker News
The original post is pretty mediocre -- a search engine which handles a corpus of "thousands" of plasmids from "a scientist's personal library", and which doesn't handle fuzzy matches? I think that's called grep -- but the HN comments are good
grep  regular-expressions  hacker-news  strings  dna  genomics  search  elasticsearch 
april 2016 by jm
interactive menu selection for the UNIX command line
cli  linux  unix  grep  menus  selection  ui  interactive  terminal 
february 2016 by jm
Ag: faster than Ack
Some nice performance tricks; I particularly like the use of sljit:
Ag uses Pthreads to take advantage of multiple CPU cores and search files in parallel.
Files are mmap()ed instead of read into a buffer.
Literal string searching uses Boyer-Moore strstr.
Regex searching uses PCRE's JIT compiler (if Ag is built with PCRE >=8.21).
Ag calls pcre_study() before executing the same regex on every file.
Instead of calling fnmatch() on every pattern in your ignore files, non-regex patterns are loaded into arrays and binary searched.
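The last trick can be sketched roughly as follows, assuming ignore patterns are split into plain names (kept sorted and binary searched) and glob patterns (still matched with fnmatch). This is not Ag's actual C code; `split_patterns` and `is_ignored` are hypothetical names used only for illustration.

```python
import bisect
import fnmatch

GLOB_CHARS = "*?["

def split_patterns(patterns):
    """Separate plain names from glob patterns; plain names are sorted."""
    literals = sorted(p for p in patterns if not any(c in p for c in GLOB_CHARS))
    globs = [p for p in patterns if any(c in p for c in GLOB_CHARS)]
    return literals, globs

def is_ignored(name, literals, globs):
    """Binary search the plain names first, then fall back to glob matching."""
    index = bisect.bisect_left(literals, name)
    if index < len(literals) and literals[index] == name:
        return True
    return any(fnmatch.fnmatchcase(name, g) for g in globs)

# Hypothetical ignore list.
literals, globs = split_patterns(["node_modules", "*.min.js", ".git", "build"])
print(is_ignored("build", literals, globs))       # plain name, found by bisect
print(is_ignored("app.min.js", literals, globs))  # matched by the glob
```

The point of the split is that a lookup among N plain names costs O(log N) string comparisons rather than N fnmatch calls per file.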
jit  cli  grep  search  ack  ag  unix  pcre  sljit  boyer-moore  tools 
march 2015 by jm
Dutch grepping Facebook for welfare fraud
'The [Dutch] councils are working with a specialist Amsterdam research firm, using the type of computer software previously deployed only in counterterrorism, monitoring [LinkedIn, Facebook and Twitter] traffic for keywords and cross-referencing any suspicious information with digital lists of social welfare recipients.

Among the giveaway terms, apparently, are “holiday” and “new car”. If the automated software finds a match between one of these terms and a person claiming social welfare payments, the information is passed on to investigators to gather real-life evidence.' With a 30% false positive rate, apparently -- let's hope those investigations aren't too intrusive!
grep  dutch  holland  via:tjmcintyre  privacy  facebook  twitter  linkedin  welfare  dole  fraud  false-positives  searching 
september 2011 by jm
'A set of programs for creating, manipulating, and outputting a stream of Records, or hashes. Inspired by Monad.' Looks very powerful
monad  recordstream  open-source  recs  cli  grep  from delicious
september 2010 by jm
_Fast Cache for Your Text: Accelerating Exact Pattern Matching with Feed-Forward Bloom Filters_ [PDF]
intriguing application of a Bloom Filter optimised for modern CPUs (2-level, with a cache-partitioned first level), providing massive speedups vs GNU grep or trie-based approaches like Aho-Corasick -- or possibly re2c, as used in "sa-compile". On the other hand, a Perl implementation of Rabin-Karp, which is similar, didn't perform as well. Still, may be worth investigating
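For a feel of the general approach, here is a minimal sketch of a plain Bloom filter used as a prefilter for exact matching of fixed-length patterns. It is much simpler than the paper's design: it has no feed-forward second level and no cache partitioning, and the class, function names, and parameters are all hypothetical. In pure Python the filter saves nothing over the set lookup; the sketch only shows the filter-then-verify structure.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 16, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(item, digest_size=8, salt=bytes([seed])).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def find_matches(text, patterns):
    """Return (offset, pattern) pairs; assumes all patterns share one length."""
    width = len(next(iter(patterns)))
    assert all(len(p) == width for p in patterns)
    bloom = BloomFilter()
    for p in patterns:
        bloom.add(p)
    hits = []
    for offset in range(len(text) - width + 1):
        window = text[offset:offset + width]
        # Cheap filter first; verify exactly to discard false positives.
        if window in bloom and window in patterns:
            hits.append((offset, window))
    return hits

print(find_matches(b"xxviagrayy", {b"viagra", b"lotter"}))
```

Most windows of ordinary text fail the filter test, so the exact verification step only runs on a small fraction of positions.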
bloom-filters  grep  filtering  spamassassin  sa-compile  text-matching  caches  aho-corasick  from delicious
september 2010 by jm

