strings   2975

GitHub - boyerjohn/rapidstring: Maybe the fastest string library ever.

rapidstring is maybe the fastest string library ever written in ANSI C. Here are some of the features:

Highly performant. Every aspect of the library was first considered from a performance perspective, and it shows. In the current benchmarks, the library outperforms the standard string implementations of GCC, Clang, MSVC and ICC by a factor of two or more in most tests.

Trivial integration. The entire library consists of a single header file. The code is written in vanilla ANSI C that has been tested on all current compilers. Furthermore, the code is entirely C++ compatible.

Minimalist design. Of the ~1,500 lines of code, only 200 implement the library functions; the rest is documentation. The library's sole purpose is to provide efficient and reliable strings.

Extensive documentation. Every function is thoroughly documented with information on its parameters, its complexity, whether it allocates, when it allocates and more.

Configurable. The internal implementation of rapidstring is very open. All internal functions and macros are documented to give the user the utmost leeway. Any internal macro, such as the allocation functions, the stack capacity, the growth multiplier or the inlining settings, may be redefined by the user.

Vigorous testing. The library has 100% unit test coverage with valgrind memory leak checks. All tests are run on GCC, Clang and MSVC in the continuous integration builds to ensure the library is always up to par.
clang  c  strings  performance 
17 days ago by euler
Aho–Corasick string search | Lobsters
burntsushi covers WHY AC is so good, including potential pitfalls. Another comment has a link to the original paper.
lobsters  comment  algorithms  strings 
18 days ago by mechazoidal
Text Case on the App Store
A cool $1 utility for iOS that takes selected or copied text and performs case transformations: title case, URL Encoded, uppercase, lowercase, capitalized, reversed, and "Mocking Spongebob" (random capitalization). Available through a share sheet in any editor or text field.
ios  strings  writing 
6 weeks ago by ttscoff
[1710.10964] At the Roots of Dictionary Compression: String Attractors
A well-known fact in the field of lossless text compression is that high-order entropy is a weak model when the input contains long repetitions. Motivated by this fact, decades of research have generated myriads of so-called dictionary compressors: algorithms able to reduce the text's size by exploiting its repetitiveness. Lempel-Ziv 77 is probably one of the most successful and known tools of this kind, followed by straight-line programs, run-length Burrows-Wheeler transform, and other less-known schemes. In this paper, we show that these techniques are different solutions to the same, elegant, combinatorial problem: to find a small set of positions capturing all distinct text's substrings. We call such a set a string attractor. We first show reductions between dictionary compressors and string attractors. This gives us the approximation ratios of dictionary compressors with respect to the smallest string attractor and allows us to solve several open problems related to the asymptotic relations between the output sizes of different dictionary compressors. We then show that the k-attractor problem - that is, deciding whether a text has a size-t set of positions capturing all substrings of length at most k - is NP-complete for k >= 3. We provide approximation techniques for the smallest k-attractor, show that the problem is APX-complete for constant k, and give strong inapproximability results. To conclude, we provide matching lower- and upper- bounds for the random access problem on string attractors. Our optimal data structure is universal: by our reductions to string attractors, it supports random access on any dictionary-compression scheme. In particular, our solution matches the lower bound also on LZ77, straight-line programs, collage systems, and macro schemes, and therefore essentially closes (at once) the random access problem for all these compressors.
compression  strings  feature-extraction  representation  algorithms  computational-complexity  to-understand  nudge-targets  consider:looking-to-see 
6 weeks ago by Vaguery
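The string attractor idea in the abstract above (a set of positions such that every distinct substring has at least one occurrence crossing a position in the set) can be illustrated with a brute-force check. This is only a sketch for small inputs, not the paper's method: the function name `is_string_attractor`, the 0-based positions and the optional `max_len` parameter (standing in for the k of the k-attractor problem) are assumptions made here for illustration.

```python
def is_string_attractor(text, positions, max_len=None):
    """Brute-force check that `positions` is a string attractor of `text`.

    Hypothetical helper for illustration only. Positions are 0-based.
    If `max_len` is given, only substrings of length <= max_len are
    considered, which corresponds to the k-attractor variant.
    """
    n = len(text)
    limit = n if max_len is None else min(max_len, n)
    marked = set(positions)

    seen = set()     # every distinct substring within the length limit
    covered = set()  # substrings with an occurrence crossing a marked position

    for i in range(n):
        for j in range(i + 1, min(i + limit, n) + 1):
            sub = text[i:j]  # occurrence covering indices i .. j-1
            seen.add(sub)
            if any(i <= p < j for p in marked):
                covered.add(sub)

    return seen == covered


print(is_string_attractor("abab", [0, 1]))             # True
print(is_string_attractor("abab", [0]))                # False: no "b" crosses position 0
print(is_string_attractor("abab", [1], max_len=1))     # False: no "a" crosses position 1
```

The check enumerates all O(n^2) occurrences, so it is only suitable for experimenting with short strings; finding a smallest attractor is the hard problem the abstract discusses.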
Strings Are Evil
Reducing memory allocations from 7.5GB to 32KB
8 weeks ago by geetarista
