jm + tries   10

qp tries: smaller and faster than crit-bit tries
interesting new data structure from Tony Finch. "Some simple benchmarks say qp tries have about 1/3 less memory overhead and are about 10% faster than crit-bit tries."
crit-bit  popcount  bits  bitmaps  tries  data-structures  via:fanf  qp-tries  crit-bit-tries  hacks  memory 
october 2015 by jm
A nice Lua/C++ implementation of Aho-Corasick for fast string matching against multiple patterns (via JGC). This uses an interesting technique to get better performance by compacting the data structure into a single buffer, to avoid following pointers all over RAM and busting the cache.
optimization  speed  performance  aho-corasick  tries  string-matching  strings  algorithms  lua  c++  via:jgc 
august 2014 by jm
Lucene 4 - Revisiting Problems For Speed [slides]
a Presentation from Simon Willnauer on optimization work performed on Lucene in 2011. The most interesting stuff here is the work done to replace an O(n^2) FuzzyQuery fuzzy-match algorithm with a FSM trie is extremely cool -- benchmarked at 214 times faster!
benchmarks  slides  lucene  search  fuzzy-matching  text-matching  strings  algorithms  coding  fsm  tries 
april 2013 by jm
Jetty-9 goes fast with Mechanical Sympathy
This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: "False sharing" on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build. look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% less Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.
jetty  java  mechanical-sympathy  optimization  coding  tries 
february 2013 by jm
Efficient In-Memory Indexing with Generalized Prefix Trees [PDF]
'Efficient data structures for in-memory indexing gain in importance due to
(1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. In consequence, there are
high performance demands for in-memory data structures. Such index structures are
used -- with minor changes -- as primary or secondary indices in almost every DBMS.
Typically, tree-based or hash-based structures are used, while structures based on
prefix-trees (tries) are neglected in this context. For tree-based and hash-based structures, the major disadvantages are inherently caused by the need for reorganization
and key comparisons. In contrast, the major disadvantage of trie-based structures in
terms of high memory consumption (created and accessed nodes) could be improved.
In this paper, we argue for reconsidering prefix trees as in-memory index structures
and we present the generalized trie, which is a prefix tree with variable prefix length
for indexing arbitrary data types of fixed or variable length. The variable prefix length
enables the adjustment of the trie height and its memory consumption. Further, we
introduce concepts for reducing the number of created and accessed trie levels. This
trie is order-preserving and has deterministic trie paths for keys, and hence, it does
not require any dynamic reorganization or key comparisons. Finally, the generalized
trie yields improvements compared to existing in-memory index structures, especially
for skewed data. In conclusion, the generalized trie is applicable as general-purpose
in-memory index structure in many different OLTP or hybrid (OLTP and OLAP) data
management systems that require balanced read/write performance.' (via Tony Finch)
via:fanf  prefix-trees  tries  data-structures 
january 2013 by jm
The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases [PDF]
'Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART’s performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and prefix lookup.' (via Tony Finch)
via:fanf  data-structures  trees  indexing  cache-aware  tries 
january 2013 by jm
HAT-trie: A Cache-conscious Trie-based Data Structure for Strings [PDF]
'Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other highperformance data structures. We show strong improvements in both time and space; in most cases approaching that of the cache-conscious hash table. Our HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.' (via Tony Finch)
via:fanf  data-structures  tries  cache-aware  trees 
january 2013 by jm
Efficient concurrent long set and map
An ordered set and map data structure and algorithm for long keys and values, supporting concurrent reads by multiple threads and updates by a single thread.

Some good stuff in the linked blog posts about Clojure's PersistentHashMap and PersistentVector data structures, too.
arrays  java  tries  data-structures  persistent  clojure  concurrent  set  map 
december 2012 by jm
Open Data Structures
A free-as-in-speech as well as -beer textbook of data structures, covering a great range, including some I hadn't heard of before. Here's the full list: ArrayStack, FastArrayStack, ArrayQueue, ArrayDeque, DualArrayDeque, RootishArrayStack, SLList, DLList,
SEList, SkiplistSSet, SkiplistList, ChainedHashTable, LinearHashTable, BinaryTree, BinarySearchTree, Treap, ScapegoatTree, RedBlackTree, BinaryHeap, MeldableHeap, AdjacencyMatrix, AdjacencyLists, BinaryTrie, XFastTrie, and YFastTrie
algorithms  books  data-structures  computer-science  coding  tries  skiplists  arrays  queues  heap  trees  graphs  hashtables 
may 2012 by jm

Copy this bookmark: