jm + indexes   4

The Case for Learned Index Structures
'Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show, that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work just provides a glimpse of what might be possible.'

Excellent follow-up thread from Henry Robinson:

'The fact that the learned representation is more compact is very neat. But also it's not really a surprise that, given the entire dataset, we can construct a more compact function than a B-tree which is *designed* to support efficient updates.' [...] 'given that the model performs best when trained on the whole data set - I strongly doubt B-trees are the best we can do with the current state-of-the art.'
data-structures  ml  google  b-trees  storage  indexes  deep-learning  henry-robinson 
5 weeks ago by jm
Outage postmortem (2015-10-08 UTC) : Stripe: Help & Support
There was a breakdown in communication between the developer who requested the index migration and the database operator who deleted the old index. Instead of working on the migration together, they communicated in an implicit way through flawed tooling. The dashboard that surfaced the migration request was missing important context: the reason for the requested deletion, the dependency on another index’s creation, and the criticality of the index for API traffic. Indeed, the database operator didn’t have a way to check whether the index had recently been used for a query.

Good demo of how the Etsy-style chatops deployment approach would have helped avoid this risk.
stripe  postmortem  outages  databases  indexes  deployment  chatops  deploy  ops 
october 2015 by jm
FastBit: An Efficient Compressed Bitmap Index Technology
an [LGPL] open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user's data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools.

The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record.

A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps.[...] Another innovation in FastBit is the multi-level bitmap encoding methods.
fastbit  nosql  algorithms  indexing  search  compressed-bitmaps  indexes  wah  bitmaps  compression 
april 2013 by jm
Riakking Complex Data Types
interesting details about Riak's support for secondary indexes. Not quite SQL, but still more powerful than plain old K/V storage (via dehora)
via:dehora  riak  indexes  storage  nosql  key-value-stores  2i  range-queries 
march 2013 by jm

Copy this bookmark: