jm + sql   29

'shift is a [web] application that helps you run schema migrations on MySQL databases'
databases  mysql  sql  migrations  ops  square  ddl  percona 
7 weeks ago by jm
Why Uber Engineering Switched from Postgres to MySQL
Uber bringing the smackdown for the HN postgres fanclub, with some juicy technical details of issues that caused them pain. FWIW, I was bitten by crappy postgres behaviour in the past (specifically around vacuuming and pgbouncer), so I've long been a MySQL fan ;)
database  mysql  postgres  postgresql  uber  architecture  storage  sql 
july 2016 by jm
ClickHouse — open-source distributed column-oriented DBMS
'ClickHouse manages extremely large volumes of data in a stable and sustainable manner. It currently powers Yandex.Metrica, world’s second largest web analytics platform, with over 13 trillion database records and over 20 billion events a day, generating customized reports on-the-fly, directly from non-aggregated data. This system was successfully implemented at CERN’s LHCb experiment to store and process metadata on 10bn events with over 1000 attributes per event registered in 2011.'

Yandex-tastic, but still looks really interesting
yandex  analytics  database  storage  sql  clickhouse 
june 2016 by jm
Visual Representation of SQL Joins
useful bookmark to have (via Nelson)
sql  joins  mysql  reference  database 
march 2016 by jm
5 subtle ways you're using MySQL as a queue, and why it'll bite you
Excellent post from Percona. I particularly like that they don't just say "don't use MySQL" -- they give good advice on how it can be made work: 1) avoid polling; 2) avoid locking; and 3) avoid storing your queue in the same table as other data.
database  mysql  queueing  queue  messaging  percona  rds  locking  sql  architecture 
january 2016 by jm
SQL on Kafka using PipelineDB
this is quite nice. PipelineDB allows direct hookup of a Kafka stream, and will ingest durably and reliably, and provide SQL views computed over a sliding window of the stream.
logging  sql  kafka  pipelinedb  streaming  sliding-window  databases  search  querying 
september 2015 by jm
Mikhail Panchenko's thoughts on the July 2015 CircleCI outage
an excellent followup operational post on CircleCI's "database is not a queue" outage
database-is-not-a-queue  mysql  sql  databases  ops  outages  postmortems 
july 2015 by jm
How We Scale VividCortex's Backend Systems - High Scalability
Excellent post from Baron Schwartz about their large-scale, 1-second-granularity time series database storage system
time-series  tsd  storage  mysql  sql  baron-schwartz  ops  performance  scalability  scaling  go 
march 2015 by jm
devbook/ at master · barsoom/devbook
How to avoid the shitty behaviour of ActiveRecord wrt migration safety, particularly around removing/renaming columns. ugh, ActiveRecord
activerecord  fail  rails  mysql  sql  migrations  databases  schemas  releasing 
march 2015 by jm
The Large Hadron Migrator is a tool to perform live database migrations in a Rails app without locking.

The basic idea is to perform the migration online while the system is live, without locking the table. In contrast to OAK and the facebook tool, we only use a copy table and triggers. The Large Hadron is a test driven Ruby solution which can easily be dropped into an ActiveRecord or DataMapper migration. It presumes a single auto incremented numerical primary key called id as per the Rails convention. Unlike the twitter solution, it does not require the presence of an indexed updated_at column.
migrations  database  sql  ops  mysql  rails  ruby  lhm  soundcloud  activerecord 
march 2015 by jm
A Dropwizard Metrics extension to instrument JDBC resources and measure SQL execution times.
metrics  sql  jdbc  instrumentation  dropwizard 
march 2015 by jm
F1: A Distributed SQL Database That Scales
Beyond the interesting-enough stuff about scalability in a distributed SQL store, there's this really nifty point about avoiding the horrors of the SQL/ORM impedance mismatch:
At Google, Protocol Buffers are ubiquitous for data storage and interchange between applications. When we still had a MySQL schema, users often had to write tedious and error-prone transformations between database rows and in-memory data structures. Putting protocol buffers in the schema removes this impedance mismatch and gives users a universal data structure they can use both in the database and in application code…. Protocol Buffer columns are more natural and reduce semantic complexity for users, who can now read and write their logical business objects as atomic units, without having to think about materializing them using joins across several tables.

This is something that pretty much any store can already adopt. Go protobufs. (or Avro, etc.)

Also, I find this really neat, and I hope this idea is implemented elsewhere soon: asynchronous schema updates:

Schema changes are applied asynchronously on multiple F1 servers. Anomalies are prevented by the use of a schema leasing mechanism with support for only current and next schema versions; and by subdividing schema changes into multiple phases where consecutive pairs of changes are mutually compatible and cannot cause anomalies.
schema  sql  f1  google  papers  orm  protobuf 
january 2015 by jm
Rails migrations with no downtime
Ugh, Rails fail. It is impossible to drop a column from a Rails-managed table without downtime, even if nothing in the code accesses it (!!), without ugly hacks that don't even seem to work on recent versions of ActiveRecord.
activerecord  deploy  migrations  rails  ruby  sql  fail  downtime 
november 2014 by jm
Validate SQL queries at compile-time in Rust
The sql! macro will validate that its string literal argument parses as a valid Postgres query.

Based on , which links the PostgreSQL server code directly into a C extension. Mad stuff, Ted!

(via Rob Clancy)
macros  postgres  compile  validation  sql  rust  coding 
october 2014 by jm
173 million 2013 NYC taxi rides shared on BigQuery : bigquery
Interesting! (a) there's a subreddit for Google BigQuery, with links to interesting data sets, like this one; (b) the entire 173-million-row dataset for NYC taxi rides in 2013 is available for querying; and (c) the tip percentage histogram is cool.
datasets  bigquery  sql  google  nyc  new-york  taxis  data  big-data  histograms  tipping 
july 2014 by jm
How to make breaking changes and not break all the things
Well-written description of the "several backward-compatible changes" approach to breaking-change schema migration (via Marc)
databases  coding  compatibility  migration  schemas  sql  continuous-deployment 
june 2014 by jm
Database Migrations Done Right
The rule is simple. You should never tie database migrations to application deploys or vice versa. By minimising dependencies you enable faster, easier and cleaner deployments.

A solid description of why this is a good idea, from an ex-Guardian dev.
migrations  database  sql  mysql  postgres  deployment  ops  dependencies  loose-coupling 
may 2014 by jm
We're sending out the wrong signals in bid to lure the big data bucks -
Simon McGarr on Ireland's looming data-protection train-crash.
Last week, during the debate of his proposals to increase fees for making a Freedom of Information request, Brendan Howlin was asked how one of his amendments would affect citizens looking for data from the State's electronic databases. His reply was to cheerfully admit he didn't even understand the question. "I have no idea what an SQL code is. Does anyone know what an SQL code is?"

Unlike the minister, it probably isn't your job to know that SQL is the computer language that underpins the data industry. The amendment he had originally proposed would have effectively allowed civil servants to pretend that their computer files were made of paper when deciding whether a request was reasonable. His answer showed how the Government could have proposed such an absurd idea in the first place.

Like it or not – fair or not – these are not the signals a country that wanted to build a long-term data industry would choose to send out. They are the sort of signals that Ireland used to send out about Financial Regulation. I think it's agreed, that approach didn't work out so well.
foi  ireland  brendan-howlin  technology  illiteracy  sql  civil-service  government  data-protection  privacy  regulation  dpa 
december 2013 by jm
Presto: Interacting with petabytes of data at Facebook
Presto has become a major interactive system for the company’s data warehouse. It is deployed in multiple geographical regions and we have successfully scaled a single cluster to 1,000 nodes. The system is actively used by over a thousand employees,who run more than 30,000 queries processing one petabyte daily.

Presto is 10x better than Hive/MapReduce in terms of CPU efficiency and latency for most queries at Facebook. It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries,and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest). The main restrictions at this stage are a size limitation on the join tables and cardinality of unique keys/groups. The system also lacks the ability to write output data back to tables (currently query results are streamed to the client).
facebook  hadoop  hdfs  open-source  java  sql  hive  map-reduce  querying  olap 
november 2013 by jm
Google swaps out MySQL, moves to MariaDB
When we asked Sallner to quantify the scale of the migration he said, "They're moving it all. Everything they have. All of the MySQL servers are moving to MariaDB, as far as I understand."

By moving to MariaDB, Google can free itself of any dependence on technology dictated by Oracle – a company whose motivations are unclear, and whose track record for working with the wider technology community is dicey, to say the least. Oracle has controlled MySQL since its acquisition of Sun in 2010, and the key InnoDB storage engine since it got ahold of Innobase in 2005.

[...] We asked Cole why Google would shift from MySQL to MariaDB, and what the key technical differences between the systems were. "From my perspective, they're more or less equivalent other than if you look at specific features and how they implement them," Cole said, speaking in a personal capacity and not on behalf of Google. "Ideologically there are lots of differences."

So -- AWS, when will RDS offer MariaDB as an option?
google  mysql  mariadb  sql  open-source  licensing  databases  storage  innodb  oracle 
september 2013 by jm
Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL on Hadoop
we are proud to announce the first production drop of Impala, which reflects feedback from across the user community based on multiple types of real-world workloads. Just as a refresher, the main design principle behind Impala is complete integration with the Hadoop platform (jointly utilizing a single pool of storage, metadata model, security framework, and set of system resources). This integration allows Impala users to take advantage of the time-tested cost, flexibility, and scale advantages of Hadoop for interactive SQL queries, and makes SQL a first-class Hadoop citizen alongside MapReduce and other frameworks. The net result is that all your data becomes available for interactive analysis simultaneously with all other types of processing, with no ETL delays needed.

Along with some great benchmark numbers against Hive. nifty stuff
cloudera  impala  sql  querying  etl  olap  hadoop  analytics  business-intelligence  reports 
may 2013 by jm
The Bw-Tree: A B-tree for New Hardware - Microsoft Research
The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.
bw-trees  database  paper  toread  research  algorithms  microsoft  sql  sql-server  b-trees  data-structures  storage  cache-friendly  mechanical-sympathy 
april 2013 by jm
Data Corruption To Go: The Perils Of sql_mode = NULL « Code as Craft
bloody hell. A load of cases where MySQL will happily accommodate all sorts of malformed and invalid input -- thankfully with fixes.

Also includes a very nifty example of Etsy tee'ing their production db traffic (30k pps in and out) via tcpdump and pt-query-digest to a test database host. Fantastic hackery
mysql  input  corrupt  invalid  validation  coding  databases  sql  testing  tcpdump  percona  pt-query-digest  tee 
march 2013 by jm
Online Schema Change for MySQL
A tool written by Facebook to ease the pain of online MySQL schema-change migrations.
Some ALTER TABLE statements take too long form the perspective of some MySQL users. The fast index create feature for the InnoDB plugin in MySQL 5.1 makes this less of an issue but this can still take minutes to hours for a large table and for some MySQL deployments that is too long.
A workaround is to perform the change on a slave first and then promote the slave to be the new master. But this requires a slave located near the master. MySQL 5.0 added support for triggers and some replication systems have been built using triggers to capture row changes. Why not use triggers for this? The openarkkit toolkit did just that with oak-online-alter-table. We have published our version of an online schema change utility (OnlineSchemaChange.php aka OSC).
facebook  mysql  sql  schema  database  migrations  ops  alter-table 
march 2013 by jm
'Splout is a scalable, open-source, easy-to-manage SQL big data view. Splout is to Hadoop + SQL what Voldemort or Elephant DB are to Hadoop + Key/Value. Splout serves a read-only, partitioned SQL view which is generated and indexed by Hadoop.'

Some FAQs: 'What's the difference between Splout SQL and Dremel-like solutions such as BigQuery, Impala or Apache Drill? Splout SQL is not a "fast analytics" Dremel-like engine. It is more thought to be used for serving datasets under web / mobile high-throughput, many lookups, low-latency applications. Splout SQL is more like a NoSQL database in the sense that it has been thought for answering queries under sub-second latencies. It has been thought for performing queries that impact a very small subset of the data, not queries that analyze the whole dataset at once.'
splout  sql  big-data  hadoop  read-only  scaling  queries  analytics 
february 2013 by jm
Why did annon attack the FG website? : ireland
all signs point to 'they didn't.'  also, interesting comment in the Reddit thread: 'From a source close to the situation; the forms [on the FG site] were not being sanitised [against SQL injection attacks] at all.'  incredibly amateurish, if true
reddit  anonymous  4chan  hacks  fine-gael  fghack  ireland  politics  security  sql  exploits  from delicious
january 2011 by jm

related tags

4chan  activerecord  algorithms  alter-table  amazon  analytics  anonymous  architecture  aurora  aws  b-trees  baron-schwartz  big-data  bigquery  brendan-howlin  business-intelligence  bw-trees  cache-friendly  civil-service  clickhouse  cloudera  coding  compatibility  compile  continuous-deployment  corrupt  data  data-protection  data-structures  database  database-is-not-a-queue  databases  datasets  ddl  dependencies  deploy  deployment  downtime  dpa  dropwizard  ec2  etl  exploits  f1  facebook  fail  fghack  fine-gael  foi  go  google  government  hacks  hadoop  hdfs  histograms  hive  illiteracy  impala  innodb  input  instrumentation  invalid  ireland  java  jdbc  joins  kafka  lhm  licensing  llvm  locking  logging  loose-coupling  macros  map-reduce  mariadb  mechanical-sympathy  messaging  metrics  microsoft  migration  migrations  mysql  new-york  nyc  olap  open-source  ops  oracle  orm  outages  paper  papers  percona  performance  pipelinedb  politics  postgres  postgresql  postmortems  privacy  protobuf  pt-query-digest  queries  querying  queue  queueing  rails  rdbms  rds  read-only  reddit  redshift  reference  regulation  releasing  replication  reports  research  ruby  rust  s3  scalability  scaling  schema  schemas  search  security  sliding-window  soundcloud  splout  sql  sql-server  square  storage  streaming  taxis  tcpdump  technology  tee  testing  time-series  tipping  toread  transactions  tsd  uber  validation  yandex 

Copy this bookmark: