jm + etl   8

Declarative Airflow Workflows in YAML, from Etsy
airflow  python  batch  cron  etl 
13 days ago by jm
Best practices with Airflow
interesting presentation describing how to architect Airflow ETL setups; see also
etl  airflow  batch  architecture  systems  ops 
october 2016 by jm
Load data into Redshift from S3 buckets using a pre-canned Lambda function. Looks like it may be a good example of production-quality Lambda.
lambda  aws  ec2  redshift  s3  loaders  etl  pipeline 
may 2015 by jm
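
For a rough idea of what an S3-to-Redshift loader like the one bookmarked above has to do, here is a minimal sketch. It is not the linked project's code. The function name, table name and IAM role are hypothetical, and it only builds the Redshift `COPY` statements from an S3 event notification; it does not connect to a cluster.

```python
from urllib.parse import unquote_plus


def copy_statements_for_event(event, table, iam_role):
    """Build one Redshift COPY statement per S3 object named in the event.

    `table` and `iam_role` are assumed to be trusted configuration values,
    not user input, since they are interpolated directly into the SQL.
    """
    statements = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        statements.append(
            f"COPY {table} FROM 's3://{bucket}/{key}' "
            f"IAM_ROLE '{iam_role}' FORMAT AS CSV;"
        )
    return statements
```

A real loader would also need batching, retries and a record of which files have already been loaded, which is presumably where most of the production-quality work goes.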
Continuous Delivery with ETL Systems [video]
Lonely Planet and Dr Foster Intelligence both make heavy use of ETL in their products, and both organisations have applied the principles of Continuous Delivery to their delivery process.
Some of the Continuous Delivery norms need to be adapted in the context of ETL, and some interesting patterns emerge, such as running Continuous Integration against data, as well as code.
etl  video  presentations  lonely-planet  dr-foster-intelligence  continuous-delivery  deployment  pipelines 
march 2014 by jm
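
One way to picture "Continuous Integration against data" from the talk above is a set of checks that run against each incoming batch, just as unit tests run against code. The sketch below is an assumption about what such a check might look like, not something taken from the talk; the function and column names are hypothetical.

```python
def check_batch(rows, required, key):
    """Return a list of human-readable problems found in a batch of rows.

    rows:     list of dicts, one per record
    required: column names that must be present and non-empty
    key:      column expected to be unique across the batch
    """
    problems = []
    if not rows:
        problems.append("batch is empty")
    seen = set()
    for i, row in enumerate(rows):
        missing = [c for c in required if row.get(c) in (None, "")]
        if missing:
            problems.append(f"row {i}: missing {', '.join(missing)}")
        k = row.get(key)
        if k in seen:
            problems.append(f"row {i}: duplicate {key}={k!r}")
        seen.add(k)
    return problems
```

A CI job could fail the build whenever the returned list is non-empty, so bad data blocks a release the same way a failing test does.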
ETL for America
This is a really good post on governmental computing, open data, and so on:
The fact that I can go months hearing about "open data" without a single mention of ETL is a problem. ETL is the pipes of your house: it's how you open data.
civic  open-data  government  etl  data-pipeline  tech  via:timoreilly 
march 2014 by jm
Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL on Hadoop
we are proud to announce the first production drop of Impala, which reflects feedback from across the user community based on multiple types of real-world workloads. Just as a refresher, the main design principle behind Impala is complete integration with the Hadoop platform (jointly utilizing a single pool of storage, metadata model, security framework, and set of system resources). This integration allows Impala users to take advantage of the time-tested cost, flexibility, and scale advantages of Hadoop for interactive SQL queries, and makes SQL a first-class Hadoop citizen alongside MapReduce and other frameworks. The net result is that all your data becomes available for interactive analysis simultaneously with all other types of processing, with no ETL delays needed.

Along with some great benchmark numbers against Hive. Nifty stuff.
cloudera  impala  sql  querying  etl  olap  hadoop  analytics  business-intelligence  reports 
may 2013 by jm
