Fast ETL in Python. Ideas? : dataengineering
I’m working on an ETL pipeline that feeds a bunch of ML models. At the moment we extract data from a few SQL DBs, do some feature extraction and...
pandas  python  etl  discussion  sql  airflow  elt  spark  datascience 
4 days ago by cothrun
Python package for handling messy CSV files using ML.
csv  python  deep-learning  etl 
7 days ago by mjlassila
GitHub - alan-turing-institute/CleverCSV: CleverCSV is a Python package for handling messy CSV files
CleverCSV provides a drop-in replacement for the Python csv package with improved dialect detection for messy CSV files. It also provides a handy command line tool that can standardize a messy file or generate Python code to import it.
python  csv  data  ETL 
9 days ago by euler
How to Use Broadway in Your Elixir Application | AppSignal Blog
What Is Broadway and When Should You Use It in Your Elixir App?
elixir  data  broadway  etl 
4 weeks ago by forestinsb
Streaming ETL With Apache Flink - Part 1 - DZone Big Data
In this article, we discuss how to perform streaming ETL with Apache Flink in order to better manage and process data for real-time (or near-real-time) analysis.
flink  etl  data 
4 weeks ago by nunao
Prefect | Dataflow Automation
Orchestration of data pipelines. An alternative to Airflow and maybe a better option.
etl  elt  data  pipelines  engineering  orchestration 
4 weeks ago by ijy

