ETL   3096

Fast ETL in Python. Ideas? : dataengineering
I’m working on an ETL pipeline that feeds a bunch of ML models. At the moment we extract data from a few SQL DBs, do some feature extraction and...
pandas  python  etl  discussion  sql  airflow  elt  spark  datascience 
4 days ago by cothrun
Python package for handling messy CSV files using ML.
csv  python  deep-learning  etl 
7 days ago by mjlassila
GitHub - alan-turing-institute/CleverCSV: CleverCSV is a Python package for handling messy CSV files
CleverCSV provides a drop-in replacement for the Python csv package with improved dialect detection for messy CSV files. It also provides a handy command line tool that can standardize a messy file or generate Python code to import it.
python  csv  data  ETL 
9 days ago by euler
How to Use Broadway in Your Elixir Application | AppSignal Blog
What Is Broadway and When Should You Use It in Your Elixir App?
elixir  data  broadway  etl 
4 weeks ago by forestinsb
Streaming ETL With Apache Flink - Part 1 - DZone Big Data
In this article, we discuss how to perform streaming ETL with Apache Flink in order to better manage and process data for real-time (near real-time) analysis.
flink  etl  data 
4 weeks ago by nunao
Prefect | Dataflow Automation
Orchestration of data pipelines. An alternative to Airflow and maybe a better option.
etl  elt  data  pipelines  engineering  orchestration 
4 weeks ago by ijy
