big_data   4139


Democratizing Data at Airbnb – Airbnb Engineering & Data Science – Medium
Like many startups, the number of employees at Airbnb has grown significantly over the past several years. In parallel we have seen explosive growth in both the amount of data and the number of…
airbnb  chris_williams  data  analytics  big_data  data_science 
10 days ago by mreinbold

cgarciae/pypeln: Concurrent data pipelines made easy

Pypeline is a simple yet powerful Python library for creating concurrent data pipelines.

- Pypeline was designed to solve simple medium-data tasks that require concurrency and parallelism, but where using frameworks like Spark or Dask feels exaggerated or unnatural.
- Pypeline exposes an easy-to-use, familiar, functional API.
- Pypeline enables you to build pipelines using Processes, Threads, and asyncio.Tasks via the exact same API.
- Pypeline allows you to control the memory and CPU resources used at each stage of your pipeline.
via:flav  is:repo  data  data_engineering  python  data_pipeline  big_data  async 
21 days ago by andrewsardone
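
The "exact same API" idea in the description above can be illustrated with a minimal sketch that uses only the Python standard library. This is not Pypeline's API: `run_stage`, `square`, and `increment` are hypothetical names chosen for illustration, and `concurrent.futures` stands in for the library's own stage functions. The `workers` argument loosely mirrors the per-stage resource control the description mentions.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Example stage function (hypothetical)."""
    return x * x


def increment(x):
    """Second example stage function (hypothetical)."""
    return x + 1


def run_stage(fn, items, executor_cls=ThreadPoolExecutor, workers=2):
    """Apply fn to every item concurrently and return results in input order.

    executor_cls selects the concurrency backend. The call looks the same
    whether a thread pool or a process pool is passed in, which is the
    point of this sketch. workers caps the parallelism of this one stage.
    """
    with executor_cls(max_workers=workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(fn, items))


if __name__ == "__main__":
    # Chain two stages, each with its own worker count.
    squared = run_stage(square, range(5), workers=2)
    print(run_stage(increment, squared, workers=1))
```

Passing `concurrent.futures.ProcessPoolExecutor` as `executor_cls` should work the same way, provided the stage function is defined at module top level so it can be pickled.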


