s3   26541


Apache Iceberg (incubating)
Coming to Presto soon, apparently.
Iceberg tracks individual data files in a table instead of directories. This allows writers to create data files in-place and only adds files to the table in an explicit commit.

Table state is maintained in metadata files. All changes to table state create a new metadata file and replace the old metadata with an atomic operation. The table metadata file tracks the table schema, partitioning config, other properties, and snapshots of the table contents.

The atomic transitions from one table metadata file to the next provide snapshot isolation. Readers use the latest table state (snapshot) that was current when they load the table metadata and are not affected by changes until they refresh and pick up a new metadata location.
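
The commit-by-atomic-swap idea described above can be sketched as a toy model. This is not Iceberg's API or file format: `TableMetadata`, `ToyCatalog`, and the `s3://bucket/...` paths are hypothetical names used only for illustration, and an in-process lock stands in for whatever atomic operation a real catalog provides.

```python
import threading
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TableMetadata:
    """Immutable stand-in for one table metadata file (hypothetical)."""
    version: int
    data_files: Tuple[str, ...]


class ToyCatalog:
    """Holds the pointer to the current metadata; the swap is the only mutation."""

    def __init__(self) -> None:
        self._current = TableMetadata(version=0, data_files=())
        self._lock = threading.Lock()

    def load(self) -> TableMetadata:
        # Readers get an immutable snapshot and keep it until they reload.
        return self._current

    def commit(self, base: TableMetadata, new_files: Tuple[str, ...]) -> bool:
        """Compare-and-swap: succeeds only if `base` is still the current metadata."""
        with self._lock:
            if self._current is not base:
                return False  # another writer committed first; caller must retry
            self._current = TableMetadata(
                version=base.version + 1,
                data_files=base.data_files + new_files,
            )
            return True


catalog = ToyCatalog()
reader_view = catalog.load()  # a reader loads version 0

# A writer creates a data file, then adds it in an explicit commit.
ok = catalog.commit(catalog.load(), ("s3://bucket/data/a.parquet",))

# A second writer still holding the old metadata loses the race.
stale = catalog.commit(reader_view, ("s3://bucket/data/b.parquet",))

refreshed = catalog.load()  # the reader refreshes and sees the new snapshot
```

The reader's `reader_view` is unchanged by the commit, which is the snapshot-isolation property the description refers to; only `refreshed` contains the new file.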


excellent -- this will let me obsolete so much of our own code :)
presto  storage  s3  hive  iceberg  apache  asf  data  architecture 
yesterday by jm
The Time I Got Drunk On S3 And What I Learned – codeburst
These are lessons that I have learned while using Simple Storage Service (S3) for a variety of use cases. I do not consider any of these lessons as problems with S3. Rather they are misunderstandings or mistakes in my thinking. I am proud of learning these lessons. The only reason I learned them was by leveraging S3 to solve problems I was facing. This is the best way to learn any service in Amazon Web Services.
AWS  S3  tips 
3 days ago by euler
Twitter
Another awesome , admittedly inspired by requests, for file upload to without usin…
S3  AWS  Python  API  minimalist  from twitter_favs
7 days ago by jamescampbell
Zenko - Multi-Cloud Data Controller - manage your data without cloud lock-in
Zenko is infrastructure software for CIOs, DevOps, Data Managers to control Data in Multi-Cloud IT Environments
automation  cloud  s3  docker 
8 days ago by jrisch
Has anyone done the costs math on S3 vs. DynamoDB for "small" (json) objects? And, under which circumstances would you have to use one over the other? : aws
Storage in DynamoDB is $0.25 per GB per month, about 10x more than standard S3. However, since these are small objects, this will likely be negligible.

The cost driver, in my experience, is going to be read/write requests. If you do a consistent 10 writes per second, S3 will cost you: ($0.01 / 1000 writes) * (10 writes / s) * (3600 s / hour) = $0.36 per hour.

DynamoDB, on the other hand, is $0.0065/hour for 10 writes/s, or ~50x cheaper.

There's definitely a big difference, as you pay for "writes" you don't use in DynamoDB. So if you have very bursty throughput, you might be better off on S3, though you might also consider scaling the table up and down. If you have sustained throughput, you're probably better off on DynamoDB.
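
The arithmetic in this comment can be reproduced with a short sketch. The two prices are the figures quoted in the comment, not current AWS pricing, and the function names are made up for illustration. With these inputs the ratio works out to about 55, in line with the "~50x" quoted, and the break-even point shows why bursty workloads can favor S3.

```python
SECONDS_PER_HOUR = 3600

# Figures quoted in the comment (assumptions, not current AWS pricing).
S3_PRICE_PER_1000_WRITES = 0.01        # USD per 1000 PUT requests
DYNAMO_HOURLY_FOR_10_WPS = 0.0065      # USD per hour for 10 provisioned writes/s


def s3_write_cost_per_hour(writes_per_second: float) -> float:
    """S3 bills per request, so cost follows the writes actually made."""
    return writes_per_second * SECONDS_PER_HOUR * S3_PRICE_PER_1000_WRITES / 1000


def break_even_writes_per_second(provisioned_hourly: float) -> float:
    """Average write rate at which S3 request cost equals the fixed provisioned cost."""
    return provisioned_hourly / (SECONDS_PER_HOUR * S3_PRICE_PER_1000_WRITES / 1000)


s3_hourly = s3_write_cost_per_hour(10)
ratio = s3_hourly / DYNAMO_HOURLY_FOR_10_WPS
break_even = break_even_writes_per_second(DYNAMO_HOURLY_FOR_10_WPS)

print(f"S3 at a sustained 10 writes/s: ${s3_hourly:.2f}/hour")
print(f"DynamoDB provisioned for 10 writes/s: ${DYNAMO_HOURLY_FOR_10_WPS}/hour")
print(f"Ratio: about {ratio:.0f}x")
print(f"Break-even average rate: {break_even:.2f} writes/s")
```

Under these assumed prices, a table provisioned for 10 writes/s only loses to S3 when the average rate falls below the break-even value, which is the "paying for writes you don't use" case.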

The other big one is request time. Dynamo has a nice batching API. With small files, your response time is largely dominated by HTTP headers. If you can batch 10 reads/writes into one DynamoDB call, you get almost 10x the performance (but no difference in price). You can somewhat get around this in either system by parallelizing heavily: if you can throw enough threads at it, either system should be able to keep up.
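
A minimal sketch of the batching idea: group small records so that one request carries several of them. The `chunked` helper and the sample records are hypothetical, and no AWS call is made here. In a real client each batch would be handed to a batch-write call (DynamoDB's BatchWriteItem accepts up to 25 put or delete requests per call).

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


# Hypothetical small JSON-like records.
records = [{"id": i, "payload": "{}"} for i in range(95)]

batches = list(chunked(records, 10))
unbatched_requests = len(records)   # one HTTP round trip per record
batched_requests = len(batches)     # one HTTP round trip per batch

print(f"{unbatched_requests} requests unbatched, {batched_requests} batched")
```

The per-request overhead is paid once per batch instead of once per record, which is where the near-10x claim for batches of 10 comes from; the number of billed writes stays the same.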

It's also worth checking: I'm pretty sure that S3's availability guarantees are worse, but its durability is better.

Basically, if you have sustained throughput, use DynamoDB. I worked on a system that was costing over $10k/month in small JSON S3 put requests. Moving it to Dynamo dropped it by a ton.
AWS  S3  DynamoDB  code  json 
9 days ago by activescott
forward3d/alpinist: Automatic Alpine Linux Package (apk) Repository Generation using AWS Lambda, S3 & SSM Parameter Store
This project provides you with a Python AWS Lambda function that is capable of automatically creating a signed Alpine Repository whenever a new Alpine Package is uploaded into an S3 bucket.
alpinelinux  package  repository  apk  aws  s3 
11 days ago by bfritz
