jm + scheduling   17

Scheduled Tasks (cron) - Amazon EC2 Container Service
ECS now does cron jobs. But where does AWS Batch fit in? Confusing.
aws  batch  ecs  cron  scheduling  recurrence  ops 
15 days ago by jm
Apache Kafka, Purgatory, and Hierarchical Timing Wheels
In the new design, we use Hierarchical Timing Wheels for the timeout timer and a DelayQueue of timer buckets to advance the clock on demand. Completed requests are removed from the timer queue immediately at O(1) cost. The buckets remain in the delay queue; however, the number of buckets is bounded. In a healthy system, most requests are satisfied before timeout, and many buckets become empty before they are pulled out of the delay queue. Thus, the timer should rarely have the buckets of the lower interval. The advantage of this design is that the number of requests in the timer queue is exactly the number of pending requests at any time. This allows us to estimate the number of requests that need to be purged, so we can avoid unnecessary purge operations on the watcher lists. As a result, we achieve higher scalability in terms of request rate, with much better CPU usage.
algorithms  timers  kafka  scheduling  timing-wheels  delayqueue  queueing 
october 2015 by jm
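To make the O(1) removal claim in the Kafka excerpt concrete, here is a minimal sketch of a single-level timing wheel. It is not Kafka's implementation and it is not hierarchical: a delay longer than one wheel revolution is simply rejected here, where a hierarchical design would hand it to a coarser wheel. The class name `TimingWheel` and its methods are invented for illustration.

```python
class TimingWheel:
    """Single-level timing wheel (illustrative, not Kafka's code)."""

    def __init__(self, tick_ms, wheel_size):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.current_tick = 0
        # One bucket per slot; each bucket is a set of timer ids.
        self.buckets = [set() for _ in range(wheel_size)]
        # timer id -> slot, so cancellation needs no scan.
        self.slot_of = {}

    def add(self, timer_id, delay_ms):
        # Round the delay up to whole ticks, with a minimum of one tick.
        ticks = max(1, -(-delay_ms // self.tick_ms))
        if ticks > self.wheel_size:
            # A hierarchical wheel would overflow into a coarser level.
            raise ValueError("delay exceeds one wheel revolution")
        slot = (self.current_tick + ticks) % self.wheel_size
        self.buckets[slot].add(timer_id)
        self.slot_of[timer_id] = slot

    def cancel(self, timer_id):
        # O(1): look up the slot, then drop the id from that bucket.
        slot = self.slot_of.pop(timer_id, None)
        if slot is None:
            return False
        self.buckets[slot].discard(timer_id)
        return True

    def advance(self):
        # Move the clock one tick and return the timers that expired.
        self.current_tick += 1
        slot = self.current_tick % self.wheel_size
        expired, self.buckets[slot] = self.buckets[slot], set()
        for timer_id in expired:
            del self.slot_of[timer_id]
        return sorted(expired)

    def __len__(self):
        # Exactly the number of still-pending timers.
        return len(self.slot_of)


wheel = TimingWheel(tick_ms=10, wheel_size=8)
wheel.add("a", 30)
wheel.add("b", 30)
wheel.add("c", 50)
wheel.cancel("b")  # a completed request leaves the wheel immediately
fired = [wheel.advance() for _ in range(5)]
```

Because cancelled timers are dropped at once, `len(wheel)` always equals the number of pending timers, which is the property the excerpt relies on to decide when purging is worthwhile.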
Call me Maybe: Chronos
Chronos (the Mesos distributed scheduler) comes out looking pretty crappy here
aphyr  mesos  chronos  cron  scheduling  outages  ops  jepsen  testing  partitions  cap 
august 2015 by jm
danilop/runjop · GitHub
RunJOP (Run Just Once Please) is a distributed execution framework to run a command (i.e. a job) only once in a group of servers [built using AWS DynamoDB and S3].

nifty! Distributed cron is pretty easy when you've got Dynamo doing the heavy lifting.
dynamodb  cron  distributed-cron  scheduling  runjop  danilop  hacks  aws  ops 
july 2015 by jm
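A rough sketch of why a store with conditional writes makes "run just once" easy. This is not RunJOP's actual code, and the bookmark does not say how RunJOP uses DynamoDB; the in-memory `ConditionalStore` below is a hypothetical stand-in for a table that supports "put only if the key does not already exist", and `run_once` is an invented helper.

```python
import threading


class ConditionalStore:
    """In-memory stand-in for a table with conditional writes (hypothetical)."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        # Atomically claim the key; only the first caller succeeds.
        with self._lock:
            if key in self._items:
                return False
            self._items[key] = value
            return True


def run_once(store, job_id, node_name, job):
    # Every node attempts the same claim; only the winner runs the job.
    if store.put_if_absent(job_id, node_name):
        job()
        return True
    return False


store = ConditionalStore()
runs = []
results = [
    run_once(store, "nightly-report-2015-07-01", f"node{i}", lambda: runs.append(1))
    for i in range(3)
]
```

Using a job id that includes the scheduled date means each scheduled run gets its own claim, so the next day's run is not blocked by today's record.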
A higher order estimate of the optimum checkpoint interval for restart dumps
the bottom line is as follows:
If the time it takes to create a dump, δ < M/2 then use τopt = √(2δM) – δ
Otherwise (it takes longer than M/2 to create a dump), just use τopt = M.
dumping  periodic-tasks  scheduling  frequency  maths  optimal  interval  checkpointing 
june 2015 by jm
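The rule of thumb quoted above is easy to turn into a small function. This sketch assumes M is the mean time between failures (the excerpt does not define it) and that δ and M are in the same time unit; the function name is invented.

```python
import math


def optimal_checkpoint_interval(delta, mtbf):
    """Interval between dumps, per the two-case rule quoted in the bookmark.

    delta: time to create one dump.
    mtbf:  M in the excerpt, assumed here to be mean time between failures.
    """
    if delta <= 0 or mtbf <= 0:
        raise ValueError("delta and mtbf must be positive")
    if delta < mtbf / 2:
        return math.sqrt(2 * delta * mtbf) - delta
    # Dumps take longer than M/2: just checkpoint once per M.
    return mtbf


# Example: a 5-minute dump with a 24-hour (1440-minute) M.
interval_minutes = optimal_checkpoint_interval(5, 1440)
```

With those example numbers, √(2 · 5 · 1440) = 120, so the interval comes out at 115 minutes.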
Airflow
Airbnb's workflow management system; works off a DAG defined in Python code (ugh). Nice UI though, but I think Pinboard's take is neater
airbnb  open-source  python  workflow  jobs  cron  scheduling  batch 
june 2015 by jm
Pinball
Pinterest's Hadoop workflow manager; 'scalable, reliable, simple, extensible' apparently. Hopefully it allows upgrades of a workflow component without breaking an existing run in progress, unlike LinkedIn's Azkaban :(
python  pinterest  hadoop  workflows  ops  pinball  big-data  scheduling 
april 2015 by jm
_Blade: a Data Center Garbage Collector_
Essentially, add a central GC scheduler to improve tail latencies in a cluster, by taking instances out of the pool to perform slow GC activity instead of letting them impact live operations. I've been toying with this idea for a while, nice to see a solid paper about it
gc  latency  tail-latencies  papers  blade  go  java  scheduling  clustering  load-balancing  low-latency  performance 
april 2015 by jm
Microservices and elastic resource pools with Amazon EC2 Container Service
interesting approach to working around ECS' shortcomings -- bit specific to Hailo's microservices arch and IPC mechanism though.

aside: I like their version numbering scheme: ISO-8601, YYYYMMDDHHMMSS. keep it simple!
versioning  microservices  hailo  aws  ec2  ecs  docker  containers  scheduling  allocation  deployment  provisioning  qos 
april 2015 by jm
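The timestamp-as-version scheme from the aside above takes a couple of lines to generate. A minimal sketch, assuming UTC is used so that versions from different build machines sort consistently; `build_version` is an invented name, not something from Hailo's tooling.

```python
from datetime import datetime, timezone


def build_version(now=None):
    # YYYYMMDDHHMMSS: fixed width, so string order matches time order.
    now = now or datetime.now(timezone.utc)
    return now.strftime("%Y%m%d%H%M%S")


example = build_version(datetime(2015, 4, 1, 9, 30, 5, tzinfo=timezone.utc))
```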
'Join-Idle-Queue: A Novel Load Balancing Algorithm for Dynamically Scalable Web Services' [paper]
We proposed the JIQ algorithms for web server farms that are dynamically scalable. The JIQ algorithms significantly outperform the state-of-the-art SQ(d) algorithm in terms of response time at the servers, while incurring no communication overhead on the critical path. The overall complexity of JIQ is no greater than that of SQ(d).

The extension of the JIQ algorithms proves to be useful at very high load. It will be interesting to acquire a better understanding of the algorithm with a varying reporting threshold. We would also like to understand better the relationship of the reporting frequency to response times, as well as an algorithm to further reduce the complexity of the JIQ-SQ(2) algorithm while maintaining its superior performance.
join-idle-queue  algorithms  scheduling  load-balancing  via:norman-maurer  jiq  microsoft  load-balancers  performance 
august 2014 by jm
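A small sketch of the JIQ idea as I understand it from the paper: each dispatcher keeps a queue of servers that have reported themselves idle and hands new jobs to one of those. Only when its queue is empty does it fall back to a random server. Idle servers pick which dispatcher to report to by sampling d of them and joining the shortest idle queue (d=1 would be the random variant, d=2 corresponds to JIQ-SQ(2)). The class and function names are invented, and the sketch ignores details such as a listed server receiving a randomly routed job before its idle entry is consumed.

```python
import random
from collections import deque


class Dispatcher:
    """One load balancer with its own queue of idle servers (illustrative)."""

    def __init__(self, num_servers, rng=None):
        self.num_servers = num_servers
        self.idle = deque()
        self.rng = rng or random.Random()

    def assign(self):
        # Prefer a server known to be idle; no probing on the critical path.
        if self.idle:
            return self.idle.popleft()
        # Idle queue empty: fall back to a uniformly random server.
        return self.rng.randrange(self.num_servers)


def report_idle(dispatchers, server_id, rng, d=2):
    # The idle server samples d dispatchers and joins the shortest idle queue.
    sampled = rng.sample(dispatchers, min(d, len(dispatchers)))
    target = min(sampled, key=lambda disp: len(disp.idle))
    target.idle.append(server_id)
    return target


rng = random.Random(0)
dispatchers = [Dispatcher(num_servers=4, rng=rng) for _ in range(2)]
chosen = report_idle(dispatchers, server_id=3, rng=rng, d=2)
first = chosen.assign()   # served from the idle queue
second = chosen.assign()  # idle queue now empty, so a random server
```

The point of the design is that the idle-reporting traffic happens off the request path, which is what the abstract means by "no communication overhead on the critical path".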
Calendar Hacks
Some great tips on managing a busy calendar, from Etsy's managers. Block out time; refuse double-booked meetings by default; rely on apps; office hours. Thankfully I have a pretty slim calendar these days, but bookmarking for future use...
calendar  etsy  via:kellan  google  google-calendar  office-hours  life-hacks  hacks  tips  managing  managers  scheduling 
july 2014 by jm
A really excellent-looking workflow/orchestration engine for Hadoop, Pig, Hive, Redshift and other ETL jobs, featuring inter-job dependencies, cron-like scheduling, and failure handling. Open source, from Spotify
workflow  orchestration  scheduling  cron  spotify  open-source  luigi  redshift  pig  hive  hadoop  emr  jobs  make  dependencies 
july 2014 by jm
Kanban for MDN development
Mozilla's experience with Kanban. We've had good results at Amazon, too. Good intro links in this post -- might start talking about it at Swrve...
kanban  scheduling  team  agile  mozilla 
may 2013 by jm
Introducing Chronos: A Replacement for Cron
A distributed, fault-tolerant "cron" is something which comes up frequently -- it makes for a great fault-tolerance building block. This one sounds like it's too closely tied into Mesos, though (IMO).
Chronos is our replacement for cron. It is a distributed and fault-tolerant scheduler which runs on top of Mesos. It's a framework and supports custom mesos executors as well as the default command executor. Thus by default, Chronos executes SH (on most systems BASH) scripts. Chronos can be used to interact with systems such as Hadoop (incl. EMR), even if the mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts allow transferring files and executing them on a remote machine in the background and using asynchronous callbacks to notify Chronos of job completion or failures.
cron  scheduling  mesos  stacks  design  airbnb  chronos  fault-tolerance  distcomp  distributed-computing  scripts  jobs 
march 2013 by jm
Colm argues against the 'sleep rand % 3600' hack
it's not sufficiently evenly-distributed, apparently. Also: got linked from Hack The Planet!
scheduling  probability  sleep  unix  updating  cron  random  from delicious
september 2009 by jm
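One well-known source of unevenness in a `rand % 3600` delay is modulo bias. This may or may not be the argument Colm makes; the bookmark doesn't say. The sketch below assumes the random source yields integers 0..32767, as bash's `$RANDOM` does, and counts how many source values land on each delay.

```python
from collections import Counter


def residue_counts(source_range=32768, modulus=3600):
    # How many distinct source values map to each delay in seconds.
    return Counter(value % modulus for value in range(source_range))


counts = residue_counts()
low_delay_hits = counts[0]       # delays 0..367 seconds
high_delay_hits = counts[3599]   # delays 368..3599 seconds
```

Since 32768 = 9 × 3600 + 368, the first 368 delays each have ten source values mapping to them and the rest have nine, so the early part of the hour is slightly over-represented.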

