jm + pid-controllers   1

Tuning Spark Back Pressure by Simulation
Interesting. Spark uses a PID controller algorithm to manage backpressure:
Spark back pressure, which can be enabled by setting spark.streaming.backpressure.enabled=true, will dynamically resize batches so as to avoid queue buildup. It is implemented using a Proportional Integral Derivative (PID) algorithm. This algorithm has some interesting properties, including the lack of a guaranteed stable fixed point. This can manifest itself not just in transient overshoot, but in a batch size oscillating around a (potentially optimal) constant throughput. The overshoot incurs latency; the undershoot costs throughput. Catastrophic overshoot leading to OOM is possible in degenerate circumstances (you need to choose the parameters quite deviously to cause this to happen). Having witnessed undershoot and slow recovery in production streaming jobs, I decided to investigate further by testing the algorithm with a simulator.
backpressure  streaming  queueing  pid-controllers  algorithms  congestion-control 
25 days ago by jm
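
To make the undershoot-and-recovery behaviour in the excerpt concrete, here is a minimal toy simulation of a PID-style rate limiter. It is only a sketch: the function name simulate, the fixed-capacity processing model, the update rule, and the default gains are all illustrative assumptions, not code from Spark or from the linked article.

```python
def simulate(capacity, steps=50, kp=1.0, ki=0.2, kd=0.0,
             batch_interval=1.0, initial_rate=2000.0, min_rate=100.0):
    """Toy PID-style ingest rate limiter (hypothetical model).

    capacity: records/second the imaginary job can process (assumed constant).
    Returns the rate limit chosen after each batch.
    """
    rate = initial_rate
    last_error = 0.0
    scheduling_delay = 0.0  # seconds the current batch waits before it starts
    history = []
    for _ in range(steps):
        # Assumption: the job always runs at full capacity while it has work.
        processing_time = rate * batch_interval / capacity
        processing_rate = capacity

        # Proportional term: how far the ingest rate is above the processing rate.
        error = rate - processing_rate
        # Integral term: backlog implied by the scheduling delay, as a rate.
        historical_error = scheduling_delay * processing_rate / batch_interval
        # Derivative term: change in error since the previous batch.
        d_error = (error - last_error) / batch_interval

        new_rate = rate - kp * error - ki * historical_error - kd * d_error
        new_rate = max(new_rate, min_rate)  # never throttle below the floor

        # The next batch waits for whatever work overran this interval.
        scheduling_delay = max(0.0, scheduling_delay + processing_time - batch_interval)
        last_error = error
        rate = new_rate
        history.append(rate)
    return history


if __name__ == "__main__":
    for step, r in enumerate(simulate(capacity=1000.0)[:10]):
        print(step, round(r, 1))
```

With these assumed defaults and a capacity of 1000 records/second, the limit drops below capacity while the backlog drains and then creeps back up toward it, which is the undershoot and slow recovery described above. Passing different kp, ki, and kd values is a cheap way to explore how the gains change that behaviour.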
