Google - Site Reliability Engineering
A great guide to getting started making prod engineering tractable and fun. Talks about concerns, desired outcomes well ...
Jaeger: open source, end-to-end distributed tracing
Why Jaeger?

As on-the-ground microservice practitioners are quickly realizing, the majority of operational problems that arise when moving to a distributed architecture are ultimately grounded in two areas: networking and observability. It is simply an orders-of-magnitude larger problem to network and debug a set of intertwined distributed services than a single monolithic application.
debugging  logging  monitoring  tracing  distributed  observability  cloud 
