observability   181


Reading Up on Observability and Monitoring – Adron Hall
“key in understanding the difference in monitoring — the combing of data to determine the state or well-being of a system — versus observability — the view into and understanding of the state of events within a system.”
monitoring  itmanagement  observability 
15 days ago by cote
Monitoring in the time of Cloud Native – Cindy Sridharan – Medium
Observability is about getting the required information so that we can react. The section on tracing is good.

Logs - Large storage footprint, can affect performance, and storage is not necessarily reliable.
Metrics - Less storage, easy to process for judging system health and alerting. Usually system-level rather than request-level.
Traces - Metadata is added as a request flows through the system. Usually sampled.
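
To make the three signals in the notes above concrete, here is a minimal sketch of one request handler emitting all of them. Everything in it is an assumption for illustration: the handler, the in-memory sinks and the `sample_rate` parameter are hypothetical and do not come from the article or from any particular library.

```python
import random
import time
import uuid
from collections import Counter

# In-memory stand-ins for real backends (hypothetical, for illustration only).
LOG_LINES = []       # logs: one record per event, so storage grows with traffic
METRICS = Counter()  # metrics: small aggregates, system-level not request-level
TRACES = []          # traces: per-request detail, kept only for a sample


def handle_request(path, sample_rate=0.1, rng=random):
    """Handle one request and emit a log line, a metric and maybe a trace."""
    trace_id = uuid.uuid4().hex
    start = time.perf_counter()
    status = 200  # placeholder for the real work of the handler
    duration_s = time.perf_counter() - start

    # Log: every request writes a record.
    LOG_LINES.append(f"trace={trace_id} path={path} status={status}")

    # Metric: only an aggregate counter is updated.
    METRICS[f"requests_total{{status={status}}}"] += 1

    # Trace: request metadata is recorded only for a sampled fraction.
    if rng.random() < sample_rate:
        TRACES.append({"trace_id": trace_id, "path": path, "duration_s": duration_s})
    return status
```

After many calls the log list has one entry per request, the metrics counter still has a single key, and the trace list holds roughly `sample_rate` of the requests, which matches the storage trade-offs in the notes.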

"Application developers now have one job. We’re at a time when it has never been easier for application developers to focus on just making their service more robust and trust that if they do so, then the open source software they are building on top of will pay the concomitant dividends."

"We “monitored” something because we expected something to behave a certain way. What’s worse, we expected something to fail in a very specific manner and wanted to keep tabs on this specific failure. An “explicit, predictable failure” centric approach to monitoring becomes a problem when the number of failure modes both increases and failure itself becomes more implicit."

"Opting in to the model of embracing failure entails designing our services to behave gracefully in the face of failure. In other words, this means turning hard, explicit failure modes into partial, implicit and soft failure modes. Failure modes that could be papered over with graceful degradation mechanisms like retries, timeouts, circuit breaking and rate limiting. Failure modes that can be tolerated owing to relaxed consistency guarantees with mechanisms like eventual consistency or aggressive multi-tiered caching. Failure modes that can be even triggered deliberately with load shedding in the event of increased load that has the potential to take down our service entirely, thereby operating in a degraded state.

But all of this comes at the cost of increased overall complexity and the buyer’s remorse often acutely felt is the loss of ability to easily reason about systems."
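
One of the degradation mechanisms named in the quote above, circuit breaking, can be sketched as follows. This is a simplified illustration under stated assumptions, not the article's implementation: the class name, the `max_failures` and `reset_after_s` parameters and the fallback convention are all hypothetical.

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: after repeated failures, stop calling the
    dependency for a cooldown period and serve a fallback instead."""

    def __init__(self, max_failures=3, reset_after_s=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.clock = clock      # injectable so the cooldown can be tested
        self.failures = 0
        self.opened_at = None   # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                # Open: the caller gets a degraded answer, not an error.
                return fallback()
            # Cooldown elapsed: allow one trial call; a single further
            # failure reopens the circuit immediately.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

Note that the caller never sees an exception here. That is the hard-to-soft conversion the quote describes, and it is also why such failures become harder to reason about unless the breaker state is itself exported as a signal.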

"Now I’m not someone who believes that automating everything is a panacea, but the advent of platforms like Kubernetes means that several of the problems that human and failure centric monitoring tools of yore helped “monitor” are already solved. Health-checking, load balancing and taking failed services out of rotation and so forth are features these platforms provide for free. That’s their primary value prop."

"An observable system is one that exposes enough data about itself so that generating information (finding answers to questions yet to be formulated) and easily accessing this information becomes simple."

"I see both traces and metrics as an abstraction built on top of logs that pre-process and encode information along two orthogonal axes, one being request centric, the other being system centric."
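
The idea in the quote above, that metrics and traces are two projections of the same log stream, can be sketched with a few structured log events. The event fields (`request_id`, `service`, `self_ms`, `status`) and the sample data are invented for illustration and are not taken from the article.

```python
from collections import Counter, defaultdict

# Hypothetical structured log events; self_ms is the time spent in that
# service itself, excluding downstream calls.
LOG_EVENTS = [
    {"request_id": "r1", "service": "gateway", "self_ms": 10, "status": 200},
    {"request_id": "r1", "service": "auth", "self_ms": 15, "status": 200},
    {"request_id": "r1", "service": "orders", "self_ms": 90, "status": 200},
    {"request_id": "r2", "service": "gateway", "self_ms": 5, "status": 500},
    {"request_id": "r2", "service": "auth", "self_ms": 35, "status": 500},
]


def to_metrics(events):
    """System-centric axis: aggregate counts per (service, status)."""
    counts = Counter()
    for event in events:
        counts[(event["service"], event["status"])] += 1
    return counts


def to_traces(events):
    """Request-centric axis: group events by the request they belong to."""
    traces = defaultdict(list)
    for event in events:
        traces[event["request_id"]].append((event["service"], event["self_ms"]))
    return dict(traces)


def slowest_hop(trace):
    """Return the (service, self_ms) pair that took longest in one trace."""
    return max(trace, key=lambda hop: hop[1])
```

Both functions read the same events. `to_metrics` discards the request identity to keep a small aggregate, while `to_traces` keeps it so that a question like "which component slowed this request?" can be answered with `slowest_hop`.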

"Most importantly, having an understanding of the entire request lifecycle makes it possible to debug requests spanning multiple services to pinpoint the source of increased response time or resource utilization. As such, traces largely help one understand the which and sometimes even the why — like which component of a system is even touched during the lifecycle of a request and is slowing the response?"

"The second problem with tracing instrumentation is that it’s not sufficient for developers to instrument their code. A large number of applications in the wild are built using open source frameworks or libraries which might require additional instrumentation. This becomes all the more challenging at places with polyglot architectures, since every language, framework and wire protocol with widely disparate concurrency patterns and guarantees need to cooperate. Indeed, tracing is most successfully deployed in organizations where there are a core set of languages and frameworks used uniformly across the company."
cloud  Monitoring  Observability  Microservice 
20 days ago by colin.jack
A single distribution of libraries that automatically collects traces and metrics from your app, displays them locally, and sends them to any analysis tool.
microservices  monitoring  observability  tracing 
4 weeks ago by webframp
A single distribution of libraries that automatically collects traces and metrics from your app, displays them locally, and sends them to any analysis tool.
monitoring  observability 
4 weeks ago by mpm
