mpm + monitoring   92

Real-time performance monitoring, done right
22 days ago by mpm
Co-evolving Tracing and Fault Injection with Box of Pain
Distributed systems are hard to reason about largely because of uncertainty about what may go wrong in a particular execution, and about whether the system will mitigate those faults. Tools that perturb executions can help test whether a system is robust to faults, while tools that observe executions can help better understand their system-wide effects. We present Box of Pain, a tracer and fault injector for unmodified distributed systems that addresses both concerns by interposing at the system call level and dynamically reconstructing the partial order of communication events based on causal relationships. Box of Pain's lightweight approach to tracing and focus on simulating the effects of partial failures on communication rather than the failures themselves sets it apart from other tracing and fault injection systems. We present evidence of the promise of Box of Pain and its approach to lightweight observation and perturbation of distributed systems.
testing  monitoring 
8 weeks ago by mpm
A Guide To Service Level Objectives, Part 1: SLOs & You
Whether you’re a Site Reliability Engineer (SRE), developer, or executive, as a service provider you have a vested interest in (or responsibility for) ensuring system reliability. However, “system reliability” in and of itself can be a vague and subjective term that depends on the specific needs of the enterprise. So, SLOs are necessary because they define your Quality of Service (QoS) and reliability goals in concrete, measurable, objective terms.
9 weeks ago by mpm
Gain insight into resource utilization with new Linux kernel pressure metrics and related tools
monitoring  linux 
november 2018 by mpm
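The pressure metrics in the entry above are the kernel's pressure stall information (PSI), which Linux 4.20 and later expose as small text files under `/proc/pressure/` (`cpu`, `memory`, `io`). A minimal sketch of reading that format, assuming the documented `some`/`full` line layout; the function name `parse_pressure` and the sample values are made up for illustration and do not come from the bookmarked article:

```python
def parse_pressure(text):
    """Parse the text of a /proc/pressure/* file into nested dicts.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    avg* are percentages over 10/60/300 second windows,
    total is cumulative stall time in microseconds.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


# Hypothetical sample contents; on a real system you would read
# open("/proc/pressure/memory").read() instead.
sample = (
    "some avg10=1.50 avg60=0.75 avg300=0.10 total=123456\n"
    "full avg10=0.00 avg60=0.00 avg300=0.00 total=789\n"
)
pressure = parse_pressure(sample)
```

Parsing from a string rather than opening the file directly keeps the sketch usable on kernels without PSI support.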
Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
To ensure undisrupted business, large Internet companies need to closely monitor various KPIs (e.g., Page Views, number of online users, and number of orders) of their Web applications, to accurately detect anomalies and trigger timely troubleshooting/mitigation. However, anomaly detection for these seasonal KPIs with various patterns and data quality has been a great challenge, especially without labels. In this paper, we proposed Donut, an unsupervised anomaly detection algorithm based on VAE. Thanks to a few of our key techniques, Donut greatly outperforms a state-of-the-art supervised ensemble approach and a baseline VAE approach, and its best F-scores range from 0.75 to 0.9 for the studied KPIs from a top global Internet company. We come up with a novel KDE interpretation of reconstruction for Donut, making it the first VAE-based anomaly detection algorithm with solid theoretical explanation.
june 2018 by mpm
A single distribution of libraries that automatically collects traces and metrics from your app, displays them locally, and sends them to any analysis tool.
monitoring  observability 
january 2018 by mpm
libstapsdt is a library which allows creating and firing Systemtap's USDT probes at runtime. It's inspired by chrisa/libusdt. The goal of this library is to add USDT probe functionality to dynamic languages.
monitoring  observability 
november 2017 by mpm
Circus: A Process & Socket Manager
Circus is a Python program which can be used to monitor and control processes and sockets
monitoring  deployment 
august 2017 by mpm
Istio adds traffic management to microservices and creates a basis for value-add capabilities like security, monitoring, routing, connectivity management and policy. The software is built using the battle-tested Envoy proxy from Lyft, and gives visibility and control over traffic without requiring any changes to application code
discovery  monitoring  confidentiality  integrity 
may 2017 by mpm
Trace Compass
Eclipse Trace Compass is an open source application for viewing and analyzing any type of logs or traces. Its goal is to provide views, graphs, metrics, and more to help extract useful information from traces, in a way that is more user-friendly and informative than huge text dumps
monitoring  visualization 
march 2017 by mpm
Monitoring and Tuning the Linux Networking Stack: Sending Data
This blog post explains how computers running the Linux kernel send packets, as well as how to monitor and tune each component of the networking stack as packets flow from user programs to network hardware.
linux  networking  performance  monitoring 
february 2017 by mpm
Glances is a cross-platform system monitoring tool written in Python
linux  monitoring  python 
january 2017 by mpm
Monitoring and Tuning the Linux Networking Stack: Receiving Data
It is impossible to tune or monitor the Linux networking stack without reading the source code of the kernel and having a deep understanding of what exactly is happening. This blog post will hopefully serve as a reference to anyone looking to do this
linux  networking  monitoring 
january 2017 by mpm
A vendor-neutral open standard for distributed tracing.
monitoring  observability 
october 2016 by mpm
SQL powered operating system instrumentation and analytics
october 2016 by mpm
The CMX library follows similar principles as JMX (Java Management Extensions) and provides similar monitoring capabilities for C and C++ applications. It allows registering and exposing runtime information as simple counters, floating point numbers or character data. This can be subsequently used by external diagnostics tools for checking thresholds, sending alerts or trending
c++  monitoring  managability 
july 2016 by mpm
Monitoring and Tuning the Linux Networking Stack: Receiving Data
This blog post explains how computers running the Linux kernel receive packets, as well as how to monitor and tune each component of the networking stack as packets flow from the network toward userland programs
linux  networking  monitoring  performance 
june 2016 by mpm
intelsdi-x/snap: The open telemetry framework
Snap is an open telemetry framework designed to simplify the collection, processing and publishing of system data through a single API
monitoring  managability  observability 
may 2016 by mpm
Observe the execution of any software. These tools use static and dynamic tracing of both user- and kernel-level code (via kprobes, uprobes, tracepoints, and USDT).
linux  performance  monitoring 
april 2016 by mpm
netdata is a highly optimized Linux daemon providing real-time performance monitoring for Linux systems, Applications, SNMP devices, over the web!
linux  monitoring 
april 2016 by mpm
nicstat is a Solaris and Linux command-line tool that prints out network statistics for all network interface cards (NICs), including packets, kilobytes per second, average packet sizes and more
networking  monitoring  linux 
january 2016 by mpm
Very Fast Reservoir Sampling
In this post I will demonstrate how to do reservoir sampling orders of magnitude faster than the traditional “naive” reservoir sampling algorithm, using a fast high-fidelity approximation to the reservoir sampling-gap distribution
statistics  monitoring 
december 2015 by mpm
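For context on the entry above, the "naive" baseline it mentions is usually the classic one-pass method known as Algorithm R, which draws one random number per stream element. A minimal sketch of that baseline only (not the faster gap-distribution approach the post describes); the function name and the seed are illustrative choices:

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return k items chosen uniformly at random from an iterable
    of unknown length, using a single pass (Algorithm R)."""
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k/(i+1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Seeded generator so repeated runs pick the same items.
sample = reservoir_sample(range(10_000), 5, random.Random(42))
```

The per-element random draw is the cost that gap-based variants avoid by computing how many elements to skip before the next replacement.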
Why Percentiles Don’t Work the Way you Think
They’re not asking for the 99th percentile of a metric, they’re asking for a metric of 99th percentile. This is very common in systems like Graphite, and it doesn’t achieve what many people seem to think it does. This blog post explains how you might have the wrong idea™ about percentiles, the degree of the mistake (it depends), and what you can do instead
statistics  monitoring 
december 2015 by mpm
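The distinction in the entry above can be seen with a small made-up example: averaging each host's 99th percentile gives a different number than taking the 99th percentile of all requests together. A minimal sketch, assuming a nearest-rank percentile definition and invented latency values in milliseconds (neither comes from the bookmarked post):

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value such that at
    least p percent of the data is less than or equal to it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Invented latencies (ms) for two hosts with equal traffic.
host_a = [10] * 100                 # every request is fast
host_b = [10] * 90 + [1000] * 10    # one request in ten is slow

# "A metric of 99th percentiles": average the per-host p99 values.
avg_of_p99 = (percentile(host_a, 99) + percentile(host_b, 99)) / 2

# "The 99th percentile of a metric": p99 over all requests combined.
true_p99 = percentile(host_a + host_b, 99)

print(avg_of_p99, true_p99)
```

Here the averaged figure (505 ms) matches no request that was actually served, while the combined p99 is 1000 ms.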
Common Trace Format
The Common Trace Format (CTF) is a binary trace format designed to be very fast to write without compromising great flexibility. It allows traces to be natively generated by any C/C++ application or system, as well as by bare-metal (hardware) components
performance  monitoring  logging 
october 2015 by mpm
The Exometer Core package allows for easy and efficient instrumentation of Erlang code, allowing crucial data on system performance to be exported to a wide variety of monitoring systems
erlang  monitoring 
march 2015 by mpm
TAG: a Tiny AGgregation service for ad-hoc sensor networks
We present the Tiny AGgregation (TAG) service for aggregation in TinyOS. TAG allows users to express simple, declarative queries and have them distributed and executed efficiently in networks of low-power, wireless sensors. We discuss various generic properties of aggregates, and show how those properties affect the performance of our in-network approach. We include a performance study demonstrating the advantages of our approach over traditional centralized, out-of-network methods, and discuss a variety of optimizations for improving the performance and fault-tolerance of the basic solution.
monitoring  metrics 
august 2013 by mpm
This tool lets you monitor I/O latency in real time. It shows disk latency in the same way as ping shows network latency
monitoring  io 
august 2013 by mpm
Passively Monitoring Network Round-Trip Times
This post describes how Boundary uses a well-known TCP mechanism to calculate round-trip times (RTTs) between any two hosts by passively monitoring TCP traffic flows, i.e., without actively launching ICMP echo requests (pings). The post is primarily an overview of this one aspect of TCP monitoring; it also outlines the mechanism we are using and demonstrates its correctness.
monitoring  tcp  networking 
march 2013 by mpm
As its name implies, colmux is a collectl multiplexor, which allows one to collect data from multiple systems and treat it as a single data stream, essentially extending collectl's functionality to a set of hosts rather than a single one.
february 2013 by mpm
GEMS: Gossip-Enabled Monitoring Service for Scalable Heterogeneous Distributed Systems
We present experiments and analytical projections demonstrating scalability, fast response times and low resource utilization requirements, making GEMS a potent solution for resource monitoring in distributed computing
gossip  protocol  monitoring 
november 2012 by mpm
Sentry is a realtime event logging and aggregation platform
logging  monitoring 
october 2012 by mpm
Block I/O Layer Tracing using blktrace
blktrace is a really useful tool to see what I/O operations are going on inside the Linux block I/O layer
linux  io  monitoring 
october 2012 by mpm
Zipkin is a distributed tracing system that helps us gather timing data for all the disparate services at Twitter. It manages both the collection and lookup of this data through a Collector and a Query service
june 2012 by mpm
Nimrod is a metrics server purely based on log processing: hence, it doesn't affect the way you write your applications, nor has it any side effect on them
logging  monitoring  metrics 
may 2012 by mpm
Google-Wide Profiling: A Continuous Profiling Infrastructure for Data Centers
With negligible overhead, GWP provides stable, accurate profiles and a datacenter-scale tool for traditional performance analyses. Furthermore, GWP introduces novel applications of its profiles, such as application-platform affinity measurements and identification of platform-specific, microarchitectural peculiarities.
distributed  performance  monitoring 
april 2012 by mpm
tiny Erlang app that works in conjunction with statsderl in order to generate information on the Erlang VM for graphite logs.
erlang  monitoring 
april 2012 by mpm
Kibana is an open source (MIT License), browser based interface to Logstash and ElasticSearch
logging  monitoring  visualization 
april 2012 by mpm
Amon is a self-hosted, lightweight web application and server monitoring toolkit
march 2012 by mpm
Riemann aggregates events from your servers and applications with a powerful stream processing language.
monitoring  distributed 
march 2012 by mpm
Free open source self-hosted log management and exception tracking
logging  monitoring 
february 2012 by mpm
RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data
monitoring  metrics  charts 
february 2012 by mpm
HeapAudit is a foursquare open source project designed for understanding JVM heap allocations. It is implemented as a Java agent built on top of .
java  performance  monitoring 
february 2012 by mpm
start-stop-daemon is used to control the creation and termination of system-level processes. Using one of the matching options, start-stop-daemon can be configured to find existing instances of a running process.
linux  deployment  monitoring  managability 
december 2011 by mpm
Fay: Extensible Distributed Tracing from Kernels to Clusters
Fay is a flexible platform for the efficient collection, processing, and analysis of software execution traces. Fay provides dynamic tracing through use of runtime instrumentation and distributed aggregation within machines and across clusters
distributed  testing  monitoring 
december 2011 by mpm
The goal of the Scalable Distributed Information Management System (SDIMS) project is to develop a "distributed operating systems control plane" that will serve as the backbone for large-scale distributed services. SDIMS aggregates information about large-scale networked systems to provide detailed views of nearby information and rare events and summary views of common, global information.
distributed  monitoring  overlay 
december 2011 by mpm
Dropwizard is a Java framework for developing ops-friendly, high-performance, RESTful web services.
java  rest  monitoring  managability 
december 2011 by mpm
Using strace and lsof to track down process hangs
One common situation that I've run across as a sysadmin/devops guy is the dreaded "it's hanging and we don't know why" problem … If you follow the recipe in this post, you should be able to quickly identify and fix many cases that exhibit this behavior.
linux  testing  monitoring  managability 
october 2011 by mpm
basho_metrics is an open source Erlang library for efficient calculation of service performance metrics.
erlang  monitoring 
september 2011 by mpm
Tracks server state and statistics, allowing you to see what your server is doing. It can also send metrics to Graphite for graphing or to a file for crash forensics.
monitoring  python 
september 2011 by mpm
Magpie: online modelling and performance-aware systems
We argue that online performance modelling should be a ubiquitous operating system service and outline several uses including performance debugging, capacity planning, system tuning and anomaly detection. We describe the Magpie modelling service which collates detailed traces from multiple machines in an e-commerce site, extracts request-specific audit trails, and constructs probabilistic models of request behaviour
distributed  monitoring  performance 
september 2011 by mpm
Unlike most monitoring tools that either focus on a small set of statistics, format their output in only one way, run either interactively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems which currently include buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp
linux  performance  monitoring 
august 2011 by mpm
s6 is a small suite of programs for UNIX, designed to allow process supervision (a.k.a. service supervision), in the line of daemontools and runit
monitoring  linux  managability  init 
july 2011 by mpm
Runit for Ruby (And Everything Else)
The Runit system is a process supervisor and set of utilities surrounding supervising processes.
deployment  Linux  Unix  monitoring  logging 
may 2011 by mpm
emetric creates a high level view of the resources that a running erlang system consumes over time. This is useful for long running stress tests
erlang  monitoring  testing 
march 2011 by mpm
Logster is a utility for reading log files and generating metrics in Graphite or Ganglia. It is ideal for visualizing trends of events that are occurring in your application/system/error logs
monitoring  logging 
march 2011 by mpm
Yconalyzer is a low-overhead pcap utility that provides a bird's eye view of traffic on a particular TCP port, displaying a distribution of duration, volume and throughput over all connections while being able to narrow down to a connection as well
networking  monitoring 
march 2011 by mpm
provides per task, per CPU and per-workload counters, counter groups, and it provides sampling capabilities on top of those - and more
linux  monitoring  performance 
december 2010 by mpm
Iperf was developed by NLANR/DAST as a modern alternative for measuring maximum TCP and UDP bandwidth performance. Iperf allows the tuning of various parameters and UDP characteristics. Iperf reports bandwidth, delay jitter, datagram loss
networking  testing  monitoring 
november 2010 by mpm
Bcfg2 helps system administrators produce a consistent, reproducible, and verifiable description of their environment, and offers visualization and reporting tools to aid in day-to-day administrative tasks
maintainability  monitoring 
november 2010 by mpm
OpenTSDB is a distributed, scalable Time Series DataBase (TSDB) written on top of HBase
monitoring  distributed 
october 2010 by mpm
an obscure kernel feature to get more info about dying processes
I stumbled upon a code path in the Linux kernel which allows external programs to be launched when a core dump is about to happen
linux  monitoring 
september 2010 by mpm
daemons and utilities to reliably start, monitor, log, and control a collection of persistent processes
monitoring  deployment  linux  managability  init 
may 2010 by mpm
Dapper, a Large-Scale Distributed Systems Tracing Infrastructure
Here we introduce the design of Dapper, Google’s production distributed systems tracing infrastructure, and describe how our design goals of low overhead, application-level transparency, and ubiquitous deployment on a very large scale system were met.
distributed  monitoring 
april 2010 by mpm
Send collectd statistics from your Erlang applications
erlang  monitoring 
february 2010 by mpm
Logcheck is a simple yet great idea, an almost set-it-and-forget-it way to monitor your server logs for problems of all kinds.
logging  monitoring 
july 2009 by mpm
Log server, aggregator and viewer
Log server, aggregator and viewer in one package
logging  monitoring 
june 2009 by mpm
BTrace is a safe, dynamic tracing tool for the Java platform. BTrace can be used to dynamically trace a running Java program
java  monitoring 
june 2009 by mpm
Service Management Facility
Self-healing services are delivered and managed on Solaris with the Service Management Facility (smf(5)). smf(5) augments the existing init.d(4) and inetd(1M) startup mechanisms, promoting the service to a first-class operating system object.
unix  monitoring  maintainability  availability 
june 2009 by mpm
Continuous Profiling and Debugging in Distributed Systems
UC Berkeley Computer Science Professor Ion Stoica is trying to find ways to help us cope with the challenge of getting these distributed systems to perform continuously.
monitoring  distributed 
june 2009 by mpm
HOWTO: Use Splunk as Your Remote Syslog Server
the reason I’m doing this is to get a brutally powerful data view in one interface.
june 2009 by mpm
SmokePing keeps track of your network latency
metrics  monitoring  networking 
june 2009 by mpm
Helios is an open source performance and availability monitoring, visualization and reporting system
monitoring  logging 
may 2009 by mpm
Reconnoiter is built out of years of frustration using tools like RRDTOOL, Munin, Cacti, ZenOSS, Nagios, etc. etc. I have a lot of problems with these tools
may 2009 by mpm
Monit is a free open source utility for managing and monitoring processes, files, directories and filesystems on a UNIX system. Monit conducts automatic maintenance and repair and can execute meaningful causal actions in error situations.
monitoring  linux 
may 2009 by mpm
Instrumenting applications with JMX
Debuggers and profilers can provide insight into an application's behavior, but we usually only break out these tools when there's a serious problem. Building monitoring hooks into an application can make it easier to understand what your programs are doing without breaking out the debugger. Now that Java Management Extensions (JMX) is built into the Java™ SE platform, and the jconsole viewer provides a universal monitoring GUI, using JMX to provide a window into your application is easier and more effective than ever.
java  monitoring 
april 2009 by mpm
powerful and featureful tool that generates advanced web, streaming, ftp or mail server statistics, graphically
monitoring  logging 
april 2009 by mpm
Monitoring Hard Drive Health on Linux with smartmontools
smartmontools is a free software package that can monitor S.M.A.R.T. attributes and run hard drive self-tests
linux  monitoring 
april 2009 by mpm
X-Trace is a network diagnostic tool designed to provide users and network operators with better visibility into increasingly complex Internet applications.
monitoring  distributed  networking 
april 2009 by mpm
collectd is a daemon which collects system performance statistics periodically and provides mechanisms to store the values in a variety of ways, for example in RRD-files.
linux  monitoring 
february 2009 by mpm