jm + logging   30

GitHub - jorgebastida/awslogs: AWS CloudWatch logs for Humans™
This feature alone is a bit of a killer app:
$ awslogs get /var/log/syslog ip-10-1.* --start='2h ago' | grep ERROR

cli  logging  aws  cloudwatch  logs  awslogs  ec2 
7 days ago by jm
Simple testing can prevent most critical failures
Specifically, the following 3 classes of errors were implicated in 92% of the major production outages in this study and could have been caught with simple code review:
Error handlers that ignore errors (or just contain a log statement); error handlers with “TODO” or “FIXME” in the comment; and error handlers that catch an abstract exception type (e.g. Exception or Throwable in Java) and then take drastic action such as aborting the system.

(Interestingly, the latter was a particular favourite approach of some misplaced "fail fast"/"crash-only software design" dogma in Amazon. I wasn't a fan.)
fail-fast  crash-only-software  coding  design  bugs  code-review  review  outages  papers  logging  errors  exceptions 
october 2016 by jm
Structural and semantic deficiencies in the systemd architecture for real-world service management, a technical treatise
Despite its overarching abstractions, it is semantically non-uniform and its complicated transaction and job scheduling heuristics ordered around a dependently networked object system create pathological failure cases with little debugging context that would otherwise not necessarily occur on systems with less layers of indirection. The use of bus APIs complicate communication with the service manager and lead to duplication of the object model for little gain. Further, the unit file options often carry implicit state or are not sufficiently expressive. There is an imbalance with regards to features of an eager service manager and that of a lazy loading service manager, having rusty edge cases of both with non-generic, manager-specific facilities. The approach to logging and the circularly dependent architecture seem to imply that lots of prior art has been ignored or understudied.
analysis  systemd  linux  unix  ops  init  critiques  software  logging 
november 2015 by jm
SQL on Kafka using PipelineDB
This is quite nice: PipelineDB can consume a Kafka stream directly, ingest it durably and reliably, and provide SQL views computed over a sliding window of the stream.
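For intuition, here is a minimal sketch of the kind of aggregate a sliding-window view maintains: per-key counts over the last N seconds. It is plain Python, not PipelineDB's implementation or API. It assumes events arrive with non-decreasing timestamps, and the class name is hypothetical.

```python
from collections import deque


class SlidingWindowCount:
    """Count events per key over the trailing `window` seconds (hypothetical helper)."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, key), in arrival order
        self.counts = {}

    def add(self, ts, key):
        self.events.append((ts, key))
        self.counts[key] = self.counts.get(key, 0) + 1
        self._expire(ts)

    def _expire(self, now):
        # Drop events that fell out of the window (ts <= now - window).
        while self.events and self.events[0][0] <= now - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def snapshot(self, now):
        """Current per-key counts, as a view query would return them."""
        self._expire(now)
        return dict(self.counts)
```

Each event is appended once and expired once, so maintaining the view costs amortised O(1) per event. That is what makes continuous, incrementally updated views cheap compared to re-scanning the stream.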
logging  sql  kafka  pipelinedb  streaming  sliding-window  databases  search  querying 
september 2015 by jm
VPC Flow Logs
we are introducing Flow Logs for the Amazon Virtual Private Cloud.  Once enabled for a particular VPC, VPC subnet, or Elastic Network Interface (ENI), relevant network traffic will be logged to CloudWatch Logs for storage and analysis by your own applications or third-party tools.

You can create alarms that will fire if certain types of traffic are detected; you can also create metrics to help you to identify trends and patterns. The information captured includes information about allowed and denied traffic (based on security group and network ACL rules). It also includes source and destination IP addresses, ports, the IANA protocol number, packet and byte counts, a time interval during which the flow was observed, and an action (ACCEPT or REJECT).
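A minimal parsing sketch, assuming the default (version 2) space-separated record format: version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status. Custom formats would need a different field list. The function names and sample records are illustrative only.

```python
# Default (version 2) flow log fields, in order; "-" marks missing values.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_flow_record(line):
    """Parse one default-format flow log record into a dict (hypothetical helper)."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = {}
    for name, raw in zip(FIELDS, values):
        if raw == "-":
            record[name] = None          # e.g. NODATA / SKIPDATA records
        elif name in INT_FIELDS:
            record[name] = int(raw)
        else:
            record[name] = raw           # keep account_id as text (leading zeros)
    return record


def rejected_by_port(lines):
    """Count REJECTed flows per destination port, e.g. to feed an alarm."""
    counts = {}
    for line in lines:
        r = parse_flow_record(line)
        if r["action"] == "REJECT" and r["dstport"] is not None:
            counts[r["dstport"]] = counts.get(r["dstport"], 0) + 1
    return counts
```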
ec2  aws  vpc  logging  tracing  ops  flow-logs  network  tcpdump  packets  packet-capture 
june 2015 by jm
Three Questions to Answer When Reporting an Error
Very long, but tl;dr:
the trick to creating an effective error message is to answer the 3 Questions within your message: What is the error? What was the probable cause of the error? What is the probable remedy?
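One way to make the three questions hard to skip is to require all three when the error is raised. This is a small sketch; `ActionableError` and `load_config` are hypothetical names, not from the article.

```python
class ActionableError(Exception):
    """Error whose message answers: what happened, probable cause, probable remedy."""

    def __init__(self, what, cause, remedy):
        super().__init__(what, cause, remedy)
        self.what, self.cause, self.remedy = what, cause, remedy

    def __str__(self):
        return (f"{self.what}\n"
                f"  Probable cause: {self.cause}\n"
                f"  Probable remedy: {self.remedy}")


def load_config(path):
    """Hypothetical example: wrap a low-level error in a three-part message."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError as e:
        raise ActionableError(
            what=f"Could not read configuration file {path!r}.",
            cause="The file does not exist, or the path is misspelled.",
            remedy="Check the path, or create the file from a sample config.",
        ) from e
```

Chaining with `from e` keeps the original exception available to whoever reads the logs, while the user-facing message stays focused on the three answers.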
errors  ui  ux  reporting  logging  coding 
may 2015 by jm
Why Loggly loves Apache Kafka
Some good factoids about Loggly's Kafka usage and scale.
scalability  logging  loggly  kafka  queueing  ops  reliabilty 
may 2015 by jm
Our latest open source release from Swrve Labs: an Apache-licensed, SLF4J-compatible, simple, fluent API for rate-limited logging in Java:

'A RateLimitedLog object tracks the rate of log message emission, imposes an internal rate limit, and will efficiently suppress logging if this is exceeded. When a log is suppressed, at the end of the limit period, another log message is output indicating how many log lines were suppressed. This style of rate limiting is the same as the one used by UNIX syslog; this means it should be comprehensible, easy to predict, and familiar to many users, unlike more complex adaptive rate limits.'

We've been using this in production for months -- it's pretty nifty ;) Never fear your logs again!
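The syslog-style behaviour described above is simple enough to sketch. The following is a Python approximation of those semantics, not the Java library's API. All names here are hypothetical. One simplification: the "N lines suppressed" summary is emitted on the first call after the period ends, rather than by a timer.

```python
import time


class RateLimitedLogger:
    """Allow at most `max_per_period` messages per period; count the rest."""

    def __init__(self, emit, max_per_period, period_seconds, clock=time.monotonic):
        self.emit = emit              # where allowed lines go, e.g. a logger method
        self.max = max_per_period
        self.period = period_seconds
        self.clock = clock            # injectable for testing
        self.period_start = clock()
        self.count = 0
        self.suppressed = 0

    def _roll_period(self, now):
        if now - self.period_start >= self.period:
            if self.suppressed:
                # Summary line, in the spirit of syslog's "repeated N times".
                self.emit(f"(suppressed {self.suppressed} log lines "
                          f"in the last {self.period}s)")
            self.period_start = now
            self.count = 0
            self.suppressed = 0

    def log(self, message):
        now = self.clock()
        self._roll_period(now)
        if self.count < self.max:
            self.count += 1
            self.emit(message)
        else:
            self.suppressed += 1      # cheap: no formatting or I/O when over limit
```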
logs  logging  coding  java  open-source  swrve  slf4j  rate-limiting  libraries 
february 2015 by jm
AWS re:Invent 2014 | (SPOT302) Under the Covers of AWS: Its Core Distributed Systems - YouTube
This is a really solid talk -- not surprising, alv@ is one of the speakers!
"AWS and operate some of the world's largest distributed systems infrastructure and applications. In our past 18 years of operating this infrastructure, we have come to realize that building such large distributed systems to meet the durability, reliability, scalability, and performance needs of AWS requires us to build our services using a few common distributed systems primitives. Examples of these primitives include a reliable method to build consensus in a distributed system, reliable and scalable key-value store, infrastructure for a transactional logging system, scalable database query layers using both NoSQL and SQL APIs, and a system for scalable and elastic compute infrastructure.

In this session, we discuss some of the solutions that we employ in building these primitives and our lessons in operating these systems. We also cover the history of some of these primitives -- DHTs, transactional logging, materialized views and various other deep distributed systems concepts; how their design evolved over time; and how we continue to scale them to AWS. "

scale  scaling  aws  amazon  dht  logging  data-structures  distcomp  via:marc-brooker  dynamodb  s3 
november 2014 by jm
Is Docker ready for production? Feedbacks of a 2 weeks hands on
I have to agree with this assessment -- there are a lot of loose ends still for production use of Docker in a SOA stack environment:
From my point of view, Docker is probably the best thing I’ve seen in ages to automate a build. It allows to pre build and reuse shared dependencies, ensuring they’re up to date and reducing your build time. It avoids you to either pollute your Jenkins environment or boot a costly and slow Virtualbox virtual machine using Vagrant. But I don’t feel like it’s production ready in a complex environment, because it adds too much complexity. And I’m not even sure that’s what it was designed for.
docker  complexity  devops  ops  production  deployment  soa  web-services  provisioning  networking  logging 
october 2014 by jm
Logentries Announces Machine Learning Analytics for IT Ops Monitoring and Real-time Alerting
This sounds pretty neat:
With Logentries Anomaly Detection, users can:

Set-up real-time alerting based on deviations from important patterns and log events.
Easily customize Anomaly thresholds and compare different time periods.

With Logentries Inactivity Alerting, users can:

Monitor standard, incoming events such as an application heart beat.
Receive real-time alerts based on log inactivity (i.e. receive alerts when something does not occur).
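Inactivity alerting is the inverse of the usual match-a-pattern alert, since you alarm on the absence of a line. Here is a minimal sketch of the idea, not Logentries' implementation. The class name and the "register expected sources up front" approach are assumptions; without that step, a source that never logs at all would never alert.

```python
class InactivityMonitor:
    """Track the last heartbeat per source and report sources gone quiet."""

    def __init__(self, timeout_seconds, expected=(), start=0.0):
        self.timeout = timeout_seconds
        # Pre-register expected sources so a source that never reports still alerts.
        self.last_seen = {source: start for source in expected}

    def observe(self, source, ts):
        """Record a heartbeat (or any log line) from `source` at time `ts`."""
        prev = self.last_seen.get(source)
        if prev is None or ts > prev:
            self.last_seen[source] = ts

    def silent_sources(self, now):
        """Sources with no event for more than `timeout` seconds, sorted."""
        return sorted(s for s, ts in self.last_seen.items()
                      if now - ts > self.timeout)
```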
logging  syslog  logentries  anomaly-detection  ops  machine-learning  inactivity  alarms  alerting  heartbeats 
august 2014 by jm
Systemd: Harbinger of the Linux apocalypse
While there are many defensible aspects of Systemd, other aspects boggle the mind. Not the least of these was that, as of a few months ago, trying to debug the kernel from the boot line would cause the system to crash. This was because of Systemd's voracious logging and the fact that Systemd responds to the "debug" flag on the kernel boot line -- a flag meant for the kernel, not anything else. That, straight up, is a bug.

However, the Systemd developers didn't see it that way and actively fought with those experiencing the problem. Add the fact that one of the Systemd developers was banned by Linus Torvalds for poor attitude and bad design and another was responsible for causing significant issues with Linux audio support, but blamed the problem on everything else but his software, and you have a bad situation on your hands.

There's no shortage of egos in the open source development world. There's no shortage of new ideas and veteran developers and administrators pooh-poohing something new simply because it's new. But there are also 45 years of history behind Unix and extremely good reasons it's still flourishing. Tools designed like Systemd do not fit the Linux mold, to their own detriment. Systemd's design has more in common with Windows than with Unix -- down to the binary logging.

The link re systemd consuming the "debug" kernel boot arg is a canonical example of inflexible coders refusing to fix their own bugs. (via Jason Dixon)
systemd  linux  red-hat  egos  linus-torvalds  unix  init  booting  debugging  logging  design  software  via:obfuscurity 
august 2014 by jm
AWS SDK for Java Client Configuration
Turns out the AWS SDK has lots of tuning knobs: region selection, socket buffer sizes, and debug logging (including wire logging).
aws  sdk  java  logging  ec2  s3  dynamodb  sockets  tuning 
june 2014 by jm
Why dispute resolution is hard
Good stuff (as usual) from Ross Anderson and Stephen Murdoch.

'Today we release a paper on security protocols and evidence which analyses why dispute resolution mechanisms in electronic systems often don’t work very well. On this blog we’ve noted many many problems with EMV (Chip and PIN), as well as other systems from curfew tags to digital tachographs. Time and again we find that electronic systems are truly awful for courts to deal with. Why?
The main reason, we observed, is that their dispute resolution aspects were never properly designed, built and tested. The firms that delivered the main production systems assumed, or hoped, that because some audit data were available, lawyers would be able to use them somehow.
As you’d expect, all sorts of things go wrong. We derive some principles, and show how these are also violated by new systems ranging from phone banking through overlay payments to Bitcoin. We also propose some enhancements to the EMV protocol which would make it easier to resolve disputes over Chip and PIN transactions.'
finance  security  ross-anderson  emv  bitcoin  chip-and-pin  banking  architecture  verification  vvat  logging 
february 2014 by jm
Make The Web Fast - The HAR Show: Capturing and Analyzing performance data with HTTP Archive format — Google Developers
Wow, I didn't know about this. Great idea.
Need a flexible format to record, export, and analyze network performance data? Well, that's exactly what the HTTP Archive format (HAR) is designed to do! Even better, did you know that Chrome DevTools supports it? In this episode we'll take a deep dive into the format (as you'll see, its very simple), and explore the many different ways it can help you capture and analyze your sites performance. Join Ilya Grigorik and Peter Lubbers to find out how to capture HAR network traces in Chrome, visualize the data via an online tool, share the reports with your clients and coworkers, automate the logging and capture of HAR data for your build scripts, and even adapt it to server-side analysis use cases
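Since HAR is just JSON, a build script can summarise a capture with a few lines of code. This minimal sketch assumes the HAR 1.2 layout: `log.entries[]`, each with a total `time` in milliseconds, `request.url`, and `response.status`. `summarize_har` is a hypothetical helper.

```python
import json


def summarize_har(har_text, top=3):
    """Return request count, slowest requests, and a status-code histogram."""
    entries = json.loads(har_text)["log"]["entries"]
    # `time` is each request's total elapsed ms; requests overlap, so summing
    # these would NOT give page load time.
    slowest = sorted(entries, key=lambda e: e["time"], reverse=True)[:top]
    by_status = {}
    for e in entries:
        status = e["response"]["status"]
        by_status[status] = by_status.get(status, 0) + 1
    return {
        "requests": len(entries),
        "slowest": [(e["request"]["url"], e["time"]) for e in slowest],
        "by_status": by_status,
    }
```

In a build, this could fail the job when a 4xx/5xx appears or when the slowest request crosses a threshold. The thresholds would be your own choice.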
capturing  logging  performance  http  debugging  trace  capture  har  archives  protocols  recording 
december 2013 by jm
Creating Flight Recordings
Lots more detail on the new Java Flight Recorder feature (part of "Java Mission Control") in HotSpot 7u40 JVMs, and how to use it to start and stop profiling in a live production JVM from a separate "jcmd" command-line client. If the overhead is small, this could be really neat: turn on profiling for 1 minute every hour on a single instance, and collect realtime production profile data on an automated basis for post-facto analysis if required.
instrumentation  logging  profiling  java  jvm  ops 
september 2013 by jm
Behind the Screens at Loggly
Boost ASIO at the front end (!), Kafka 0.8, Storm, and ElasticSearch
boost  scalability  loggly  logging  ingestion  cep  stream-processing  kafka  storm  architecture  elasticsearch 
september 2013 by jm
Log4j 2: Performance close to insane
Nice writeup on Log4j 2's new asynchronous logging, whose Async Loggers are based on the LMAX Disruptor. Sounds pretty excellent:
“One nice little detail I should mention is that both Async Loggers and Async Appenders fix something that has always bothered me in Log4j-1.x, which is that they will flush the buffer after logging the last event in the queue. With Log4j-1.x, if you used buffered I/O, you often could not see the last few log events, as they were still stuck in the memory buffer. Your only option was setting immediateFlush to true, which forces disk I/O on every single log event and has a performance impact.
With Async Loggers and Appenders in Log4j-2.0 your log statements are all flushed to disk, so they are always visible, but this happens in a very efficient manner.”
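The "flush after the last event in the queue" trick is easy to illustrate. Here is a minimal Python sketch of the idea using a plain `queue.Queue` and a background thread, not Log4j's Disruptor-based code. `AsyncBufferedWriter` is a hypothetical name. Bursts get written with a single flush at the end, yet the final event is never left stuck in the buffer.

```python
import queue
import threading


class AsyncBufferedWriter:
    """Background writer: buffered writes, flushed whenever the queue drains."""

    _STOP = object()

    def __init__(self, stream):
        self.stream = stream
        self.q = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def log(self, line):
        self.q.put(line)              # caller never blocks on disk I/O

    def _run(self):
        while True:
            item = self.q.get()       # block until at least one event arrives
            if item is self._STOP:
                self.stream.flush()
                return
            self.stream.write(item + "\n")
            # Last event currently queued: flush so it is visible now. The
            # check is racy, but a missed flush just happens on the next event.
            if self.q.empty():
                self.stream.flush()

    def close(self):
        self.q.put(self._STOP)        # FIFO: everything queued earlier is written first
        self.thread.join()
```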
logging  java  performance  async  disruptor  low-latency 
july 2013 by jm
Log4j2 Asynchronous Loggers for Low-Latency Logging - Apache Log4j 2
Implemented using the LMAX Disruptor library, with very impressive performance figures. I presume that in real-world usage these latencies are dwarfed by hardware costs, though.
disruptor  coding  java  log4j  logging  async  performance 
april 2013 by jm
'The Unified Logging Infrastructure for Data Analytics at Twitter' [PDF]
A picture of how Twitter standardized their internal service event logging formats to allow batch analysis and analytics. They surface service metrics to dashboards from Pig jobs on a daily basis, which frankly doesn't sound too great...
twitter  analytics  event-logging  events  logging  metrics 
january 2013 by jm
Chronon DVR for Java
"record entire execution of your Java app; play it back on any machine". Other features: time-travelling debugger -- step backwards, jump to any point in execution, designed for long running programs; post-execution logging -- add log statements after the program has run, and see what it would have logged. Looks extremely nifty, but I wonder how big those recording files get...
debugging  via:peakscale  eclipse  chronon  dvr  java  coding  logging  jvm 
may 2012 by jm
sbtourist/nimrod - GitHub
'Nimrod is a metrics server, inspired by the excellent Coda Hale's Metrics library, but purely based on log processing: hence, it doesn't affect the way you write your applications, nor it has any side effect on them.'
nimrod  service-metrics  logging 
february 2012 by jm
'Free open source self-hosted log management and exception tracking', loggly-style.  Basically, a nifty web data-mining UI on your syslogs (via adulau)
logging  syslog  sysadmin  mongodb  opensource  via:adulau  logs  web  ui  data-mining  from delicious
january 2011 by jm
First logging-as-a-service tool for the cloud wins NovaUCD award -
First, eh? Not sure about that. Still, good going for Irish startup JLizard; logging in the cloud seems to be hot.
logging  metrics  analysis  cloud  ireland  startups  novaucd  from delicious
november 2010 by jm
Open-source app to manage events and logs: collect logs, parse them, store and search them, with a web UI.
logs  logging  logstash  metrics  from delicious
november 2010 by jm
'Logging as a Service' - a cloud-based logging service
logging  loggly  cloud  logs  data  metrics  from delicious
november 2010 by jm
Petit: Log Analysis
Log analyzer: removes common strings and patterns from log files, identifying outliers and hapaxes as "interesting". Also does charting of frequencies, etc.
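The core trick is easy to sketch: mask the variable parts of each line, count the resulting templates, and surface the ones that occur only once. This is a generic illustration of the idea, not Petit's actual normalisation rules. The three regexes (IPv4, hex, decimal) and the function names are assumptions.

```python
import re
from collections import Counter

# Order matters: mask IPv4 addresses before bare numbers eat their octets.
PATTERNS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def template(line):
    """Replace variable tokens with placeholders so similar lines collapse."""
    for pattern, placeholder in PATTERNS:
        line = pattern.sub(placeholder, line)
    return line


def rare_templates(lines, max_count=1):
    """Templates seen at most `max_count` times (hapaxes by default)."""
    counts = Counter(template(line) for line in lines)
    return [t for t, n in counts.items() if n <= max_count]
```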
logs  logging  analysis  loganalysis  syslog  tools  from delicious
june 2010 by jm
glTail.rb - realtime logfile visualization
'View real-time data and statistics from any logfile on any server with SSH, in an intuitive and entertaining way', supporting postfix/spamd/clamd logs among loads of others. Very cool, if a little silly.
dataviz  visualization  tail  gltail  opengl  linux  apache  spamd  spamassassin  logs  statistics  sysadmin  analytics  animation  analysis  server  ruby  monitoring  logging  logfiles 
july 2009 by jm
