jm + tools   58

'Containerized Data Analytics':
There are two bold new ideas in Pachyderm:

Containers as the core processing primitive
Version Control for data

These ideas lead directly to a system that's much more powerful, flexible and easy to use.

To process data, you simply create a containerized program which reads and writes to the local filesystem. You can use any tools you want because it's all just going in a container! Pachyderm will take your container and inject data into it. We'll then automatically replicate your container, showing each copy a different chunk of data. With this technique, Pachyderm can scale any code you write to process up to petabytes of data (Example: distributed grep).

Pachyderm also version controls all data using a commit-based distributed filesystem (PFS), similar to what git does with code. Version control for data has far-reaching consequences in a distributed filesystem. You get the full history of your data, can track changes and diffs, collaborate with teammates, and if anything goes wrong you can revert the entire cluster with one click!

Version control is also very synergistic with our containerized processing engine. Pachyderm understands how your data changes and thus, as new data is ingested, can run your workload on the diff of the data rather than the whole thing. This means that there's no difference between a batched job and a streaming job, the same code will work for both!
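
A rough Python sketch of the "run the workload on the diff" idea described above. This is not Pachyderm's API: commits are modelled as plain dicts of path to content, and `diff_commits` and `count_matches` are hypothetical helpers made up for illustration.

```python
# Toy model: a "commit" is a dict mapping file path -> file content.
# None of these names come from Pachyderm; they are illustrative only.

def diff_commits(old, new):
    """Return the paths that are new or changed in `new` relative to `old`."""
    return {path for path, content in new.items() if old.get(path) != content}

def count_matches(files, paths, needle):
    """Toy 'grep' job: count lines containing `needle` in each given path."""
    return {p: sum(needle in line for line in files[p].splitlines()) for p in paths}

commit1 = {"logs/a.txt": "error: disk\nok\n", "logs/b.txt": "ok\n"}
commit2 = {**commit1, "logs/c.txt": "error: net\nerror: cpu\n"}

# Batch run over the first commit.
results = count_matches(commit1, commit1.keys(), "error")

# "Streaming" run: same job, but only over what changed in the second commit.
changed = diff_commits(commit1, commit2)
results.update(count_matches(commit2, changed, "error"))

print(sorted(changed))
print(results)
```

The same `count_matches` job serves both the initial batch run and the incremental run; only the set of paths it is shown differs.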
analytics  data  containers  golang  pachyderm  tools  data-science  docker  version-control 
4 weeks ago by jm
if you aren't safe, we'll make noise for you
a Dead Man's Switch for border crossings; if you are detained and cannot make a "checkin", it'll make noise on your behalf so your friends and family know what's happened
safety  borders  dead-mans-switch  landsafe  tools 
7 weeks ago by jm
Julia Evans reverse engineers
simple usage of Docker, blue/green deploys, and AWS ALBs
docker  alb  aws  ec2  blue-green-deploys  deployment  ops  tools  skyliner  via:jgilbert 
november 2016 by jm
'a Ruby regular expression editor and tester'. Great for prototyping regexps with a little set of test data, providing a neat permalink for the results
regex  regexp  ruby  tools  coding  web  editors  testing 
july 2016 by jm
command line utility that performs an HTML element selection on HTML content passed on stdin, using CSS selectors that everybody knows. Since input comes from stdin and output is sent to stdout, it can easily be used inside traditional UNIX pipelines to extract content from webpages and HTML files. tq provides extra formatting options such as JSON-encoding or newline squashing, so it can play nicely with everyone's favourite command line tooling.
tq  linux  unix  cli  command-line  html  parsing  css  tools 
may 2016 by jm
The Make: Weekend Projects Thumbnail Guide To Soldering
man, I wish I had this 30 years ago. now I know what stuff I need to get to make my occasional solders less of a PITA
soldering  gadgets  tools  workbench  make  fixing  diy 
april 2016 by jm
Qualys SSL Server Test
pretty sure I had this bookmarked previously, but this is the current URL -- SSL/TLS quality report
ssl  tls  security  tests  ops  tools  testing 
march 2016 by jm
a free, multi-threaded compression utility with support for bzip2 compressed file format. lbzip2 can process standard bz2 files in parallel. It uses POSIX threading model (pthreads), which allows it to take full advantage of symmetric multiprocessing (SMP) systems. It has been proven to scale linearly, even to over one hundred processor cores.

lbzip2 is fully compatible with bzip2 – both at file format and command line level. Files created by lbzip2 can be decompressed by all versions of bzip2 and other software supporting bz2 format. lbzip2 can decompress any bz2 files in parallel. All bzip2 command-line options are also accepted by lbzip2. This makes lbzip2 a drop-in replacement for bzip2.
bzip2  gzip  compression  lbzip2  parallel  cli  tools 
march 2016 by jm
Online chart maker for CSV and Excel data; make charts and dashboards online. One really nice feature is that charts made this way get permalinks, and can be easily inlined as PNGs or HTML5 divs. (See for an example.)
data  javascript  python  tools  visualization  dataviz  charts  graphing  web  plotly  plots  graphs 
january 2016 by jm
Image editing tool for the Mac, recommended by Oisin
images  design  graphics  mac  osx  tools  apps 
december 2015 by jm
Google Cloud Shell
your command line environment in the [Google] Cloud. This feature enables you to connect to a shell environment on a virtual machine, pre-loaded with the tools you need to easily run commands to develop, deploy and manage your projects. Currently, Cloud Shell is an f1-micro Google Compute Engine machine that exposes a Debian-based development environment. You are also assigned 5 GB of standard persistent disk space as the home disk so you can store files between sessions.

It's also free. This is a great idea -- handy both for beginners getting to grips with GoogCloud and for experts looking for a quick dev env to hack with. I wish AWS had something similar.
google  cloud  shell  google-cloud  gcs  gce  cli  tools 
october 2015 by jm
a specialized packet sniffer designed for displaying and logging HTTP traffic. It is not intended to perform analysis itself, but to capture, parse, and log the traffic for later analysis. It can be run in real-time displaying the traffic as it is parsed, or as a daemon process that logs to an output file. It is written to be as lightweight and flexible as possible, so that it can be easily adaptable to different applications.

via Eoin Brazil
via:eoinbrazil  httpry  http  networking  tools  ops  testing  tcpdump  tracing 
september 2015 by jm
'a simple command line tool that turns your CLI tools into web applications'
cli  terminal  web  tools  unix 
september 2015 by jm
'like sed, awk, cut, join, and sort for name-indexed data such as CSV'

Written in "modern C" with zero runtime dependencies. Looks great
cli  csv  unix  miller  tsv  data  tools 
august 2015 by jm
'Simplistic interactive filtering tool' -- live incremental-search filtering in a terminal window
cli  shell  terminal  tools  go  peco  interactive  incremental-search  search  ui  unix 
june 2015 by jm
a command line tool for JVM diagnostic troubleshooting and profiling.
java  jvm  monitoring  commandline  jmx  sjk  tools  ops 
june 2015 by jm
Ag: faster than Ack
Some nice performance tricks; I particularly like the use of sljit:
Ag uses Pthreads to take advantage of multiple CPU cores and search files in parallel.
Files are mmap()ed instead of read into a buffer.
Literal string searching uses Boyer-Moore strstr.
Regex searching uses PCRE's JIT compiler (if Ag is built with PCRE >=8.21).
Ag calls pcre_study() before executing the same regex on every file.
Instead of calling fnmatch() on every pattern in your ignore files, non-regex patterns are loaded into arrays and binary searched.
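
The last trick in that list can be sketched roughly as follows. Ag itself is written in C; this is only a Python illustration of the idea, and `build_index`, `is_ignored` and the sample patterns are hypothetical names, not Ag's internals. Literal patterns go into a sorted list and are binary searched, so `fnmatch` is only needed for the patterns that actually contain glob characters.

```python
import bisect
import fnmatch

GLOB_CHARS = "*?["

def build_index(patterns):
    """Split ignore patterns into a sorted list of literals and a list of globs."""
    literals = sorted(p for p in patterns if not any(c in p for c in GLOB_CHARS))
    globs = [p for p in patterns if any(c in p for c in GLOB_CHARS)]
    return literals, globs

def is_ignored(name, literals, globs):
    """Binary search the literals first; fall back to fnmatch only for globs."""
    i = bisect.bisect_left(literals, name)
    if i < len(literals) and literals[i] == name:
        return True
    return any(fnmatch.fnmatchcase(name, g) for g in globs)

literals, globs = build_index(["node_modules", "*.min.js", ".git", "build"])
for name in (".git", "app.min.js", "src"):
    print(name, is_ignored(name, literals, globs))
```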
jit  cli  grep  search  ack  ag  unix  pcre  sljit  boyer-moore  tools 
march 2015 by jm
OpenJDK: jol
'JOL (Java Object Layout) is the tiny toolbox to analyze object layout schemes in JVMs. These tools are using Unsafe, JVMTI, and Serviceability Agent (SA) heavily to decode the actual object layout, footprint, and references. This makes JOL much more accurate than other tools relying on heap dumps, specification assumptions, etc.'

Recommended by Nitsan Wakart, looks pretty useful for JVM devs
java  jvm  tools  scala  memory  estimation  ram  object-layout  debugging  via:nitsan 
february 2015 by jm
A tool for managing Apache Kafka. It supports the following:

Manage multiple clusters;
Easy inspection of cluster state (topics, brokers, replica distribution, partition distribution);
Run preferred replica election;
Generate partition assignments (based on current state of cluster);
Run reassignment of partition (based on generated assignments)
yahoo  kafka  ops  tools 
february 2015 by jm
'A constant throughput, correct latency-recording variant of wrk.' This is a must-have when measuring network service latency -- corrects for Coordinated Omission error:
wrk's model, which is similar to the model found in many current load generators, computes the latency for a given request as the time from the sending of the first byte of the request to the time the complete response was received. While this model correctly measures the actual completion time of individual requests, it exhibits a strong Coordinated Omission effect, through which most of the high latency artifacts exhibited by the measured server will be ignored. Since each connection will only begin to send a request after receiving a response, high latency responses result in the load generator coordinating with the server to avoid measurement during high latency periods.
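
A toy simulation of the effect described above, assuming a single connection that intends to send one request every `interval` seconds. This is not wrk2's code; `simulate` and the numbers are made up for illustration. The "naive" latency is measured from the actual send time, the "corrected" latency from the time the request was supposed to be sent.

```python
def simulate(service_times, interval):
    """Return (naive, corrected) latencies for one connection.

    service_times: how long the server takes to answer each request.
    interval: intended gap between request start times (constant throughput).
    """
    naive, corrected = [], []
    free_at = 0.0  # time at which the connection is able to send again
    for i, service in enumerate(service_times):
        intended = i * interval
        sent = max(intended, free_at)       # blocked until the previous response
        done = sent + service
        naive.append(done - sent)           # what a wrk-style generator records
        corrected.append(done - intended)   # includes time spent waiting to send
        free_at = done
    return naive, corrected

# Ten fast responses with a single one-second stall in the middle.
service = [0.01] * 5 + [1.0] + [0.01] * 5
naive, corrected = simulate(service, interval=0.1)

print("slow (>0.5s) by naive measure:    ", sum(l > 0.5 for l in naive))
print("slow (>0.5s) by corrected measure:", sum(l > 0.5 for l in corrected))
```

In this toy run the naive measure sees only the stalled request as slow, while the corrected measure also counts the requests that were queued up behind it.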
wrk  latency  measurement  tools  cli  http  load-testing  testing  load-generation  coordinated-omission  gil-tene 
november 2014 by jm
"A command-line power tool for Twitter." It really is -- much better timeline searchability than the "real" Twitter UI, for example
twitter  ruby  github  cli  tools  unix  search 
october 2014 by jm
Inviso: Visualizing Hadoop Performance
With the increasing size and complexity of Hadoop deployments, being able to locate and understand performance is key to running an efficient platform.  Inviso provides a convenient view of the inner workings of jobs and platform.  By simply overlaying a new view on existing infrastructure, Inviso can operate inside any Hadoop environment with a small footprint and provide easy access and insight.  

This sounds pretty useful.
inviso  netflix  hadoop  emr  performance  ops  tools 
september 2014 by jm
tinystat - GoDoc
tinystat is used to compare two or more sets of measurements (e.g., multiple runs of benchmarks of two possible implementations) and determine if they are statistically different, using Student's t-test. It's inspired largely by FreeBSD's ministat (written by Poul-Henning Kamp).
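
For reference, a minimal Python sketch of a two-sample Student's t statistic. This is the textbook pooled-variance form, which may differ from what tinystat actually computes; `student_t` and the sample data are hypothetical.

```python
import math
import statistics

def student_t(a, b):
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    df = na + nb - 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / df
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, df

# Made-up benchmark timings for two implementations.
run_a = [1, 2, 3, 4, 5]
run_b = [2, 3, 4, 5, 6]
t, df = student_t(run_a, run_b)
print(t, df)
```

The resulting |t| is then compared against the critical value for `df` degrees of freedom at the chosen confidence level (2.306 for df = 8 at 95%, two-sided); here the two runs would not count as statistically different.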
t-test  student  statistics  go  coda-hale  tinystat  stats  tools  command-line  unix 
september 2014 by jm
Whiteboard Picture Cleaner

This [shell one-liner] will take a picture of a whiteboard and use parts of the ImageMagick library with sane defaults to clean it up tremendously: convert "$1" -morphology Convolve DoG:15,100,0 -negate -normalize -blur 0x1 -channel RBG -level 60%,91%,0.1 "$2"

Some kind soul has put up a quickie web UI here:
graphics  tools  whiteboard  imagemagick  text  images  cleanup  gimp  photoshop  via:fanf 
june 2014 by jm
'a command line tool for Amazon's Simple Storage Service (S3). Written in Python, easy_install the package to install as an egg. Supports multithreaded operations for large volumes. Put, get, or delete many items concurrently, using a fixed-size pool of threads. Built on workerpool for multithreading and boto for access to the Amazon S3 API. Unix-friendly input and output. Pipe things in, out, and all around.'

MIT-licensed open source. (via Paul Dolan)
via:pdolan  s3  s3funnel  tools  ops  aws  python  mit  open-source 
april 2014 by jm
open source, system-level exploration: capture system state and activity from a running Linux instance, then save, filter and analyze.
Think of it as strace + tcpdump + lsof + awesome sauce.
With a little Lua cherry on top.

This sounds excellent. Linux-based, GPLv2.
debugging  tools  linux  ops  tracing  strace  open-source  sysdig  cli  tcpdump  lsof 
april 2014 by jm
Another cool library from Ray Holder: 'an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.'

Similar to his Guava-Retrier java lib, but using a decorator.
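
To show what the decorator approach looks like, here is a hand-rolled sketch with exponential backoff. It is not the retrying library's API: `retry`, `max_attempts`, `base_delay` and `sleep` are hypothetical names chosen for this example.

```python
import functools
import time

def retry(max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Retry the wrapped function on any exception, doubling the delay each time."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == max_attempts:
                        raise  # out of attempts: let the last error propagate
                    sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

attempts = []

@retry(max_attempts=4, base_delay=0.01)
def flaky_fetch():
    """Fails twice, then succeeds (stand-in for a flaky network call)."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "ok"

print(flaky_fetch(), "after", len(attempts), "attempts")
```

Passing `sleep` as a parameter is a design choice that keeps the backoff testable without real waiting.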
retrying  python  libraries  tools  backoff  retry  error-handling 
april 2014 by jm
an easily embeddable, decentralized, k-ordered unique ID generator. It can use the same encoded ID format as Twitter's Snowflake or Boundary's Flake implementations as well as any other customized encoding without too much effort. The fauxflake-core module has no external dependencies and is meant to be about as light as possible while still delivering useful functionality. Essentially, if you want to be able to generate a unique identifier across your infrastructure with reasonable assurances about collisions, then you might find this useful.

From the same guy as the excellent Guava Retrier library; java, ASL2-licensed open source.
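
A minimal Python sketch of a k-ordered ID generator, assuming the Snowflake-style 64-bit layout (41 bits of millisecond timestamp, 10 bits of worker id, 12 bits of per-millisecond sequence). Fauxflake itself is a Java library; `FlakeGenerator` and its parameters are hypothetical and do not reflect its API.

```python
import threading
import time

class FlakeGenerator:
    """Snowflake-style IDs: timestamp (ms) << 22 | worker id << 12 | sequence."""

    def __init__(self, worker_id, epoch_ms=0, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < 1024:
            raise ValueError("worker_id must fit in 10 bits")
        self._worker_id = worker_id
        self._epoch_ms = epoch_ms
        self._clock = clock
        self._last_ms = -1
        self._seq = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & 0xFFF
                if self._seq == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._seq = 0
            self._last_ms = now
            return ((now - self._epoch_ms) << 22) | (self._worker_id << 12) | self._seq

gen = FlakeGenerator(worker_id=5)
first, second = gen.next_id(), gen.next_id()
print(first, second, first < second)
```

Because the timestamp occupies the high bits, IDs from different workers sort roughly by creation time without any coordination between them, which is what "k-ordered" refers to.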
open-source  java  asl2  fauxflake  tools  libraries  unique-ids  ids  unique  snowflake  distsys 
april 2014 by jm
a utility to perform parallel, pipelined execution of a single HTTP GET. htcat is intended for the purpose of incantations like: htcat | tar -zx

It is tuned (and only really useful) for faster interconnects: [....] 109MB/s on a gigabit network, between an AWS EC2 instance and S3. This represents 91% use of the theoretical maximum of gigabit (119.2 MiB/s).
go  cli  http  file-transfer  ops  tools 
march 2014 by jm
Sugru Magnet Kit
Sugru + neodymium magnets = WANT
sugru  diy  tools  magnets  want  toget  bike  hacks  fixing 
january 2014 by jm
'like inetd, but for WebSockets' -- 'a small command line tool that will wrap an existing command line interface program, and allow it to be accessed via a WebSocket. It provides a quick mechanism for allowing web-applications to interact with existing command line tools.'

Awesome idea. BSD-licensed. (Via Mike Loukides)
websockets  cli  server  tools  unix  inetd  web  http  open-source 
december 2013 by jm
from the Percona toolkit. 'Conveniently summarizes the status and configuration of a server. It is not a tuning tool or diagnosis tool. It produces a report that is easy to diff and can be pasted into emails without losing the formatting. This tool works well on many types of Unix systems.' --- summarises OOM history, top, netstat connection table, interface stats, network config, RAID, LVM, disks, inodes, disk scheduling, mounts, memory, processors, and CPU.
percona  tools  cli  unix  ops  linux  diagnosis  raid  netstat  oom 
october 2013 by jm

The future of the AWS command line tools is awscli, a single, unified, consistent command line tool that works with almost all of the AWS services. Here is a quick list of the services that awscli currently supports: Auto Scaling, CloudFormation, CloudSearch, CloudWatch, Data Pipeline, Direct Connect, DynamoDB, EC2, ElastiCache, Elastic Beanstalk, Elastic Transcoder, ELB, EMR, Identity and Access Management, Import/Export, OpsWorks, RDS, Redshift, Route 53, S3, SES, SNS, SQS, Storage Gateway, Security Token Service, Support API, SWF, VPC. Support for the following appears to be planned: CloudFront, Glacier, SimpleDB.

The awscli software is being actively developed as an open source project on Github, with a lot of support from Amazon. You’ll note that the biggest contributors to awscli are Amazon employees with Mitch Garnaat leading. Mitch is also the author of boto, the amazing Python library for AWS.
aws  awscli  cli  tools  command-line  ec2  s3  amazon  api 
august 2013 by jm
This program creates an EBS snapshot for an Amazon EC2 EBS volume. To
help ensure consistent data in the snapshot, it tries to flush and
freeze the filesystem(s) first as well as flushing and locking the
database, if applicable.

Filesystems can be frozen during the snapshot. Prior to Linux kernel
2.6.29, XFS must be used for freezing support. While frozen, a
filesystem will be consistent on disk and all writes will block.

There are a number of timeouts to reduce the risk of interfering with
the normal database operation while improving the chances of getting a
consistent snapshot.

If you have multiple EBS volumes in a RAID configuration, you can
specify all of the volume ids on the command line and it will create
snapshots for each while the filesystem and database are locked. Note
that it is your responsibility to keep track of the resulting snapshot
ids and to figure out how to put these back together when you need to
restore the RAID setup.

ubuntu  ec2  aws  linux  ebs  snapshots  ops  tools  alestic 
may 2013 by jm
like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. [it] is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine, and expect it to work.

Nice tool. Needs to get into the Debian/Ubuntu apt repos pronto ;)
jq  tools  cli  via:peakscale  json  coding  data  sed  unix 
april 2013 by jm
Some really cool-looking UNIX command line utils, packaged in Debian (and therefore in Ubuntu too). A few of these I've reimplemented separately, but it's always good to replace a hack with a more widely available "official" tool. Thanks, Joey Hess!
sponge: accept input, wait til EOF, then rewrite a file;
chronic: runs a command quietly unless it fails;
combine: combine the lines in two files using boolean operations;
ifdata: get network interface info without parsing ifconfig output;
ifne: run a program if the standard input is not empty;
isutf8: check if a file or standard input is utf-8;
lckdo: execute a program with a lock held;
mispipe: pipe two commands, returning the exit status of the first;
parallel: run multiple jobs at once;
pee: tee standard input to pipes;
sponge: soak up standard input and write to a file;
ts: timestamp standard input;
vidir: edit a directory in your text editor;
vipe: insert a text editor into a pipe;
zrun: automatically uncompress arguments to command
bash  shell  cli  unix  scripting  via:peakscale  joey-hess  debian  ubuntu  tools  command-line  commands 
march 2013 by jm
Denominator: A Multi-Vendor Interface for DNS
the latest good stuff from Netflix.

Denominator is a portable Java library for manipulating DNS clouds. Denominator has pluggable back-ends, initially including AWS Route53, Neustar Ultra, DynECT, and a mock for testing. We also ship a command line version so it's easy for anyone to try it out.
The reason we built Denominator is that we are working on multi-region failover and traffic sharing patterns to provide higher availability for the streaming service during regional outages caused by our own bugs and AWS issues. To do this we need to directly control the DNS configuration that routes users to each region and each zone. When we looked at the features and vendors in this space we found that we were already using AWS Route53, which has a nice API but is missing some advanced features; Neustar UltraDNS, which has a SOAP based API; and DynECT, which has a REST API that uses a quite different pseudo-transactional model. We couldn’t find a Java based API that grouped together common set of capabilities that we are interested in, so we created one. The idea is that any feature that is supported by more than one vendor API is the highest common denominator, and that functionality can be switched between vendors as needed, or in the event of a DNS vendor outage.
dns  netflix  java  tools  ops  route53  aws  ultradns  dynect 
march 2013 by jm
How Team Obama’s tech efficiency left Romney IT in dust | Ars Technica
The web-app dev and ops best practices used by the Obama campaign's tech team. Some key tools: Puppet, EC2, Asgard, Cacti, Opsview, StatsD, Graphite, Seyren, Route53, Loggly, etc.
obama  campaigns  tools  ops  asgard  ec2  aws  route53 
november 2012 by jm
Unlike other tools intended to solve the JVM startup problem (e.g. Nailgun, Cake), Drip does not use a persistent JVM. There are many pitfalls to using a persistent JVM, which we discovered while working on the Cake build tool for Clojure. The main problem is that the state of the persistent JVM gets dirty over time, producing strange errors and requiring liberal use of cake kill whenever any error is encountered, just in case dirty state is the cause.

Instead of going down this road, Drip uses a different strategy. It keeps a fresh JVM spun up in reserve with the correct classpath and other JVM options so you can quickly connect and use it when needed, then throw it away. Drip hashes the JVM options and stores information about how to connect to the JVM in a directory with the hash value as its name.
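
The "hash the options, use the hash as a directory name" step might look something like this in Python. It is only a sketch: the choice of SHA-1, the separator, the `~/.drip` base path and the `jvm_dir` helper are all assumptions for illustration, not Drip's actual layout.

```python
import hashlib
from pathlib import Path

def jvm_dir(base, classpath, jvm_opts):
    """Map a (classpath, JVM options) combination to a stable directory path."""
    # Join with NUL so ("a", "bc") and ("ab", "c") cannot hash to the same key.
    key = "\0".join([classpath, *jvm_opts]).encode("utf-8")
    return Path(base) / hashlib.sha1(key).hexdigest()

d1 = jvm_dir("~/.drip", "lib/app.jar", ["-Xmx512m"])
d2 = jvm_dir("~/.drip", "lib/app.jar", ["-Xmx512m"])
d3 = jvm_dir("~/.drip", "lib/app.jar", ["-Xmx1g"])
print(d1 == d2, d1 == d3)
```

Identical options always map to the same directory, so a later invocation can find the spare JVM that was spun up for it, while any change to the classpath or flags yields a different directory.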

(via HN)
java  command-line  tools  startup  speed 
november 2012 by jm
twitter/jvmgcprof - GitHub
'gcprof is a simple utility for profiling allocation and garbage collection activity in the JVM [...] Profile allocation and garbage collection activity in the JVM. The gcprof command runs a java command under profiling. Allocation and collection statistics are printed periodically. If -n or -no are provided, statistics are also reported in terms of the given application metric. Total allocation, allocation rate, and a survival histogram are given. The intended use for this tool is twofold: (1) monitor and test garbage allocation and GC behavior, and (2) inform GC tuning.'
gc  java  performance  twitter  jvm  tools 
february 2012 by jm
Determining response times with tcprstat
'Tcprstat is a free, open-source TCP analysis tool that watches network traffic and computes the delay between requests and responses. From this it derives response-time statistics and prints them out.' Computes percentiles, too
tcp  tcprstat  tcp-ip  networking  measurement  statistics  performance  instrumentation  linux  unix  tools  cli 
november 2011 by jm
Linux SS Utility To Investigate Sockets / Network Connections
'When amount of sockets is enough large, netstat or even plain cat /proc/net/tcp/ cause nothing but pains and curses. In linux-2.4 the desease [sic] became worse: even if amount of sockets is small reading /proc/net/tcp/ is slow enough. This utility presents a new approach, which is supposed to scale well.' via scanlan
via:scanlan  ss  linux  sockets  networking  tools  cli 
october 2011 by jm
Cool, but obscure unix tools
these are great - some new ones on me!
cli  linux  terminal  unix  tools  command-line 
may 2011 by jm
wraps strace(1) to summarise and aggregate I/O ops performed by a Linux process. looks pretty nifty (via Jeremy Zawodny)
via:jzawodny  io  strace  linux  monitoring  debugging  performance  profiling  sysadmin  ioprofile  unix  tools  from delicious
october 2010 by jm
Cory Doctorow's working environment
hardware and software, specifically; he's an Ubuntu/Thinkpad user. some good tips here, and well-written, naturally
cory-doctorow  geek  howto  lifehacks  ubuntu  productivity  tips  tools  from delicious
july 2010 by jm
Petit: Log Analysis
log analyzer; removes common strings and patterns from log files, identifying outliers and hapaxen as "interesting". also does charting of frequencies etc.
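
A rough Python sketch of the general technique, assuming that "removing common patterns" means masking variable parts such as numbers before counting. This is not Petit's implementation; `normalize`, `rare_lines` and the sample log lines are hypothetical.

```python
import re
from collections import Counter

def normalize(line):
    """Mask runs of digits so lines differing only in PIDs, ports etc. collapse."""
    return re.sub(r"\d+", "#", line.strip())

def rare_lines(lines):
    """Return normalized lines that occur exactly once (the 'hapaxen')."""
    counts = Counter(normalize(line) for line in lines)
    return [text for text, n in counts.items() if n == 1]

log = [
    "sshd[101]: session opened",
    "sshd[102]: session opened",
    "kernel: Out of memory: kill process 4242",
]
print(rare_lines(log))
```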
logs  logging  analysis  loganalysis  syslog  tools  from delicious
june 2010 by jm
Search results for on Delicious
wow, you can search a time period for everyone who bookmarked pages on a specific site (via Britta)
delicious  search  nifty  tools  egosurfing  via:britta  from delicious
february 2010 by jm
JSON Format
'your online JSON Formatter'. useful. via JKeyes
via:jkeyes  json  formatting  tools  useful  format  debugging  from delicious
november 2009 by jm
SD, a distributed bug tracker
now available. sadly, no support for Bugzilla, which is what we use in SpamAssassin (srsly), so I won't be trying it out just yet, but still -- cool
bugs  bug-tracking  trac  prophet  distributed  coding  tools  web  sd 
august 2009 by jm
Simpleton's guide to git
it really is. Yet another one-page intro to git, but a good one
git  tips  via:joshua  scm  tools  vc 
august 2009 by jm
