jm + testing   135

Computer says no: Irish vet fails oral English test needed to stay in Australia
An Irish veterinarian with degrees in history and politics has been unable to convince a machine she can speak English well enough to stay in Australia.

Louise Kennedy is a native English speaker, has excellent grammar and a broad vocabulary. She holds two university degrees – both obtained in English – and has been working in Australia as an equine vet on a skilled worker visa for the past two years.

But she is now scrambling for other visa options after a computer-based English test – scored by a machine – essentially handed her a fail in terms of convincing immigration officers she can fluently speak her own language.

This is idiotic. Computer-based voice recognition is in no way reliable enough for this kind of job. It's automated Kafkaesque bureaucracy -- "computer says no". Shame on Oz

(via James Kelleher)
via:etienneshrdlu  kafkaesque  bureaucracy  computer-says-no  voice-recognition  australia  immigration  english  voice  testing 
7 days ago by jm
Developer Experience Lessons Operating a Serverless-like Platform at Netflix
Very interesting writeup on Netflix's experience operating a serverless scripting system; they offer scriptability in their backend and it's used heavily by devs to provide features. Lots of having to reinvent the wheel on packaging, deployment, versioning, and test/staging infrastructure
serverless  dependencies  packaging  deployment  versioning  devex  netflix  developer-experience  dev  testing  staging  scripting 
5 weeks ago by jm
Undefined Behavior in 2017
This is an extremely detailed post on the state of dynamic checkers in C/C++ (via the inimitable Marc Brooker):
Recently we’ve heard a few people imply that problems stemming from undefined behaviors (UB) in C and C++ are largely solved due to ubiquitous availability of dynamic checking tools such as ASan, UBSan, MSan, and TSan. We are here to state the obvious — that, despite the many excellent advances in tooling over the last few years, UB-related problems are far from solved — and to look at the current situation in detail.
via:marc-brooker  c  c++  coding  testing  debugging  dynamic-analysis  valgrind  asan  ubsan  tsan 
6 weeks ago by jm
Why did Apple, Amazon, Google stocks crash to the same price today?
Nasdaq said in a statement that "certain third parties improperly propagated test data that was distributed as part of the normal evening test procedures."

"For July 3, 2017, all production data was completed by 5:16 PM as expected per the early close of the markets," the statement continued. "Any data messages received post 5:16 PM should be deemed as test data and purged from direct data recipient's databases. UTP (Unlisted Trading Privileges) is asking all third parties to revert to Nasdaq Official Closing Prices effective at 5:16 PM."
testing  fail  stock-markets  nasdaq  test-data  test  production  integration-testing  test-in-prod 
6 weeks ago by jm
RIPE Atlas Probes
Interesting! We discussed similar ideas in $prevjob, good to see one hitting production globally.
RIPE Atlas probes form the backbone of the RIPE Atlas infrastructure. Volunteers all over the world host these small hardware devices that actively measure Internet connectivity through ping, traceroute, DNS, SSL/TLS, NTP and HTTP measurements. This data is collected and aggregated by the RIPE NCC, which makes the data publicly available. Network operators, engineers, researchers and even home users have used this data for a wide range of purposes, from investigating network outages to DNS anycasting to testing IPv6 connectivity.

Anyone can apply to host a RIPE Atlas probe. If your application is successful (based on your location), we will ship you a probe free of charge. Hosts simply need to plug their probe into their home (or other) network.

Probes are USB-powered and are connected to an Ethernet port on the host’s router or switch. They then automatically and continuously perform active measurements about the Internet’s connectivity, and this data is sent to the RIPE NCC, where it is aggregated and made publicly available. We also use this data to create several Internet maps and data visualisations. [....]

The hardware of the first and second generation probes is a Lantronix XPort Pro module with custom powering and housing built around it. The third generation probe is a modified TP-Link wireless router (model TL-MR 3020) with a small USB thumb drive in it, but this probe does not support WiFi.

(via irldexter)
via:irldexter  ripe  ncc  probing  active-monitoring  networking  ping  traceroute  dns  testing  http  ipv6  anycast  hardware  devices  isps 
7 weeks ago by jm
Determinism in League of Legends
Once again, deterministic replay/reruns of online games prove useful. John Carmack wrote a .plan about this many years ago:

(via Nelson)
clock  realtime  time  determinism  testing  replay  games  league-of-legends  via:nelson 
8 weeks ago by jm
A sandboxed local environment that replicates the live AWS Lambda environment almost identically – including installed software and libraries, file structure and permissions, environment variables, context objects and behaviors – even the user and running process are the same.

(via og-aws)
docker  lambda  images  testing  aws  serverless 
9 weeks ago by jm
An empirical study on the correctness of formally verified distributed systems
We must recognise that even formal verification can leave gaps and hidden assumptions that need to be teased out and tested, using the full battery of testing techniques at our disposal. Building distributed systems is hard. But knowing that shouldn’t make us shy away from trying to do the right thing, instead it should make us redouble our efforts in our quest for correctness.
formal-verification  software  coding  testing  tla+  chapar  fuzzing  verdi  bugs  papers 
12 weeks ago by jm
'What’s your ML Test Score? A rubric for ML production systems'
'Using machine learning in real-world production systems is complicated by a host of issues not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for assessing the production-readiness of an ML system. But how much testing and monitoring is enough? We present an ML Test Score rubric based on a set of actionable tests to help quantify these issues.'

Google paper on testable machine learning systems.
machine-learning  testing  ml  papers  google 
april 2017 by jm
atlassian/localstack: A fully functional local AWS cloud stack. Develop and test your cloud apps offline!
LocalStack provides an easy-to-use test/mocking framework for developing Cloud applications. Currently, the focus is primarily on supporting the AWS cloud stack.

LocalStack spins up the following core Cloud APIs on your local machine:

API Gateway at http://localhost:4567;
Kinesis at http://localhost:4568;
DynamoDB at http://localhost:4569;
DynamoDB Streams at http://localhost:4570;
Elasticsearch at http://localhost:4571;
S3 at http://localhost:4572;
Firehose at http://localhost:4573;
Lambda at http://localhost:4574;
SNS at http://localhost:4575;
SQS at http://localhost:4576

Additionally, LocalStack provides a powerful set of tools to interact with the cloud services, including a fully featured KCL Kinesis client with Python binding, simple setup/teardown integration for nosetests, as well as an Environment abstraction that allows you to easily switch between local and remote Cloud execution.
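The endpoint list above maps straight onto client configuration. As a minimal sketch of the local/remote environment-switching idea (the `client_kwargs` helper and the `LOCALSTACK` env var are my own illustration, not part of LocalStack; only the ports come from the list above):

```python
import os

# LocalStack's default port per service, from the list above
LOCALSTACK_PORTS = {
    "apigateway": 4567, "kinesis": 4568, "dynamodb": 4569,
    "dynamodbstreams": 4570, "es": 4571, "s3": 4572,
    "firehose": 4573, "lambda": 4574, "sns": 4575, "sqs": 4576,
}

def client_kwargs(service, local=None):
    """Return extra kwargs for a boto3-style client: point at LocalStack
    when `local` (or the LOCALSTACK env var) is set, else at real AWS."""
    if local is None:
        local = os.environ.get("LOCALSTACK") == "1"
    if local:
        return {
            "endpoint_url": "http://localhost:%d" % LOCALSTACK_PORTS[service],
            "aws_access_key_id": "test",      # dummy creds for local use
            "aws_secret_access_key": "test",
            "region_name": "us-east-1",
        }
    return {}

# e.g.: boto3.client("sqs", **client_kwargs("sqs", local=True))
```

Tests then construct their clients through one helper, and the same suite runs offline against LocalStack or online against real AWS.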
aws  emulation  mocking  services  testing  dynamodb  s3 
march 2017 by jm
Testing@LMAX – Time Travel and the TARDIS
LMAX' approach to acceptance/system-testing time-dependent code. We are doing something similar in Swrve too, so finding that LMAX have taken a similar approach is a great indicator
lmax  testing  system-tests  acceptance-tests  tests  time 
november 2016 by jm
Testing Docker multi-host network performance - Percona Database Performance Blog
wow, Docker Swarm looks like a turkey right now if performance is important. Only "host" networking gives reasonable perf numbers
docker  networking  performance  ops  benchmarks  testing  swarm  overlay  calico  weave  bridge 
november 2016 by jm
Charity Majors responds to the CleverTap Mongo outage war story
This is a great blog post, spot on:
You can’t just go “dudes it’s faster” and jump off a cliff.  This shit is basic.  Test real production workloads. Have a rollback plan.  (Not for *10 days* … try a month or two.)

The only thing I'd nitpick on is that it's all very well to say "buy my book" or "come see me talk at Blahcon", but a good blog post or webpage would be thousands of times more useful.
databases  stateful-services  services  ops  mongodb  charity-majors  rollback  state  storage  testing  dba 
october 2016 by jm
Fake Time
'FakeTime is simulated time.'
When testing RealTime software a simulator is often employed, which injects events into the program which do not occur in RealTime.
If you are writing software that controls or monitors some process that exists in the real world, it takes a long time to test it. But if you simulate it, there is no reason in the simulated software (if it is disconnected from the real world completely) not to make the apparent system time inside your software appear to move at a much faster rate. For example, I have written simulators that can verify the operational steps taken by industrial controllers over a 12 hour FakeTime period, which executes in 60 seconds. This allows me to run '12 hours' of fake time through my test cases and test scenarios, without waiting 12 hours for the testing to complete. Of course, after a successful fakeTime test, an industrial RealTime system still needs to be tested in non-simulated fashion.
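The idea is easy to demonstrate with a toy controllable clock (my own sketch, not from the wiki page): time only advances when the test says so, so "12 hours" of scheduled work runs instantly.

```python
class FakeClock:
    """A controllable clock: tests advance time explicitly rather than sleeping."""
    def __init__(self, start=0.0):
        self.now = start
        self._timers = []  # list of (fire_at, callback)

    def time(self):
        return self.now

    def call_later(self, delay, callback):
        self._timers.append((self.now + delay, callback))

    def advance(self, seconds):
        """Jump the clock forward, firing any timers that come due, in order."""
        self.now += seconds
        due = sorted((t for t in self._timers if t[0] <= self.now),
                     key=lambda t: t[0])
        self._timers = [t for t in self._timers if t[0] > self.now]
        for _, callback in due:
            callback()

# verify a "12 hour" operational step without waiting 12 hours:
clock = FakeClock()
fired = []
clock.call_later(12 * 3600, lambda: fired.append("12h check"))
clock.advance(12 * 3600)
```

Code under test reads `clock.time()` instead of the system clock; as the quote notes, a final non-simulated RealTime pass is still needed.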
faketime  time  testing  mocks  mocking  system-tests 
august 2016 by jm
Kelsey Hightower - healthz: Stop reverse engineering applications and start monitoring from the inside [video]
His Monitorama 2016 talk, about the "deep health checks" concept (which I implemented at Swrve earlier this year ;)
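A deep health check differs from a plain liveness ping in that it exercises each downstream dependency and reports per-dependency status. A minimal sketch of the concept (my own illustration, not Kelsey's code; all names hypothetical):

```python
def healthz(checks):
    """checks: dependency name -> zero-arg callable that raises on failure.
    Returns an HTTP-ish status plus per-dependency detail."""
    detail = {}
    for name, check in checks.items():
        try:
            check()
            detail[name] = "ok"
        except Exception as e:
            detail[name] = "fail: %s" % e
    # healthy only if every dependency check passed
    status = 200 if all(v == "ok" for v in detail.values()) else 503
    return status, detail

def db_ping():          # hypothetical dependency checks
    pass                # e.g. SELECT 1 against the primary

def queue_ping():
    raise IOError("connection refused")

status, detail = healthz({"db": db_ping, "queue": queue_ping})
```

The monitoring system then reads one endpoint from the inside of the app, instead of reverse-engineering its health from the outside.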
monitorama  health  deep-health-checks  healthz  testing  availability  reliability 
july 2016 by jm
'Convert the results of Infer (static analyzer by Facebook) to JUnit format results.'
junit  infer  jenkins  ui  testing 
july 2016 by jm
QA Instability Implies Production Instability
Invariably, when I see a lot of developer effort in production support I also find an unreliable QA environment. It is both unreliable in that it is frequently not available for testing, and unreliable in the sense that the system’s behavior in QA is not a good predictor of its behavior in production.
qa  testing  architecture  patterns  systems  production 
july 2016 by jm
'a Ruby regular expression editor and tester'. Great for prototyping regexps with a little set of test data, providing a neat permalink for the results
regex  regexp  ruby  tools  coding  web  editors  testing 
july 2016 by jm
Finding pearls; fuzzing ClamAV
great how-to for practical scanner fuzz testing
fuzz-testing  clamav  scanners  security  vulnerabilities  testing 
june 2016 by jm
Blue Ocean
new Jenkins UX. looks great
jenkins  tests  ui  ux  pipelines  testing 
may 2016 by jm
Why do Selenium-style record/replay tests of web applications break?
good data! Mostly because of element locations it seems....
selenium  testing  web  locators  papers  qa  tests 
may 2016 by jm
CD at LMAX: Testing into Production and Back Again
Chock-full of excellent build/test ideas from LMAX's Continuous Delivery setup. Lots of good ideas to steal
testing  lmax  build  test  continuous-delivery  dev 
may 2016 by jm
Gil Tene on benchmarking
'I would strongly encourage you to avoid repeating the mistakes of testing methodologies that focus entirely on max achievable throughput and then report some (usually bogus) latency stats at those max throughput modes. The techempower numbers are a classic example of this in play, and while they do provide some basis for comparing a small aspect of behavior (what I call the "how fast can this thing drive off a cliff" comparison, or "pedal to the metal" testing), those results are not very useful for comparing load carrying capacities for anything that actually needs to maintain some form of responsiveness SLA or latency spectrum requirements.'

Some excellent advice here on how to measure and represent stack performance.

Also: 'DON'T use or report standard deviation for latency. Ever. Except if you mean it as a joke.'
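The standard-deviation point is easy to demonstrate: latency distributions are long-tailed and multi-modal, so mean ± stddev describes a distribution that doesn't exist. A quick sketch with made-up numbers (990 fast requests plus a few pause outliers):

```python
import statistics

# hypothetical latencies (ms): mostly 1ms, with occasional 250ms pause outliers
latencies = [1.0] * 990 + [250.0] * 10

mean = statistics.mean(latencies)
stdev = statistics.pstdev(latencies)

def percentile(data, p):
    """Nearest-rank percentile, good enough for illustration."""
    s = sorted(data)
    return s[min(len(s) - 1, int(p / 100.0 * len(s)))]

# mean - stddev is negative: a "latency" that cannot occur. The percentile
# spectrum (p50, p99, max) actually describes what callers experience.
lower_band = mean - stdev
p50, p99, worst = percentile(latencies, 50), percentile(latencies, 99), max(latencies)
```

Here mean ± stddev suggests latencies range down below zero, while the percentiles correctly show a fast median and a 250ms tail; that tail is what an SLA cares about.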
performance  benchmarking  testing  speed  gil-tene  latency  measurement  hdrhistogram  load-testing  load 
april 2016 by jm
Qualys SSL Server Test
pretty sure I had this bookmarked previously, but this is the current URL -- SSL/TLS quality report
ssl  tls  security  tests  ops  tools  testing 
march 2016 by jm
Jenkins 2.0
built-in support for CI/CD deployment pipelines, driven from a checked-in DSL file. great stuff, very glad to see them going this direction. (via Eric)
via:eric  jenkins  ci  cd  deployment  pipelines  testing  automation  build 
march 2016 by jm
Uncle Bob on "giving up TDD"
This is a great point, and one I'll be quoting:
Any design that is hard to test is crap. Pure crap. Why? Because if it's hard to test, you aren't going to test it well enough. And if you don't test it well enough, it's not going to work when you need it to work. And if it doesn't work when you need it to work the design is crap.

testing  tdd  uncle-bob  coding  design  testability  unit-tests 
march 2016 by jm
Cat-Herd's Crook
Nice approach from MongoDB:
we’ve recently gained momentum on standardizing our [cross-platform test] drivers. Human-readable, machine-testable specs, coded in YAML, prove which code conforms and which does not. These YAML tests are the Cat-Herd’s Crook: a tool to guide us all in the same direction.
mongodb  testing  unit-tests  yaml  multi-platform  coding 
march 2016 by jm
CharybdeFS: a new fault-injecting filesystem for software testing
a FUSE-based filesystem from ScyllaDB to test filesystem-related failure scenarios. great idea
fuse  software  testing  scylladb  filesystems  disk  charybdefs  fault-injection  tests 
february 2016 by jm
A Gulp Workflow for Amazon Lambda
'any nontrivial development of Lambda functions will require a simple, automated build/deploy process that also fills a couple of Lambda’s gaps such as the use of node modules and environment variables.'

See also : 'I am psyched about Amazon’s new Lambda service for asynchronous task processing, but the ideal development and testing cycle is really left to the engineer. While Amazon provides a web-based console, I prefer an approach that uses Mocha. Below you will find the gritty details using Kinesis events as a sample input.'
lambda  aws  services  testing  deployment  ops  mocha  gulp  javascript 
december 2015 by jm
Awesome new mock DynamoDB implementation:
An implementation of Amazon's DynamoDB, focussed on correctness and performance, and built on LevelDB (well, @rvagg's awesome LevelUP to be precise). This project aims to match the live DynamoDB instances as closely as possible (and is tested against them in various regions), including all limits and error messages.

Why not Amazon's DynamoDB Local? Because it's too buggy! And it differs too much from the live instances in a number of key areas.

We use DynamoDBLocal in our tests -- the availability of that tool is one of the key reasons we have adopted Dynamo so heavily, since we can safely test our code properly with it. This looks even better.
dynamodb  testing  unit-tests  integration-testing  tests  ops  dynalite  aws  leveldb 
november 2015 by jm
Fuzzing Raft for Fun and Publication
Good intro to fuzz-testing a distributed system; I've had great results using similar approaches in unit tests
fuzzing  fuzz-testing  testing  raft  akka  tests 
october 2015 by jm
Elasticsearch and data loss
"@alexbfree @ThijsFeryn [ElasticSearch is] fine as long as data loss is acceptable. . We lose ~1% of all writes on average."
elasticsearch  data-loss  reliability  data  search  aphyr  jepsen  testing  distributed-systems  ops 
october 2015 by jm
a proxy that mucks with your system and application context, operating at Layers 4 and 7, allowing you to simulate common failure scenarios from the perspective of an application under test; such as an API or a web application. If you are building a distributed system, Muxy can help you test your resilience and fault tolerance patterns.
proxy  distributed  testing  web  http  fault-tolerance  failure  injection  tcp  delay  resilience  error-handling 
september 2015 by jm
Chaos Engineering Upgraded
some details on Netflix's Chaos Monkey, Chaos Kong and other aspects of their availability/failover testing
architecture  aws  netflix  ops  chaos-monkey  chaos-kong  testing  availability  failover  ha 
september 2015 by jm
Henry Robinson on testing and fault discovery in distributed systems

'Let's talk about finding bugs in distributed systems for a bit.
These chaos monkey-style fault testing systems are all well and good, but by being application independent they're a very blunt instrument.
Particularly they make it hard to search the fault space for bugs in a directed manner, because they don't 'know' what the system is doing.
Application-aware scripting of faults in a dist. systems seems to be rarely used, but allows you to directly stress problem areas.
For example, if a bug manifests itself only when one RPC returns after some timeout, hard to narrow that down with iptables manipulation.
But allow a script to hook into RPC invocations (and other trace points, like DTrace's probes), and you can script very specific faults.
That way you can simulate cross-system integration failures, *and* write reproducible tests for the bugs they expose!
Anyhow, I've been doing this in Impala, and it's been very helpful. Haven't seen much evidence elsewhere.'
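A sketch of the application-aware, scriptable approach he describes (the names and API here are entirely hypothetical, not Impala's): mark RPC-ish call sites as named trace points, and let a test script install a specific fault at a specific point.

```python
import functools

FAULTS = {}  # trace-point name -> exception instance, or a callable (e.g. a delay)

def inject(point, fault):
    """Test-side API: script a fault at a named trace point."""
    FAULTS[point] = fault

def clear_faults():
    FAULTS.clear()

def trace_point(name):
    """Decorator marking an RPC-like call site where scripted faults can fire."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            fault = FAULTS.get(name)
            if isinstance(fault, Exception):
                raise fault          # simulate e.g. a timeout on this one RPC
            if callable(fault):
                fault()              # e.g. sleep to simulate a slow peer
            return fn(*args, **kwargs)
        return inner
    return wrap

@trace_point("fetch_block")
def fetch_block(block_id):           # stand-in for a real RPC
    return "data-%s" % block_id
```

Unlike iptables-style blunt instruments, a test can now reproduce "this one RPC times out" deterministically, and keep that as a regression test.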
henry-robinson  testing  fault-discovery  rpc  dtrace  tracing  distributed-systems  timeouts  chaos-monkey  impala 
september 2015 by jm
a tool which simplifies tracing and testing of Java programs. Byteman allows you to insert extra Java code into your application, either as it is loaded during JVM startup or even after it has already started running. The injected code is allowed to access any of your data and call any application methods, including where they are private. You can inject code almost anywhere you want and there is no need to prepare the original source code in advance nor do you have to recompile, repackage or redeploy your application. In fact you can remove injected code and reinstall different code while the application continues to execute. The simplest use of Byteman is to install code which traces what your application is doing. This can be used for monitoring or debugging live deployments as well as for instrumenting code under test so that you can be sure it has operated correctly. By injecting code at very specific locations you can avoid the overheads which often arise when you switch on debug or product trace. Also, you decide what to trace when you run your application rather than when you write it so you don't need 100% hindsight to be able to obtain the information you need.
tracing  java  byteman  injection  jvm  ops  debugging  testing 
september 2015 by jm
How VW tricked the EPA's emissions testing system
In July 2015, CARB did some follow up testing and again the cars failed—the scrubber technology was present, but off most of the time. How this happened is pretty neat. Michigan’s Stefanopolou says computer sensors monitored the steering column. Under normal driving conditions, the column oscillates as the driver negotiates turns. But during emissions testing, the wheels of the car move, but the steering wheel doesn’t. That seems to have been the signal for the “defeat device” to turn the catalytic scrubber up to full power, allowing the car to pass the test. Stefanopolou believes the emissions testing trick that VW used probably isn’t widespread in the automotive industry. Carmakers just don’t have many diesels on the road. And now that number may go down even more.

Depressing stuff -- but at least they think VW's fraud wasn't widespread.
fraud  volkswagen  vw  diesel  emissions  air-quality  epa  carb  catalytic-converters  testing 
september 2015 by jm
a specialized packet sniffer designed for displaying and logging HTTP traffic. It is not intended to perform analysis itself, but to capture, parse, and log the traffic for later analysis. It can be run in real-time displaying the traffic as it is parsed, or as a daemon process that logs to an output file. It is written to be as lightweight and flexible as possible, so that it can be easily adaptable to different applications.

via Eoin Brazil
via:eoinbrazil  httpry  http  networking  tools  ops  testing  tcpdump  tracing 
september 2015 by jm
Introducing the Software Testing Cupcake (Anti-Pattern)
good post on the risks of overweighting towards manual testing rather than low-level automated tests (via Tony Byrne)
qa  testing  via:tonyjbyrne  tests  antipatterns  dev 
september 2015 by jm
httpbin(1): HTTP Client Testing Service
Testing an HTTP Library can become difficult sometimes. RequestBin is fantastic for testing POST requests, but doesn't let you control the response. This exists to cover all kinds of HTTP scenarios. Additional endpoints are being considered.
http  httpbin  networking  testing  web  coding  hacks 
september 2015 by jm
Diffy: Testing services without writing tests
Play requests against 2 versions of a service. A fair bit more complex than simply replaying logged requests, which took 10 lines of a shell script last time I did it
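The core loop is simple to sketch (a toy stand-in for the concept, not Diffy's actual implementation): replay each sampled request against both versions and collect any divergent responses.

```python
def diff_responses(requests, old_version, new_version):
    """old_version/new_version: callables standing in for HTTP calls to the
    two deployments. Returns the requests whose responses diverge."""
    divergent = []
    for req in requests:
        old, new = old_version(req), new_version(req)
        if old != new:
            divergent.append((req, old, new))
    return divergent

# e.g. a regression in the candidate build, visible only for negative inputs:
old = lambda n: abs(n)
new = lambda n: n          # buggy new version
diffs = diff_responses([1, 2, -3], old, new)
```

The hard parts Diffy adds on top of this loop are filtering out expected noise (timestamps, random IDs) so that only real behavioural differences surface.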
http  testing  thrift  automation  twitter  diffy  diff  soa  tests 
september 2015 by jm
Call me Maybe: Chronos
Chronos (the Mesos distributed scheduler) comes out looking pretty crappy here
aphyr  mesos  chronos  cron  scheduling  outages  ops  jepsen  testing  partitions  cap 
august 2015 by jm
Late to this one -- a nice list of bad input (Unicode zero-width spaces, etc) for testing
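In the same spirit, a few classics worth feeding to any string-handling code (a small hand-picked sample of the genre, not the linked list itself):

```python
# each of these has tripped up real parsers, truncators and validators
BAD_INPUTS = [
    "",                            # empty string
    "\u200b",                      # zero-width space: looks empty, isn't
    "\x00",                        # embedded NUL
    "e\u0301",                     # 'e' + combining accent (two code points, one glyph)
    "\u05e9\u05dc\u05d5\u05dd",    # right-to-left text
    "\U0001f4a9",                  # astral-plane emoji (surrogate pair in UTF-16)
    "a" * 100000,                  # pathologically long input
]

def truncate(s, n=10):
    """Naive truncation -- exactly the kind of code these inputs break
    (it can silently split a combining sequence, for instance)."""
    return s[:n]

results = [truncate(s) for s in BAD_INPUTS]
```

Running every text-handling function over a list like this is a cheap smoke test for encoding and length-handling bugs.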
testing  strings  text  data  unicode  utf-8  tests  input  corrupt 
august 2015 by jm
Testing without mocking in Scala
mocks are the sound of your code crying out, "please structure me differently!"

scala  via:jessitron  mocks  mock-objects  testing  testability  coding 
july 2015 by jm
Benchmarking GitHub Enterprise - GitHub Engineering
Walkthrough of debugging connection timeouts in a load test. Nice graphs (using matplotlib)
github  listen-backlog  tcp  debugging  timeouts  load-testing  benchmarking  testing  ops  linux 
july 2015 by jm
Improving testing by using real traffic from production
Gor, a very nice-looking tool to log and replay HTTP traffic, specifically designed to "tee" live traffic from production to staging for pre-release testing
gor  performance  testing  http  tcp  packet-capture  tests  staging  tee 
june 2015 by jm
Testing@LMAX – Aliases
Creating a user with our DSL looks like: registrationAPI.createUser("user");

You might expect this to create a user with the username ‘user’, but then we’d get conflicts between every test that wanted to call their user ‘user’ which would prevent tests from running safely against the same deployment of the exchange.

Instead, ‘user’ is just an alias that is only meaningful while this one test is running. The DSL creates a unique username that it uses when talking to the actual system. Typically this is done by adding a postfix so the real username is still reasonably understandable e.g. user-fhoai42lfkf.

Nice approach -- makes sense.
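The alias trick is easy to sketch (hypothetical names; LMAX's DSL is Java-based, this is just the shape of it): a per-test registry lazily maps each friendly alias to a unique real username with a readable postfix.

```python
import uuid

class AliasRegistry:
    """Maps the friendly aliases a test uses ('user') to unique real
    usernames, so parallel tests never collide on a shared deployment."""
    def __init__(self):
        self._real = {}

    def resolve(self, alias):
        if alias not in self._real:
            # keep the real name readable, e.g. 'user-f3a9c2d1'
            self._real[alias] = "%s-%s" % (alias, uuid.uuid4().hex[:8])
        return self._real[alias]

# each test gets a fresh registry; within one test the alias is stable
aliases = AliasRegistry()
real = aliases.resolve("user")
```

Every DSL call like `registrationAPI.createUser("user")` resolves through the registry, so two tests both saying "user" talk to two different accounts.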
testing  lmax  system-tests  naming  coding 
june 2015 by jm
Performance Testing at LMAX
Good series of blog posts on the LMAX trading platform's performance testing strategy -- they capture live traffic off the wire, then build statistical models simulating its features. See also and .
performance  testing  tests  simulation  latency  lmax  trading  sniffing  packet-capture 
june 2015 by jm
Making End-to-End Tests Work
+1 to ALL of this. We are doing exactly the same in Swrve and it has radically improved our release quality
end-to-end  testing  acceptance-tests  tests  system-tests  lmax 
may 2015 by jm
Call me maybe: Aerospike
'Aerospike offers phenomenal latencies and throughput -- but in terms of data safety, its strongest guarantees are similar to Cassandra or Riak in Last-Write-Wins mode. It may be a safe store for immutable data, but updates to a record can be silently discarded in the event of network disruption. Because Aerospike’s timeouts are so aggressive -- on the order of milliseconds -- even small network hiccups are sufficient to trigger data loss. If you are an Aerospike user, you should not expect “immediate”, “read-committed”, or “ACID consistency”; their marketing material quietly assumes you have a magical network, and I assure you this is not the case. It’s certainly not true in cloud environments, and even well-managed physical datacenters can experience horrible network failures.'
aerospike  outages  cap  testing  jepsen  aphyr  databases  storage  reliability 
may 2015 by jm
Smarter testing Java code with Spock Framework
hmm, looks quite nice as a potential next-gen JUnit replacement for unit tests
java  testing  bdd  tests  junit  unit-tests  spock  via:trishagee 
may 2015 by jm
Call me maybe: Elasticsearch 1.5.0
tl;dr: Elasticsearch still hoses data integrity on partition, badly
elasticsearch  reliability  data  storage  safety  jepsen  testing  aphyr  partition  network-partitions  cap 
may 2015 by jm
'a command line tool that (hopefully) makes it easier to deploy, update, and test functions for AWS Lambda.' much needed IMO -- Lambda is too closed
aws  lambda  mitch-garnaat  coding  testing  cli  kappa 
april 2015 by jm
Etsy's Release Management process
Good info on how Etsy use their Deployinator tool, end-to-end.

Slide 11: git SHA is visible for each env, allowing easy verification of what code is deployed.

Slide 14: Code is deployed to "princess" staging env while CI tests are running; no need to wait for unit/CI tests to complete.

Slide 23: smoke tests of pre-prod "princess" (complete after 8 mins elapsed).

Slide 31: dashboard link for deployed code is posted during deploy; post-release prod smoke tests are run by Jenkins. (short ones! they complete in 42 seconds)
deployment  etsy  deploy  deployinator  princess  staging  ops  testing  devops  smoke-tests  production  jenkins 
april 2015 by jm
Combining static model checking with dynamic enforcement using the Statecall Policy Language
This looks quite nice -- a model-checker "for regular programmers". Example model for ping(1):

<pre>automaton ping (int max_count, int count, bool can_timeout) {
  Initialize;
  during {
    count = 0;
    do {
      Transmit_Ping;
      either {
        Receive_Ping;
      } or (can_timeout) {
        Timeout_Ping;
      };
      count = count + 1;
    } until (count >= max_count);
  } handle {
    Print_Summary;
  };
}</pre>
ping  model-checking  models  formal-methods  verification  static  dynamic  coding  debugging  testing  distcomp  papers 
march 2015 by jm
"tees" all TCP traffic from one server to another. "widely used by companies in China"!
testing  benchmarking  performance  tcp  ip  tcpcopy  tee  china  regression-testing  stress-testing  ops 
march 2015 by jm
Correcting YCSB's Coordinated Omission problem
excellent walkthrough of CO and how it affects YCSB, Yahoo!'s Cloud Serving Benchmark platform
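The essence of the correction, as a toy model (my own sketch, not YCSB's code): a constant-throughput tester must measure each request from its *intended* start time. Measuring from the actual (delayed) send time makes a stall vanish from the stats, because a blocked client simply stops issuing the requests that would have observed it.

```python
def latencies(service_times, interval, corrected=True):
    """service_times[i]: service time of request i; requests are scheduled
    every `interval` seconds over a single blocking connection."""
    out, clock = [], 0.0
    for i, svc in enumerate(service_times):
        intended = i * interval
        start = max(clock, intended)     # can't send while still blocked
        finish = start + svc
        # corrected: measure from the schedule; naive: from the late send
        out.append(finish - (intended if corrected else start))
        clock = finish
    return out

# ten requests scheduled 1ms apart; the first one stalls for 100ms
svc = [0.100] + [0.001] * 9
naive = latencies(svc, 0.001, corrected=False)      # sees one slow request
corrected = latencies(svc, 0.001, corrected=True)   # sees the whole stall
```

The naive numbers claim nine requests took 1ms; the corrected numbers show every queued request actually waited ~100ms, which is what a real open-loop client population would have experienced.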
coordinated-omission  co  yahoo  ycsb  benchmarks  performance  testing 
march 2015 by jm
HP is trying to patent Continuous Delivery
This is appalling bollocks from HP:
On 1st March 2015 I discovered that in 2012 HP had filed a patent (WO2014027990) with the USPO for ‘Performance tests in a continuous deployment pipeline‘ (the patent was granted in 2014). [....] HP has filed several patents covering standard Continuous Delivery (CD) practices. You can help to have these patents revoked by providing ‘prior art’ examples on Stack Exchange.

In fairness, though, this kind of shit happens in most big tech companies. This is what happens when you have a broken software patenting system, with big rewards for companies who obtain shitty troll patents like these, and in turn have companies who reward the engineers who sell themselves out to write up concepts which they know have prior art. Software patents are broken by design!
cd  devops  hp  continuous-deployment  testing  deployment  performance  patents  swpats  prior-art 
march 2015 by jm
Vaurien, the Chaos TCP Proxy — Vaurien 1.8 documentation
Vaurien is basically a Chaos Monkey for your TCP connections. Vaurien acts as a proxy between your application and any backend. You can use it in your functional tests or even on a real deployment through the command-line.

Vaurien is a TCP proxy that simply reads data sent to it and pass it to a backend, and vice-versa. It has built-in protocols: TCP, HTTP, Redis & Memcache. The TCP protocol is the default one and just sucks data on both sides and pass it along.

Having higher-level protocols is mandatory in some cases, when Vaurien needs to read a specific amount of data in the sockets, or when you need to be aware of the kind of response you’re waiting for, and so on.

Vaurien also has behaviors. A behavior is a class that’s going to be invoked everytime Vaurien proxies a request. That’s how you can impact the behavior of the proxy. For instance, adding a delay or degrading the response can be implemented in a behavior.

Both protocols and behaviors are plugins, allowing you to extend Vaurien by adding new ones.

Last (but not least), Vaurien provides a couple of APIs you can use to change the behavior of the proxy live. That’s handy when you are doing functional tests against your server: you can for instance start to add big delays and see how your web application reacts.
proxy  tcp  vaurien  chaos-monkey  testing  functional-testing  failures  sockets  redis  memcache  http 
february 2015 by jm
Nice looking static code validation tool for Java, from Google. I recognise a few of these errors ;)
google  static  code-validation  lint  testing  java  coding 
february 2015 by jm
Nice wrapper for 'tc' and 'netem', for network latency/packet loss emulation
networking  testing  linux  tc  netem  latency  packet-loss  iptables 
january 2015 by jm
Of Course 23andMe's Plan Has Been to Sell Your Genetic Data All Along
Today, 23andMe announced what Forbes reports is only the first of ten deals with big biotech companies: Genentech will pay up to $60 million for access to 23andMe's data to study Parkinson's. You think 23andMe was about selling fun DNA spit tests for $99 a pop? Nope, it's been about selling your data all along.

testing  ethics  dna  genentech  23andme  parkinsons  diseases  health  privacy 
january 2015 by jm
Working Effectively with Unit Tests
$14.99 ebook, recommended by Steve Vinoski, looks good
unit-testing  testing  ebooks  jay-fields  tests  steve-vinoski  coding 
december 2014 by jm
How Etsy Does Continuous Integration for Mobile Apps
Very impressive. I particularly like the use of Tester Dojos to get through a backlog of unwritten tests -- we had a similar problem recently...
dojos  testing  ci  cd  builds  etsy  mobile  ios  shenzen  trylib  jenkins  tester-dojos 
december 2014 by jm
Good advice on running large-scale database stress tests
I've been bitten by poor key distribution in tests in the past, so this is spot on: 'I'd run it with Zipfian, Pareto, and Dirac delta distributions, and I'd choose read-modify-write transactions.'

And of course, a dataset bigger than all combined RAM.

Also: -- the "Biebermark", where just a single row out of the entire db is contended on in a read/modify/write transaction: "the inspiration for this is maintaining counts for [highly contended] popular entities like Justin Bieber and One Direction."
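The Zipfian case is cheap to generate with the stdlib (a sketch; YCSB and friends ship their own generators): rank r gets weight 1/r^s, so a handful of "Bieber" keys absorb most of the traffic, and the Dirac delta is just the limiting case of a single fully-contended key.

```python
import random

def zipf_keys(n_keys, n_samples, s=1.2, seed=42):
    """Sample key indices with a Zipfian distribution: rank r gets weight
    1/r^s. A uniform key distribution hides hot-key contention; this
    deliberately exposes it."""
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_keys + 1)]
    return rng.choices(range(n_keys), weights=weights, k=n_samples)

keys = zipf_keys(1000, 100000)
hot = keys.count(0)   # hits on the most popular key dominate
```

Running the same workload with uniform, Zipfian and single-key distributions gives three very different contention profiles from the same dataset.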
biebermark  benchmarks  testing  performance  stress-tests  databases  storage  mongodb  innodb  foundationdb  aphyr  measurement  distributions  keys  zipfian 
december 2014 by jm
/dev/full - Wikipedia, the free encyclopedia
This is handy!

'In Linux, /dev/full or the always full device[1][2] is a special file that always returns the error code ENOSPC (meaning "No space left on device") on writing, and provides an infinite number of null characters to any process that reads from it (similar to /dev/zero). This device is usually used when testing the behaviour of a program when it encounters a "disk full" error.'
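The behaviour is easy to exercise from a few lines of code; a minimal probe (Python, assuming a Linux box where /dev/full exists) might look like:

```python
# /dev/full (a Linux special device) always fails writes with ENOSPC,
# so a program's "disk full" error handling can be exercised without
# actually filling a disk.
import errno

def write_fails_with_enospc(path="/dev/full"):
    try:
        with open(path, "w") as f:
            f.write("hello")
            f.flush()        # the flush forces the write() that fails
    except OSError as e:
        return e.errno == errno.ENOSPC
    return False

print(write_fails_with_enospc())   # -> True on Linux
```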
dev  /dev/full  filesystems  devices  linux  testing  enospc  error-handling 
november 2014 by jm
wrk2: 'a constant throughput, correct latency recording variant of wrk'. This is a must-have when measuring network service latency -- it corrects for the Coordinated Omission error:
wrk's model, which is similar to the model found in many current load generators, computes the latency for a given request as the time from the sending of the first byte of the request to the time the complete response was received. While this model correctly measures the actual completion time of individual requests, it exhibits a strong Coordinated Omission effect, through which most of the high latency artifacts exhibited by the measured server will be ignored. Since each connection will only begin to send a request after receiving a response, high latency responses result in the load generator coordinating with the server to avoid measurement during high latency periods.
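The effect is easy to see in a toy model (the numbers below are invented for illustration, not taken from wrk2):

```python
# Toy model of Coordinated Omission: the generator *intends* to send a
# request every 10 ms, but a closed-loop client waits for each response,
# so a 1-second server stall should count as ~100 bad samples, not one.
interval_ms = 10
stall_ms = 1000

# Naive closed-loop record: the stall shows up as a single sample.
naive = [1, 1, 1, stall_ms, 1, 1]

# Corrected record (the wrk2 approach): back-fill the latency of every
# request that should have been issued during the stall.
corrected = [1, 1, 1] + [stall_ms - i * interval_ms
                         for i in range(stall_ms // interval_ms)] + [1, 1]

print(max(naive), len(naive))          # stall is 1 sample out of 6
print(max(corrected), len(corrected))  # stall dominates the distribution
```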
wrk  latency  measurement  tools  cli  http  load-testing  testing  load-generation  coordinated-omission  gil-tene 
november 2014 by jm
testing latency measurements using CTRL-Z
An excellent tip from Gil "HDRHistogram" Tene:
Good example of why I always "calibrate" latency tools with ^Z tests. If ^Z results don't make sense, don't use [the] tool. ^Z test math examples: If you ^Z for half the time, Max is obvious. [90th percentile] should be 80% of the ^Z stall time.
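The 80% figure falls out of a quick simulation; a sketch (constant-rate load, half the run spent suspended, latencies back-filled for the stall) is:

```python
# Sketch of the ^Z calibration math: suspend a constant-rate load for
# half the run.  With coordinated-omission correction, requests that
# *should* have been sent during the stall see latencies spread
# uniformly from ~0 up to the stall length S, while the rest see ~0.
# The 90th percentile over all samples then lands at 0.8 * S.
stall_s = 10.0    # hypothetical stall length in seconds
rate = 1000       # requests/sec the generator should maintain

normal = [0.0] * int(stall_s * rate)   # unstalled half: ~0 latency
stalled = [stall_s - i / rate for i in range(int(stall_s * rate))]
samples = sorted(normal + stalled)

p90 = samples[int(0.9 * len(samples))]
print(round(p90 / stall_s, 2))   # -> 0.8, i.e. 80% of the stall time
```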
control-z  suspend  unix  testing  latencies  latency  measurement  percentiles  tips 
november 2014 by jm
Game Day Exercises at Stripe: Learning from `kill -9`
We’ve started running game day exercises at Stripe. During a recent game day, we tested failing over a Redis cluster by running kill -9 on its primary node, and ended up losing all data in the cluster. We were very surprised by this, but grateful to have found the problem in testing. This result and others from this exercise convinced us that game days like these are quite valuable, and we would highly recommend them for others.

Excellent post. Game days are a great idea. Also: massive Redis clustering fail
game-days  redis  testing  stripe  outages  ops  kill-9  failover 
october 2014 by jm
Load testing Apache Kafka on AWS
This is a very solid benchmarking post, examining Kafka in good detail. Nicely done. Bottom line:
I basically spend 2/3 of my work time torture testing and operationalizing distributed systems in production. There's some that I'm not so pleased with (posts pending in draft forever) and some that have attributes that I really love. Kafka is one of those systems that I pretty much enjoy every bit of, and the fact that it performs predictably well is only a symptom of the reason and not the reason itself: the authors really know what they're doing. Nothing about this software is an accident. Performance, everything in this post, is only a fraction of what's important to me and what matters when you run these systems for real. Kafka represents everything I think good distributed systems are about: that thorough and explicit design decisions win.
testing  aws  kafka  ec2  load-testing  benchmarks  performance 
october 2014 by jm
Netflix release new code to production before completing tests
Interesting -- I hadn't heard of this being an official practice anywhere before (although we actually did it ourselves this week)...
If a build has made it [past the 'integration test' phase], it is ready to be deployed to one or more internal environments for user-acceptance testing. Users could be UI developers implementing a new feature using the API, UI Testers performing end-to-end testing or automated UI regression tests. As far as possible, we strive to not have user-acceptance tests be a gating factor for our deployments. We do this by wrapping functionality in Feature Flags so that it is turned off in Production while testing is happening in other environments. 
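The feature-flag wrapping described above can be sketched in a few lines (the flag names and the in-memory flag store are hypothetical, purely for illustration):

```python
# Minimal sketch of the feature-flag pattern: new code ships to
# production dark, and is only exercised in environments where the
# flag is switched on.
FLAGS = {"new-recs-api": {"prod": False, "test": True}}  # hypothetical store

def is_enabled(flag, env):
    return FLAGS.get(flag, {}).get(env, False)

def get_recommendations(user, env="prod"):
    if is_enabled("new-recs-api", env):
        return "new implementation"   # under test in non-prod envs
    return "old implementation"       # still what production users see

print(get_recommendations("u1", env="prod"))   # -> old implementation
print(get_recommendations("u1", env="test"))   # -> new implementation
```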
devops  deployment  feature-flags  release  testing  integration-tests  uat  qa  production  ops  gating  netflix 
october 2014 by jm
On-Demand Jenkins Slaves With Amazon EC2
This is very likely where we'll be going for our acceptance tests in Swrve
testing  jenkins  ec2  spot-instances  scalability  auto-scaling  ops  build 
august 2014 by jm
Microservices - Not a free lunch! - High Scalability
Some good reasons not to adopt microservices blindly. Testability and distributed-systems complexity are my biggest fears
microservices  soa  devops  architecture  testing  distcomp 
august 2014 by jm
AWS Speed Test: What are the Fastest EC2 and S3 Regions?
My god, this test is awful -- this is how NOT to test networked infrastructure. (1) testing from a single EC2 instance in each region; (2) uploading to a single test bucket for each test; (3) results don't include min/max or percentiles, just an averaged measurement for each test. FAIL
fail  testing  networking  performance  ec2  aws  s3  internet 
august 2014 by jm
REST Commander: Scalable Web Server Management and Monitoring
We dynamically monitor and manage a large and rapidly growing number of web servers deployed on our infrastructure and systems. However, existing tools present major challenges when making REST/SOAP calls with server-specific requests to a large number of web servers, and then performing aggregated analysis on the responses. We therefore developed REST Commander, a parallel asynchronous HTTP client as a service to monitor and manage web servers. REST Commander on a single server can send requests to thousands of servers with response aggregation in a matter of seconds. And yes, it is open-sourced at

Feature highlights:

- Click-to-run with zero installation;
- Generic HTTP request template supporting variable-based replacement for sending server-specific requests;
- Ability to send the same request to different servers, different requests to different servers, and different requests to the same server;
- Maximum concurrency control (throttling) to accommodate server capacity;
- Commander itself is also "as a service": with its powerful REST API, you can define ad-hoc target servers, an HTTP request template, variable replacement, and a regular expression all in a single call. In addition, intuitive step-by-step wizards help you achieve the same functionality through a GUI.
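The core fan-out-and-aggregate idea can be sketched with nothing but the standard library (this is an illustration of the pattern, not REST Commander's actual API -- the server names and probe are made up):

```python
# Sketch of parallel fan-out with response aggregation: send a probe to
# many servers concurrently, throttled by the pool size, and collect
# the results into one structure.
from concurrent.futures import ThreadPoolExecutor

def probe(server):
    # Stand-in for an HTTP GET to the server's status endpoint; a real
    # client would perform the request here and return its response.
    return server, "ok"

servers = ["web-%03d" % i for i in range(1, 6)]

with ThreadPoolExecutor(max_workers=50) as pool:   # throttle = pool size
    results = dict(pool.map(probe, servers))

print(len(results), all(v == "ok" for v in results.values()))
```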
rest  http  clients  load-testing  ebay  soap  async  testing  monitoring 
july 2014 by jm
ThreadSanitizer: Google's Purify/Valgrind-like concurrency checking tool:

'As a bonus, ThreadSanitizer finds some other types of bugs: thread leaks, deadlocks, incorrect uses of mutexes, malloc calls in signal handlers, and more. It also natively understands atomic operations and thus can find bugs in lock-free algorithms. [...] The tool is supported by both Clang and GCC compilers (only on Linux/Intel64). Using it is very simple: you just need to add a -fsanitize=thread flag during compilation and linking. For Go programs, you simply need to add a -race flag to the go tool (supported on Linux, Mac and Windows).'
concurrency  bugs  valgrind  threadsanitizer  threading  deadlocks  mutexes  locking  synchronization  coding  testing 
june 2014 by jm
Smart Integration Testing with Dropwizard, Flyway and Retrofit
Retrofit in particular looks neat. Mind you, having worked with in-memory SQL databases for integration testing before, I'd never do that again -- too many interop glitches compared to "real world" MySQL/Postgres
testing  integration-testing  retrofit  flyway  dropwizard  logentries 
june 2014 by jm

