jm + python   58

Fast Forward Labs: Probabilistic Data Structure Showdown: Cuckoo Filters vs. Bloom Filters
Nice comparison of a counting Bloom filter and a Cuckoo Filter, implemented in Python:
This post provides an update by exploring Cuckoo filters, a new probabilistic data structure that improves upon the standard Bloom filter. The Cuckoo filter provides a few advantages: 1) it enables dynamic deletion and addition of items 2) it can be easily implemented compared to Bloom filter variants with similar capabilities, and 3) for similar space constraints, the Cuckoo filter provides lower false positives, particularly at lower capacities. We provide a python implementation of the Cuckoo filter here, and compare it to a counting Bloom filter (a Bloom filter variant).
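To make the deletion point concrete, here is a minimal sketch of a Cuckoo filter. It is not the Fast Forward Labs implementation; the class name, the 1-byte fingerprints, the SHA-256 hashing and the default sizes are all assumptions chosen for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Toy cuckoo filter: each item has a short fingerprint and two candidate buckets."""

    def __init__(self, num_buckets=1024, bucket_size=4, max_kicks=500):
        # A power-of-two bucket count keeps the XOR trick in _alt_index
        # in range and makes it its own inverse.
        assert num_buckets & (num_buckets - 1) == 0
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        # 1-byte fingerprint in the range 1..255 (assumed size, for illustration)
        return self._hash(b"fp:" + item.encode("utf-8")) % 255 + 1

    def _index(self, item):
        return self._hash(item.encode("utf-8")) % self.num_buckets

    def _alt_index(self, index, fp):
        # Partial-key cuckoo hashing: the other bucket depends only on the
        # current bucket and the fingerprint, never on the original item.
        return (index ^ self._hash(bytes([fp]))) % self.num_buckets

    def insert(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a random resident and relocate it.
        i = random.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = random.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # treated as full; the last evicted fingerprint is dropped

    def contains(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, item):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False
```

Deletion works because only fingerprints are stored, so removing one matching fingerprint from either candidate bucket undoes one insert. A plain Bloom filter cannot do this without per-slot counters.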
algorithms  probabilistic  approximation  bloom-filters  cuckoo-filters  sets  estimation  python 
november 2016 by jm
3 Reasons AWS Lambda Is Not Ready for Prime Time
This totally matches my own preconceptions ;)
When we at Datawire tried to actually use Lambda for a real-world HTTP-based microservice [...], we found some uncool things that make Lambda not yet ready for the world we live in:

Lambda is a building block, not a tool;
Lambda is not well documented;
Lambda is terrible at error handling

Lung skips these uncool things, which makes sense because they’d make the tutorial collapse under its own weight, but you can’t skip them if you want to work in the real world. (Note that if you’re using Lambda for event handling within the AWS world, your life will be easier. But the really interesting case in the microservice world is Lambda and HTTP.)
aws  lambda  microservices  datawire  http  api-gateway  apis  https  python  ops 
may 2016 by jm
Online chart maker for CSV and Excel data; make charts and dashboards online. One really nice feature is that charts made this way get permalinks, and can be easily inlined as PNGs or HTML5 divs. (See for an example.)
data  javascript  python  tools  visualization  dataviz  charts  graphing  web  plotly  plots  graphs 
january 2016 by jm
Baker Street
client-side 'service discovery and routing system for microservices' -- another Smartstack, then
python  router  smartstack  baker-street  microservices  service-discovery  routing  load-balancing  http 
october 2015 by jm
Levenshtein automata can be simple and fast
Nice algorithm for fuzzy text search with a limited Levenshtein edit distance using a DFA
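A rough sketch of the idea, assuming the simplest formulation: the automaton's state is one row of the edit-distance table, capped at max_edits + 1 so that only finitely many states exist. This version steps the states on the fly rather than precompiling a DFA, and the names (LevenshteinAutomaton, fuzzy_match) are illustrative.

```python
class LevenshteinAutomaton:
    """Accepts strings within max_edits edits of word."""

    def __init__(self, word, max_edits):
        self.word = word
        self.max_edits = max_edits

    def start(self):
        # state[i] = edit distance between the (empty) input so far and word[:i]
        return list(range(len(self.word) + 1))

    def step(self, state, char):
        # Standard dynamic-programming row update for one more input character.
        new_state = [state[0] + 1]
        for i in range(len(state) - 1):
            cost = 0 if self.word[i] == char else 1
            new_state.append(
                min(new_state[i] + 1, state[i] + cost, state[i + 1] + 1)
            )
        # Capping keeps the set of reachable states finite.
        return [min(d, self.max_edits + 1) for d in new_state]

    def is_match(self, state):
        return state[-1] <= self.max_edits

    def can_match(self, state):
        # If every entry exceeds the limit, no continuation can ever match.
        return min(state) <= self.max_edits


def fuzzy_match(word, candidate, max_edits):
    automaton = LevenshteinAutomaton(word, max_edits)
    state = automaton.start()
    for char in candidate:
        state = automaton.step(state, char)
        if not automaton.can_match(state):
            return False  # early exit: this prefix is already too far away
    return automaton.is_match(state)
```

The can_match early exit is what makes this useful for search: when walking a trie or sorted index, whole subtrees can be pruned as soon as a prefix is hopeless.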
dfa  algorithms  levenshtein  text  edit-distance  fuzzy-search  search  python 
june 2015 by jm
Airbnb's workflow management system; works off a DAG defined in Python code (ugh). Nice UI though, but I think Pinboard's take is neater
airbnb  open-source  python  workflow  jobs  cron  scheduling  batch 
june 2015 by jm
Redditor runs the secret Python code in Ex Machina
and finds:
when you run with python2.7 you get the following:
ISBN = 9780199226559
Which is 'Embodiment and the Inner Life: Cognition and Consciousness in the Space of Possible Minds', and so now I have a lot more respect for the Director.
python  movies  ex-machina  cool  books  easter-eggs 
may 2015 by jm
'CredStash is a very simple, easy to use credential management and distribution system that uses AWS Key Management Service (KMS) for key wrapping and master-key storage, and DynamoDB for credential storage and sharing.'
aws  credstash  python  security  keys  key-management  secrets  kms 
april 2015 by jm
Pinterest's Hadoop workflow manager; 'scalable, reliable, simple, extensible' apparently. Hopefully it allows upgrades of a workflow component without breaking an existing run in progress, like LinkedIn's Azkaban does :(
python  pinterest  hadoop  workflows  ops  pinball  big-data  scheduling 
april 2015 by jm
Avro, mail # dev - bytes and fixed handling in Python implementation - 2014-09-04, 22:54
More Avro trouble with "bytes" fields! Avoid using "bytes" fields in Avro if you plan to interoperate with either of the Python implementations; they both fail to marshal them into JSON format correctly. This is the official "avro" library, which produces UTF-8 errors when a non-UTF-8 byte is encountered
bytes  avro  marshalling  fail  bugs  python  json  utf-8 
march 2015 by jm
tebeka / fastavro / issues / #11 - fastavro breaks dumping binary fixed [4] — Bitbucket
The Python "fastavro" library cannot correctly render "bytes" fields. This is a bug, and the maintainer is acting in a really crappy manner in this thread. Avoid this library
fastavro  fail  bugs  utf-8  bytes  encoding  asshats  open-source  python 
march 2015 by jm
Proving that Android’s, Java’s and Python’s sorting algorithm is broken (and showing how to fix it)
Wow, this is excellent work. A formal verification of Tim Peters' TimSort failed, resulting in a bugfix:
While attempting to verify TimSort, we failed to establish its instance invariant. Analysing the reason, we discovered a bug in TimSort’s implementation leading to an ArrayIndexOutOfBoundsException for certain inputs. We suggested a proper fix for the culprit method (without losing measurable performance) and we have formally proven that the fix actually is correct and that this bug no longer persists.
timsort  algorithms  android  java  python  sorting  formal-methods  proofs  openjdk 
february 2015 by jm
A much better carbon-relay, written in C rather than Python. Linking as we've been using it in production for quite a while with no problems.
The main reason to build a replacement is performance and configurability. Carbon is single threaded, and sending metrics to multiple consistent-hash clusters requires chaining of relays. This project provides a multithreaded relay which can address multiple targets and clusters for each and every metric based on pattern matches.
graphite  carbon  c  python  ops  metrics 
january 2015 by jm
Java for Everything
Actually, I'm really agreeing with a lot of this. Particularly this part:
Programmers will cringe at writing some kind of command dispatch list:

if command == "up":
    up()
elif command == "status":
    status()
elif command == "revert":
    revert()

so they’ll go off and write some introspecting auto-dispatch cleverness, but that takes longer to write and will surely confuse future readers who’ll wonder how the heck revert() ever gets called. Yet the programmer will incorrectly feel as though he saved himself time. This is the trap of the dynamic language. It feels like you’re being more productive, but aside from the first 10 minutes of a new program, you’re not. Just write the stupid dispatch manually and get on with the real work.

I've also gone right off dynamic languages for any kind of non-toy work.

Mind you he needs to get around to ditching Vim for a proper IDE. That's the key thing that makes coding in a statically-typed language really pleasant -- when graphical refactoring becomes easy and usable, and errors are visible as you type them...
java  coding  static-typing  python  unit-tests 
november 2014 by jm
Carbon vs Megacarbon and Roadmap ? · Issue #235 · graphite-project/carbon
Carbon is a great idea, but fundamentally, twisted doesn't do what carbon-relay or carbon-aggregator were built to do when hit with sustained and heavy throughput. Much to my chagrin, concurrency isn't one of python's core competencies.

+1, sadly. We are patching around the edges with half-released third-party C rewrites in our graphite setup, as we exceed the scale Carbon can support.
carbon  graphite  metrics  ops  python  twisted  scalability 
october 2014 by jm
Transform any text into a patent application
'An apparatus and device for staring into vacancy. The devices comprises a good cage, a narrow gangway, an electric pocket, a flower-bedecked cage, an insensitive felt.' (The Hunger Artist by Kafka)
python  patents  text  language  generator 
may 2014 by jm
Mock Boto: 'a library that allows your python tests to easily mock out the boto library.' Supports S3, Autoscaling, EC2, DynamoDB, ELB, Route53, SES, SQS, and STS currently, and even supports a standalone server mode, to act as a mock service for non-Python clients. Excellent!

(via Conor McDermottroe)
python  aws  testing  mocks  mocking  system-tests  unit-tests  coding  ec2  s3 
may 2014 by jm
Why Disqus made the Python->Go switchover
for their realtime component, from the horse's mouth:
at higher contention, the CPU was choking everything. Switching over to Go removed that contention for us, which was the primary issue that we were seeing.
python  languages  concurrency  go  threading  gevent  scalability  disqus  realtime  hn 
may 2014 by jm
'better dates and times for Python', to fix the absurd proliferation of slightly-incompatible Python date/time types and APIs. unfortunately, applies....
python  libraries  time  dates  timestamps  timezones  apis  proliferation  iso-8601 
may 2014 by jm
Scaling Realtime at DISQUS
Disqus' realtime architecture -- nginx PushStream module doing the heavy lifting, basically. See for the production nginx configs they use. I am very impressed that push-stream has grown to be so solid; it's a great way to deal with push from the sounds of it. now notes that some of the realtime backends are in Go. ("C1M and Nginx") is a more up to date presentation. It notes that PushStream supports "EventSource, WebSocket, Long Polling, and forever iframe". More sysctls and nginx tuning in that prez.
sysctl  nginx  tuning  go  disqus  realtime  push  eventsource  websockets  long-polling  iframe  python 
april 2014 by jm
vim-flake8 is a Vim plugin that runs the currently open file through Flake8, a static syntax and style checker for Python source code. It supersedes both vim-pyflakes and vim-pep8. Flake8 is a wrapper around PyFlakes (static syntax checker), PEP8 (style checker) and Ned's McCabe script (complexity checker).

Recommended by several pythonistas of my acquaintance!
vim  python  syntax  error-checking  errors  flake8  editors  ides  coding 
april 2014 by jm
'a command line tool for Amazon's Simple Storage Service (S3). Written in Python, easy_install the package to install as an egg. Supports multithreaded operations for large volumes. Put, get, or delete many items concurrently, using a fixed-size pool of threads. Built on workerpool for multithreading and boto for access to the Amazon S3 API. Unix-friendly input and output. Pipe things in, out, and all around.'

MIT-licensed open source. (via Paul Dolan)
via:pdolan  s3  s3funnel  tools  ops  aws  python  mit  open-source 
april 2014 by jm
Another cool library from Ray Holder: 'an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.'

Similar to his Guava-Retrier Java lib, but using a decorator.
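For a feel of the decorator approach, here is a stdlib-only sketch. This is not the retrying library's API; the retry name and its max_attempts, base_delay and exceptions parameters are made up for this example.

```python
import functools
import time


def retry(max_attempts=3, base_delay=0.1, exceptions=(Exception,)):
    """Retry the wrapped function with exponential backoff (hypothetical helper)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    # Back off: base_delay, then 2x, 4x, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))

        return wrapper

    return decorator
```

Usage would look like putting `@retry(max_attempts=5, exceptions=(IOError,))` above a flaky network call; the call site stays unchanged, which is the appeal of the decorator form.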
retrying  python  libraries  tools  backoff  retry  error-handling 
april 2014 by jm
Dr. Bunsen / Time Warp
I use it to modify Time Machine’s backup behavior using weighted reservoir sampling. I built Time Warp to preserve important backup snapshots and prevent Time Machine from deleting them.

via Aman. Nifty!
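Weighted reservoir sampling itself fits in a few lines. The sketch below uses the Efraimidis-Spirakis "A-Res" scheme, which is one common way to do it and not necessarily what Time Warp does; the function name and the (item, weight) input shape are assumptions.

```python
import heapq
import random


def weighted_reservoir_sample(items, k, rng=random):
    """Pick up to k items from an iterable of (item, weight) pairs in one pass.

    Each item gets the key u ** (1 / weight) with u uniform in [0, 1);
    the k items with the largest keys form the sample.
    """
    heap = []  # min-heap of (key, index, item); the index breaks ties
    for index, (item, weight) in enumerate(items):
        if weight <= 0:
            continue  # zero-weight items can never be selected
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)
    return [item for _, _, item in heap]
```

For backups, the weight would be some measure of how important a snapshot is, so heavily weighted snapshots are much more likely to survive pruning.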
backup  python  time-machine  decay  exponential-decay  weighting  algorithms  snapshots  ops 
january 2014 by jm
Peter Norvig writes a program to play regex golf with arbitrary lists
In response to XKCD 1313. This is excellent. It's reminiscent of my SpamAssassin SOUGHT-ruleset regexp-discovery algorithm, described in , albeit without the BLAST step intended to maximise pattern length and minimise false positives
python  regex  xkcd  blast  rule-discovery  spamassassin  rules  regexps  regular-expressions  algorithms  peter-norvig 
january 2014 by jm
Storm at - London Storm Meetup 2013-06-18
Not just a Storm success story. Interesting slides indicating where a startup *stopped* using Storm as realtime wasn't useful to their customers
storm  realtime  hadoop  cascading  python  cep  anti-spam  events  architecture  distcomp  low-latency  slides  rabbitmq 
october 2013 by jm
DevOps Eye for the Coding Guy: Metrics
a pretty good description of the process of adding service metrics to a Django webapp using graphite and statsd. Bookmarking mainly for the great real-time graphing hack at the end...
statsd  django  monitoring  metrics  python  graphite 
september 2013 by jm
The algorithm for a perfectly balanced photo gallery – Summit Stories from Crispy Mountain
Nice application of a partitioning exhaustive search algorithm using dynamic programming (via Tom)
algorithms  javascript  python  dynamic-programming  partitioning  images  gallery 
august 2013 by jm
Python Infrastructure Status - SSL Verification Errors on PyPI
There appears to be a problem affecting a number of users where SSL verification errors will be shown saying "" does not match "". As best we can tell this appears to be related to the ISP. It seems to be affecting folks using O2 or O2 related companies. We've also had reports of it affecting people using Free.

Cause appears to be one of the IP addresses returned in the Geo DNS for Europe returning a certificate for It's not clear at this time *why* that IP address is returning a certificate for

Turned out to be a routing loop in the London POP (via Mick Twomey)
via:micktwomey  o2  censorship  filtering  internet  ssl  tls  pypi  python  geodns  pki 
july 2013 by jm
Abusing hash kernels for wildly unprincipled machine learning
what, is this the first time our spam filtering approach of hashing a giant feature space is hitting mainstream machine learning? that can't be right!
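For reference, the hashing trick in miniature. This is a sketch of the general technique, not code from the linked post or from SpamAssassin; the function name, the MD5 hash and the bucket count are arbitrary choices.

```python
import hashlib


def hash_features(tokens, num_buckets=2 ** 18):
    """Map an arbitrary token stream into a fixed-size sparse feature vector."""
    vector = {}
    for token in tokens:
        digest = hashlib.md5(token.encode("utf-8")).digest()
        # The first 8 bytes choose the bucket...
        index = int.from_bytes(digest[:8], "big") % num_buckets
        # ...and one further bit chooses a sign, so colliding tokens
        # tend to cancel out rather than pile up.
        sign = 1 if digest[8] & 1 else -1
        vector[index] = vector.get(index, 0) + sign
    return vector
```

No vocabulary is ever stored: any token, seen before or not, lands in one of num_buckets slots, which is what lets a giant feature space fit in fixed memory.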
ai  machine-learning  python  data  hashing  features  feature-selection  anti-spam  spamassassin 
april 2013 by jm
CRDT toolbox
'The CRDT toolbox provides a collection of basic Conflict-free replicated data types as well as a common interface for defining your own CRDTs'. - in Eric Moritz' github. Also includes some more links to CRDT background reading.
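As a taste of what a basic CRDT looks like, here is a grow-only counter (G-Counter). It is a generic textbook sketch, not the toolbox's interface; the class and method names are assumptions.

```python
class GCounter:
    """Grow-only counter: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, amount=1):
        # A replica only ever bumps its own slot.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    @property
    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Max per slot is commutative, associative and idempotent, so
        # replicas converge regardless of merge order or repeated merges.
        merged = GCounter(self.replica_id)
        for rid in set(self.counts) | set(other.counts):
            merged.counts[rid] = max(self.counts.get(rid, 0), other.counts.get(rid, 0))
        return merged
```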
crdt  github  eric-moritz  python  algorithms 
april 2013 by jm
Reddit’s ranking algorithms
so Reddit uses the Wilson score confidence interval approach, it turns out; more details here (via Toby diPasquale)
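The lower bound of the Wilson score interval is short enough to sketch. The function name and the default z = 1.96 (roughly 95% confidence) are assumptions for illustration, not necessarily the constants Reddit uses.

```python
import math


def wilson_lower_bound(ups, downs, z=1.96):
    """Lower bound of the Wilson score interval for the upvote proportion."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / (1 + z * z / n)
```

Sorting by this value instead of the raw ratio means an item with 1 up and 0 down ranks well below one with 100 up and 10 down, because the small sample gives a much wider interval.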
ranking  rating  algorithms  popularity  python  wilson-score-interval  sorting  statistics  confidence-sort 
january 2013 by jm
Requests: HTTP for Humans
'an elegant and simple HTTP library for Python, built for human beings.' 'Requests is an Apache2 Licensed HTTP library, written in Python, for human beings. Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks. Requests takes all of the work out of Python HTTP/1.1 — making your integration with web services seamless. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is embedded within Requests.'
python  http  urllib  libraries  requests  via:mikeste 
january 2013 by jm
'a HTTP client mock library for Python, 100% inspired on ruby's FakeWeb.' 'HTTPretty monkey patches Python's socket core module, reimplementing the HTTP protocol by mocking requests and responses.'
mocking  testing  http  python  ruby  unit-tests  tests  monkey-patching 
january 2013 by jm
Expensive lessons in Python performance tuning
some good advice for large-scale Python performance: prun and guppy for profiling, namedtuples for memory efficiency, and picloud for trivial EC2-based scale-out. (via Nelson)
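A quick way to see the namedtuple point, as a sketch: the Point record and its fields are invented for this example, and sys.getsizeof only measures the container itself, not the values it refers to.

```python
import sys
from collections import namedtuple

# Hypothetical record type, stored two ways.
Point = namedtuple("Point", ["x", "y", "z"])

as_tuple = Point(x=1.0, y=2.0, z=3.0)
as_dict = {"x": 1.0, "y": 2.0, "z": 3.0}

# Shallow sizes in bytes; exact numbers vary by Python version and platform.
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)
print(tuple_size, dict_size)
```

The namedtuple keeps attribute-style access (as_tuple.x) while avoiding a per-instance hash table, which adds up when millions of records are held in memory.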
picloud  prun  guppy  namedtuples  python  optimization  performance  tuning  profiling 
july 2012 by jm
'SSH-Based Configuration Management & Deployment'. deploy via SSH; no target-side daemons required. GPLv3 licensed, unfortunately :(
ansible  devops  configuration  deployment  sysadmin  python  ssh 
july 2012 by jm
Metricfire - Powerful Application Metrics Made Easy
Irish "metrics as a service" company, Python-native; they've just gone GA and announced their pricing plans
python  metrics  service-metrics 
april 2012 by jm
Python Idioms and Efficiency Suggestions
will have to run this by our resident Pythonistas in work as a good set of guidelines
idioms  programming  python  reference  tips  via:hn 
june 2011 by jm
pyflakes.vim - on-the-fly Python code checking in Vim
Vim gets a good IDE feature. 'highlights common Python errors like misspelling a variable name on the fly. It also warns about unused imports, redefined functions, etc.'
ide  vim  python  programming  via:preddit  coding 
april 2011 by jm
Silver Lining
'an application packaging format, a server configuration library, a cloud server management tool, a persistence management tool, and a tool to manage the application with respect to all these services over time.'  interesting, possibly too Pythonic
python  programming  dist  deployment  packaging  from delicious
april 2011 by jm
Quora’s Technology Examined
Python, Nginx, Tornado for COMET stuff, MySQL as a data store, memcached, Thrift, haproxy, AWS, Pylons.  fantastic, very detailed post (via Nelson)
quora  python  nginx  tornado  comet  mysql  memcached  thrift  haproxy  aws  pylons  via:nelson  from delicious
february 2011 by jm
A high-performance compressor optimized for binary data -- 'designed to transmit data to the processor cache faster than a traditional, non-compressed, direct memory fetch via memcpy()' (via Bill de hOra)
via:dehora  compression  memcpy  caching  l1  software  memory  optimization  performance  python  pytables  from delicious
october 2010 by jm
Mongrel2 Says, "Goodbye Python"
Linux distros ship ancient Python interpreters, hence it's impossible to rely on recent language features because they won't be there, making it useless to write code in Python. We have similar problems in perl-land, but it's easy enough to get by without the latest-and-greatest; maybe Python is different in that regard? ... or is it Zed?
zed-shaw  python  mongrel  distros  linux  sysadmin  packaging  from delicious
september 2010 by jm
torrent automation from RSS feeds; will work nicely with Transmission
bittorrent  automation  boxee  linux  python  rss  torrents  tv  flexget  from delicious
july 2010 by jm
A fast, fuzzy, full-text index using Redis
quite easy, using a Metaphone sound-like indexing scheme to provide the fuzz
metaphone  sounds-like  indexing  python  redis  search  full-text  fuzzy  from delicious
may 2010 by jm
Hudson at PyCon | the official hudson weblog
"Yeah, we used Buildbot until recently, then I switched us to Hudson and my life got a lot better" -- heh ;)
hudson  buildbot  python  ci  junit  from delicious
march 2010 by jm
Unit Testing Achievements
XBox style achievements for Python's 'nose' unit testing framework, eg. 'Major Letdown: all tests in a suite of at least 100 pass except the last.' genius!
via:simonw  funny  testing  unit-tests  python  xbox  gaming  achievements  nose  from delicious
march 2010 by jm
Mindblowing Python GIL
'presentation about how the Python GIL actually works and why it's even worse than most people even imagine.' A good chunk btw could be rephrased as 'pthreads is worse than most people even imagine'. pretty awful data, though
python  gil  locking  synchronization  ouch  performance  tuning  coding  interpreters  threads  pthreads  from delicious
february 2010 by jm
How do we kick our synchronous addiction?
great post on the hazards of programming in an async framework, and how damn hard it is. good comments thread too (via jzawodny)
via:jzawodny  coding  python  javascript  scalability  ruby  concurrency  erlang  async  node.js  twisted  from delicious
february 2010 by jm
A new way to deploy web applications
interesting Django/Pythonic approach, based on concepts from AppEngine
django  python  virtualenv  deployment  web-apps  linux  appengine  from delicious
january 2010 by jm
Google employees now discouraged from using Python for new projects
'You have to balance Python's strengths with its weaknesses: your engineers may be more productive using Python, but if they have to work around more platform-level performance/scaling limitations as volume increases, do you come out ahead? etc.'
google  performance  scalability  python  unladen-swallow  languages  via:preddit  from delicious
november 2009 by jm
sregex - Structural Regular Expressions
'The sregex module implements Structural Regular Expressions.' Python, Apache-licensed
sregex  python  via:adulau  regexp  robpike  regex  library  text  structural  parsing  from delicious
november 2009 by jm
Why I like Redis
Simon Willison plugs Redis as a good datastore for quick-hack scripts with requirements for lots of fast, local data storage -- the kind of thing I'd often use a DB_File for
python  storage  databases  schemaless  nosql  redis  simon-willison  data-store  from delicious
october 2009 by jm
The technology behind Tornado, FriendFeed's web server
more on the new async HTTP server from FriendFeed/Facebook, in Python. looks lovely
async  http  epoll  python  comet  long-poll  facebook  scaling  scalability  web  friendfeed  tornado  opensource  from delicious
september 2009 by jm
Tornado Web Server
'an open source version of the scalable, non-blocking web server and tools that power FriendFeed. The FriendFeed application is written using a web framework that looks a bit like or Google's webapp, but with additional tools and optimizations to take advantage of the underlying non-blocking (epoll) infrastructure.'
epoll  open-source  python  http  scalability  facebook  scaling  web  from delicious
september 2009 by jm
'File-based, rather than tuple-based processing'; based around UNIX command-line toolset; good UNIXish UI; lots of caching of intermediate results; low setup overhead -- although it does require a shared POSIX filesystem, e.g. NFS, for synchronization
networking  python  opensource  grid  map-reduce  filemap  files  unix  command-line  parallel  distcomp 
july 2009 by jm

