jm + libraries   30
Foursquare's open source repo, where they extract reusable components for open sourcing -- I like the approach of using a separate top-level module path for OSS bits
open-source  oss  foursquare  libraries  maintainance  coding  git  monorepos 
5 weeks ago by jm
The Rise of Pirate Libraries
The history of this is fascinating:
Today’s pirate libraries have their roots in the work of Russian academics to digitize texts in the 1990s. Scholars in that part of the world had long had a thriving practice of passing literature and scientific information underground, in opposition to government censorship—part of the samizdat culture, in which banned documents were copied and passed hand to hand through illicit channels. Those first digital collections were passed freely around, but when their creators started running into problems with copyright, their collections “retreated from the public view,” writes Balázs Bodó, a piracy researcher based at the University of Amsterdam. “The text collections were far too valuable to simply delete,” he writes, and instead migrated to “closed, membership-only FTP servers.” [....]

There’s always been osmosis within the academic community of copyrighted materials from people with access to scholars without. “Much of the life of a research academic in Kazakhstan or Iran or Malaysia involves this informal diffusion of materials across the gated walls of the top universities,” he says.
pirates  pirate-libraries  libraries  archival  history  russia  ussr  samizdat  samizdata  academia  papers 
april 2016 by jm
Publish JVM and Android libraries direct from github -- it'll build and package a lib on the fly, caching them via CDN
build  github  java  maven  gradle  dependencies  packaging  libraries 
april 2016 by jm
Dropwizard for Go, basically:
a distributed programming toolkit for building microservices in large organizations. We solve common problems in distributed systems, so you can focus on your business logic.
microservices  go  golang  http  libraries  open-source  rpc  circuit-breakers 
january 2016 by jm
A new HTTP client library for Android and Java, with a lot of nice features:
HTTP/2 and SPDY support allows all requests to the same host to share a socket.

Connection pooling reduces request latency (if SPDY isn’t available).

Transparent GZIP shrinks download sizes.

Response caching avoids the network completely for repeat requests.

OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to TLS 1.0 if the handshake fails.

Using OkHttp is easy. Its 2.0 API is designed with fluent builders and immutability. It supports both synchronous blocking calls and async calls with callbacks.
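OkHttp's own Java API isn't shown here. As a rough, library-independent illustration of the "attempt alternate addresses if the first connect fails" behaviour, this Python sketch walks a list of resolved addresses in order. The helper `connect_first_reachable` and the injected `connect` callable are hypothetical, assumed so the logic runs without a network. In real code the addresses would come from something like `socket.getaddrinfo`, and `connect` would wrap `socket.create_connection`.

```python
from typing import Callable, Sequence, TypeVar

Conn = TypeVar("Conn")


def connect_first_reachable(addresses: Sequence[str],
                            connect: Callable[[str], Conn]) -> Conn:
    """Try each address in order and return the first successful connection.

    `connect` is a hypothetical stand-in for a real connect call; like
    socket.create_connection, it is expected to raise OSError on failure.
    """
    failures = []
    for addr in addresses:
        try:
            return connect(addr)
        except OSError as exc:
            failures.append(f"{addr}: {exc}")  # note the failure, then try the next address
    raise ConnectionError("all addresses failed: " + "; ".join(failures))


def fake_connect(addr: str) -> str:
    # Hypothetical fake: pretend the IPv6 route is down, IPv4 works.
    if addr.startswith("2001:db8:"):
        raise OSError("network unreachable")
    return f"connected to {addr}"


print(connect_first_reachable(["2001:db8::1", "192.0.2.10"], fake_connect))
# connected to 192.0.2.10
```

Sequential fallback is the simplest possible policy. This sketch makes no claim about OkHttp's actual address ordering or timeouts.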
android  http  java  libraries  okhttp  http2  spdy  microservices  jdk 
july 2015 by jm
On Ruby
The horrors of monkey-patching:
I call out the Honeybadger gem specifically because it was the most recent time I'd been bitten by a seemingly good thing promoted in the community: monkey patching third party code. Now I don't fault Honeybadger for making their product this way. It provides their customers with direct business value: "just require 'honeybadger' and you're done!" I don't agree with this sort of practice. [....]

I distrust everything [in Ruby] but a small set of libraries I've personally vetted or that are authored by people I respect. Why is this important? Without a certain level of scrutiny you will introduce odd and hard to reproduce bugs. This is especially important because Ruby offers you absolutely zero guarantees about the state of your program when a given method is dispatched. Constants are not constants. Methods can be redefined at run time. Someone could have written a time-sensitive monkey patch to randomly undefine methods from anything in ObjectSpace because they can. This example is so horribly bad that no one should ever do it, but the programming language allows it. Much worse, this code could be arbitrarily injected by some transitive dependency (do you even know what yours are?).
ruby  monkey-patching  coding  reliability  bugs  dependencies  libraries  honeybadger  sinatra 
april 2015 by jm
Our latest open source release from Swrve Labs: an Apache-licensed, SLF4J-compatible, simple, fluent API for rate-limited logging in Java:

'A RateLimitedLog object tracks the rate of log message emission, imposes an internal rate limit, and will efficiently suppress logging if this is exceeded. When a log is suppressed, at the end of the limit period, another log message is output indicating how many log lines were suppressed. This style of rate limiting is the same as the one used by UNIX syslog; this means it should be comprehensible, easy to predict, and familiar to many users, unlike more complex adaptive rate limits.'

We've been using this in production for months -- it's pretty nifty ;) Never fear your logs again!
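The syslog-style scheme described above is simple to sketch. The following is a minimal Python illustration, not Swrve's Java API. The class name `RateLimitedLogger`, its parameters, and the wording of the summary line are all hypothetical. It lets up to `max_per_period` lines through per fixed window, counts anything beyond that, and reports the count once the window has ended.

```python
import time
from typing import Callable


class RateLimitedLogger:
    """Hypothetical fixed-window, syslog-style rate limiter for log lines."""

    def __init__(self, emit: Callable[[str], None], max_per_period: int,
                 period_s: float, clock: Callable[[], float] = time.monotonic):
        self.emit = emit                  # where surviving lines go (e.g. a real logger)
        self.max_per_period = max_per_period
        self.period_s = period_s
        self.clock = clock
        self.window_start = clock()
        self.emitted = 0                  # lines let through in the current window
        self.suppressed = 0               # lines dropped in the current window

    def _maybe_start_new_window(self) -> None:
        now = self.clock()
        if now - self.window_start >= self.period_s:
            if self.suppressed:
                # No background timer in this sketch, so the summary is emitted
                # lazily, on the first call after the window has ended.
                self.emit(f"(suppressed {self.suppressed} messages)")
            self.window_start = now
            self.emitted = 0
            self.suppressed = 0

    def log(self, message: str) -> None:
        self._maybe_start_new_window()
        if self.emitted < self.max_per_period:
            self.emitted += 1
            self.emit(message)
        else:
            self.suppressed += 1


out: list = []
fake_now = [0.0]  # injectable clock so the example is deterministic
limited = RateLimitedLogger(out.append, max_per_period=2, period_s=1.0,
                            clock=lambda: fake_now[0])
for i in range(5):
    limited.log(f"msg {i}")   # only msg 0 and msg 1 get through
fake_now[0] = 1.5
limited.log("msg 5")          # new window: summary first, then the message
print(out)  # ['msg 0', 'msg 1', '(suppressed 3 messages)', 'msg 5']
```

A fixed window keeps the behaviour predictable, which is the property the quote values over adaptive schemes.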
logs  logging  coding  java  open-source  swrve  slf4j  rate-limiting  libraries 
february 2015 by jm
'Prometheus instrumentation library for JVM applications'
Good example of a clean Java OSS release, from SoundCloud. Will be copying bits of this myself soon...
prometheus  java  libraries  oss  github  sonatype  maven  releases 
february 2015 by jm
"Aeron: High-Performance Open Source Message Transport" [slides, PDF]
a new networked pub/sub library from Martin "Disruptor" Thompson, based around a replicated, persistent log of messages, with exceptionally low latency. Apache-licensed. Very similar to the realtime messaging stack we've built in Swrve. ;)
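The core idea of a log-based pub/sub transport is that publishers append to a shared log, and every subscriber keeps its own read position. A slow consumer therefore lags without holding back a fast one. The toy, in-memory Python sketch below illustrates only that idea. It is not Aeron's API, and it leaves out replication, persistence, and the low-latency engineering that are the point of Aeron. The `Log` and `Subscriber` names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Log:
    """Hypothetical append-only message log; a position is just an index."""
    entries: list = field(default_factory=list)

    def append(self, msg: Any) -> int:
        self.entries.append(msg)
        return len(self.entries) - 1   # position of the new entry


@dataclass
class Subscriber:
    """Each subscriber tracks its own read position, so it can lag or catch up."""
    log: Log
    position: int = 0

    def poll(self, max_msgs: int = 10) -> list:
        end = min(self.position + max_msgs, len(self.log.entries))
        batch = self.log.entries[self.position:end]
        self.position = end
        return batch


log = Log()
fast, slow = Subscriber(log), Subscriber(log)
for m in ("a", "b", "c"):
    log.append(m)
print(fast.poll())            # ['a', 'b', 'c']
print(slow.poll(max_msgs=1))  # ['a']
log.append("d")
print(fast.poll())            # ['d']
print(slow.poll())            # ['b', 'c', 'd']
```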
realtime  messaging  pub-sub  ipc  queues  transports  martin-thompson  slides  latencies  open-source  java  libraries 
november 2014 by jm
Introducing Proxygen, Facebook's C++ HTTP framework
Facebook's take on libevent, I guess:
We are excited to announce the release of Proxygen, a collection of C++ HTTP libraries, including an easy-to-use HTTP server. In addition to HTTP/1.1, Proxygen (rhymes with "oxygen") supports SPDY/3 and SPDY/3.1. We are also iterating and developing support for HTTP/2.

Proxygen is not designed to replace Apache or nginx — those projects focus on building extremely flexible HTTP servers written in C that offer good performance but almost overwhelming amounts of configurability. Instead, we focused on building a high performance C++ HTTP framework with sensible defaults that includes both server and client code and that's easy to integrate into existing applications. We want to help more people build and deploy high performance C++ HTTP services, and we believe that Proxygen is a great framework to do so.
c++  facebook  http  servers  libevent  https  spdy  proxygen  libraries 
november 2014 by jm
Smart Clients, haproxy, and Riak
Good, thought-provoking post on good client library approaches for complex client-server systems, particularly distributed stores like Voldemort or Riak. Personally, I'm of the opinion that a smart client lib is unavoidable, and in fact essential, since the clients are part of the distributed system.
clients  libraries  riak  voldemort  distsys  haproxy  client-server  storage 
october 2014 by jm
a client side IPC library that is battle-tested in cloud. It provides the following features:

Load balancing;
Fault tolerance;
Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model;
Caching and batching.

I like the integration of Eureka and Hystrix in particular, although I would really like to read more about Eureka's approach to availability during network partitions and CAP. has some interesting discussion on the topic. It actually sounds like the Eureka approach is more correct than using ZK: 'Eureka is available. ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessarily consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.'

See also which corroborates this:

I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed that the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away” dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads). [...]

To me this seems like a concern, as network partitions should be considered an event that should be survived. In this case (with this specific configuration of zookeeper) no new clients in that availability zone would be able to register themselves with consumers within the same availability zone. Adding more zookeeper instances to the ensemble wouldn’t help considering a balanced deployment as in this case the availability would always be majority (66%) and non-majority (33%).
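The 66%/33% split in the quoted experiment follows from ZooKeeper's majority-quorum rule: a server keeps serving only while it can see strictly more than half of the ensemble. The small Python sketch below (helper names are hypothetical) checks the "adding more instances wouldn't help" point for a balanced three-zone deployment. The two connected zones always keep a majority, and the isolated zone never has one, whatever the ensemble size.

```python
def has_quorum(visible: int, ensemble_size: int) -> bool:
    """Majority rule: strictly more than half of the ensemble must be visible."""
    return visible > ensemble_size // 2


def isolated_zone_has_quorum(servers_per_zone: int, zones: int = 3) -> bool:
    """Balanced deployment where one zone is cut off from all the others."""
    ensemble_size = servers_per_zone * zones
    return has_quorum(servers_per_zone, ensemble_size)


for per_zone in (1, 2, 3):
    n = per_zone * 3
    # (ensemble size, do the two connected zones have quorum?, does the isolated zone?)
    print(n, has_quorum(2 * per_zone, n), isolated_zone_has_quorum(per_zone))
# 3 True False
# 6 True False
# 9 True False
```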
netflix  ribbon  availability  libraries  java  hystrix  eureka  aws  ec2  load-balancing  networking  http  tcp  architecture  clients  ipc 
july 2014 by jm
'better dates and times for Python', to fix the absurd proliferation of slightly-incompatible Python date/time types and APIs. unfortunately, applies....
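Here is one concrete instance of the incompatibility, even within the standard library. Python's `datetime` has both naive and timezone-aware instances, and ordering comparisons between the two raise `TypeError`. This sketch uses only the stdlib and is not Arrow's API.

```python
from datetime import datetime, timezone

naive = datetime(2014, 5, 1, 12, 0)                       # no tzinfo attached
aware = datetime(2014, 5, 1, 12, 0, tzinfo=timezone.utc)  # explicitly UTC

try:
    _ = naive < aware
except TypeError as exc:
    print("TypeError:", exc)  # TypeError: can't compare offset-naive and offset-aware datetimes

# The usual workaround: state the timezone assumption explicitly before comparing.
assumed_utc = naive.replace(tzinfo=timezone.utc)
print(assumed_utc == aware)   # True
print(aware.isoformat())      # 2014-05-01T12:00:00+00:00
```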
python  libraries  time  dates  timestamps  timezones  apis  proliferation  iso-8601 
may 2014 by jm
Sirius by Comcast
At Comcast, our applications need convenient, low-latency access to important reference datasets. For example, our XfinityTV websites and apps need to use entertainment-related data to serve almost every API or web request to our datacenters: information like what year Casablanca was released, or how many episodes were in Season 7 of Seinfeld, or when the next episode of the Voice will be airing (and on which channel!).

We traditionally managed this information with a combination of relational databases and RESTful web services but yearned for something simpler than the ORM, HTTP client, and cache management code our developers dealt with on a daily basis. As main memory sizes on commodity servers continued to grow, however, we asked ourselves: How can we keep this reference data entirely in RAM, while ensuring it gets updated as needed and is easily accessible to application developers?

The Sirius distributed system library is our answer to that question, and we're happy to announce that we've made it available as an open source project. Sirius is written in Scala and uses the Akka actor system under the covers, but is easily usable by any JVM-based language.

Also includes a Paxos implementation with "fast follower" read-only slave replication. ASL2-licensed open source.

The only thing I can spot to be worried about is speed of startup; they note that apps need to replay a log at startup to rebuild state, which, in my experience, can be slow if unoptimized.

Update: in a Twitter conversation, Jon Moore indicated they haven't had problems with this even with 'datasets consuming 10-20GB of heap', and have 'benchmarked a 5-node Sirius ingest cluster up to 1k updates/sec write throughput.' That's pretty solid!
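The startup cost is the usual one for log-replay designs. In-memory state is rebuilt by re-applying every logged update in order, so recovery time grows with log length unless you periodically snapshot and replay only the tail. Below is a minimal Python sketch of that idea. It is not Sirius's API: the `(op, key, value)` entry format and the `apply_entry`/`recover` names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, str, Optional[str]]   # hypothetical format: (op, key, value)
Snapshot = Tuple[int, Dict[str, str]]    # (log position covered, state at that position)


def apply_entry(state: Dict[str, str], entry: Entry) -> None:
    op, key, value = entry
    if op == "put":
        state[key] = value
    elif op == "delete":
        state.pop(key, None)


def recover(log: List[Entry], snapshot: Optional[Snapshot] = None) -> Dict[str, str]:
    """Rebuild in-memory state: load the snapshot if there is one, then replay the rest."""
    if snapshot is None:
        start, state = 0, {}
    else:
        start, state = snapshot[0], dict(snapshot[1])   # copy so the snapshot stays intact
    for entry in log[start:]:
        apply_entry(state, entry)
    return state


log = [("put", "casablanca/year", "1942"),
       ("put", "tmp", "x"),
       ("delete", "tmp", None)]
full = recover(log)                    # replays all 3 entries
snap = (2, recover(log[:2]))           # snapshot covering the first 2 entries
fast = recover(log, snapshot=snap)     # replays only the final entry
print(full == fast == {"casablanca/year": "1942"})   # True
```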
open-source  comcast  paxos  replication  read-only  datastores  storage  memory  memcached  redis  sirius  scala  akka  jvm  libraries 
april 2014 by jm
Dan Kaminsky on Heartbleed
When I said that we expected better of OpenSSL, it’s not merely that there’s some sense that security-driven code should be of higher quality.  (OpenSSL is legendary for being considered a mess, internally.)  It’s that the number of systems that depend on it, and then expose that dependency to the outside world, are considerable.  This is security’s largest contributed dependency, but it’s not necessarily the software ecosystem’s largest dependency.  Many, maybe even more systems depend on web servers like Apache, nginx, and IIS.  We fear vulnerabilities significantly more in libz than libbz2 than libxz, because more servers will decompress untrusted gzip over bzip2 over xz.  Vulnerabilities are not always in obvious places – people underestimate just how exposed things like libxml and libcurl and libjpeg are.  And as HD Moore showed me some time ago, the embedded space is its own universe of pain, with 90’s bugs covering entire countries.

If we accept that a software dependency becomes Critical Infrastructure at some level of economic dependency, the game becomes identifying those dependencies, and delivering direct technical and even financial support.  What are the one million most important lines of code that are reachable by attackers, and least covered by defenders?  (The browsers, for example, are very reachable by attackers but actually defended pretty zealously – FFMPEG public is not FFMPEG in Chrome.)

Note that not all code, even in the same project, is equally exposed.    It’s tempting to say it’s a needle in a haystack.  But I promise you this:  Anybody patches Linux/net/ipv4/tcp_input.c (which handles inbound network for Linux), a hundred alerts are fired and many of them are not to individuals anyone would call friendly.  One guy, one night, patched OpenSSL.  Not enough defenders noticed, and it took Neel Mehta to do something.
development  openssl  heartbleed  ssl  security  dan-kaminsky  infrastructure  libraries  open-source  dependencies 
april 2014 by jm
A sane Google Protocol Buffers library for Ruby. It's all about being Buf; ProtoBuf.
protobuf  google  protocol-buffers  ruby  coding  libraries  gems  open-source 
april 2014 by jm
Another cool library from Ray Holder: 'an Apache 2.0 licensed general-purpose retrying library, written in Python, to simplify the task of adding retry behavior to just about anything.'

Similar to his Guava-Retrier Java lib, but using a decorator.
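The decorator style mentioned above can be sketched in a few lines of plain Python. This is not the `retrying` package's actual API. The `retry` name and its `attempts`, `base_delay`, `retry_on` and `sleep` parameters are hypothetical, chosen to show retry with exponential backoff.

```python
import functools
import time
from typing import Callable, Tuple, Type


def retry(attempts: int = 3, base_delay: float = 0.1,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep):
    """Hypothetical decorator: retry on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == attempts - 1:
                        raise                         # out of attempts: re-raise the last error
                    sleep(base_delay * 2 ** attempt)  # base_delay, then 2x, 4x, ...
        return wrapper
    return decorator


calls = []


@retry(attempts=3, base_delay=0.01)
def flaky() -> str:
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transient failure")
    return "ok"


print(flaky(), len(calls))   # ok 3
```

Injecting `sleep` is only there to make the sketch easy to test without real delays.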
retrying  python  libraries  tools  backoff  retry  error-handling 
april 2014 by jm
an easily embeddable, decentralized, k-ordered unique ID generator. It can use the same encoded ID format as Twitter's Snowflake or Boundary's Flake implementations as well as any other customized encoding without too much effort. The fauxflake-core module has no external dependencies and is meant to be about as light as possible while still delivering useful functionality. Essentially, if you want to be able to generate a unique identifier across your infrastructure with reasonable assurances about collisions, then you might find this useful.

From the same guy as the excellent Guava Retrier library; Java, ASL2-licensed open source.
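Snowflake-style IDs are k-ordered because the timestamp occupies the high bits. This minimal Python sketch uses the classic Snowflake layout: a 41-bit millisecond timestamp since a custom epoch, a 10-bit worker ID, and a 12-bit per-millisecond sequence. It is not fauxflake's API. The class name and the epoch value are hypothetical.

```python
import threading
import time
from typing import Callable


def _now_ms() -> int:
    return time.time_ns() // 1_000_000


class SnowflakeLikeGenerator:
    """Hypothetical sketch of a Snowflake-style 64-bit ID:
    [41 bits ms since EPOCH_MS][10 bits worker ID][12 bits sequence]."""

    EPOCH_MS = 1_400_000_000_000  # hypothetical custom epoch (May 2014)
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id: int, clock_ms: Callable[[], int] = _now_ms):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id must fit in 10 bits")
        self.worker_id = worker_id
        self.clock_ms = clock_ms
        self.last_ms = -1
        self.sequence = 0
        self._lock = threading.Lock()

    def next_id(self) -> int:
        with self._lock:
            now = self.clock_ms()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to risk duplicate IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.SEQUENCE_MASK
                if self.sequence == 0:
                    # 4096 IDs already issued this millisecond: wait for the next one.
                    while now <= self.last_ms:
                        now = self.clock_ms()
            else:
                self.sequence = 0
            self.last_ms = now
            return (((now - self.EPOCH_MS) << (self.WORKER_BITS + self.SEQUENCE_BITS))
                    | (self.worker_id << self.SEQUENCE_BITS)
                    | self.sequence)


gen = SnowflakeLikeGenerator(worker_id=7)
first, second = gen.next_id(), gen.next_id()
print(first < second)          # True: later IDs sort after earlier ones
print((first >> 12) & 0x3FF)   # 7: the worker ID sits in the middle bits
```

No coordination is needed as long as every node has a distinct worker ID. That is where the "reasonable assurances about collisions" come from, together with refusing to issue IDs when the clock goes backwards.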
open-source  java  asl2  fauxflake  tools  libraries  unique-ids  ids  unique  snowflake  distsys 
april 2014 by jm
Apache Curator
Netflix open-source library to make using ZooKeeper from Java less of a PITA. I really wish I'd used this now, having reimplemented some key parts of it after failures in prod ;)
zookeeper  netflix  apache  curator  java  libraries  open-source 
january 2014 by jm
British Library uploads one million public domain images to the net for remix and reuse - Boing Boing
this is excellent!
The British Library has uploaded one million public domain scans from 17th-19th century books to Flickr! They're embarking on an ambitious programme to crowdsource novel uses and navigation tools for the huge corpus. Already, the manifest of image descriptions is available through Github. This is a remarkable, public spirited, archival project, and the British Library is to be loudly applauded for it!
british-library  libraries  public-domain  art  graphics  images  history  19th-century  17th-century  18th-century  books  crowdsourcing  via:boingboing  github 
december 2013 by jm
Reactor hits GA
'It can't just be Big Data, it has to be Fast Data: Reactor 1.0 goes GA':

Reactor provides the necessary abstractions to build high-throughput, low-latency--what we now call "fast data"--applications that absolutely must work with thousands, tens of thousands, or even millions of concurrent requests per second. Modern JVM applications must be built on a solid foundation of asynchronous and reactive components that efficiently manage the execution of a very large number of tasks on a very small number of system threads. Reactor is specifically designed to help you build these kinds of applications without getting in your way or forcing you to work within an opinionated pattern.

Featuring the LMAX Disruptor ringbuffer, the JavaChronicle fast persistent message-passing queue, Groovy closures, and Netty 4.0. This looks very handy indeed....
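Reactor itself is a JVM library. As a language-neutral illustration of "a very large number of tasks on a very small number of system threads", here is a Python `asyncio` sketch (not Reactor's API; `handle` and `main` are hypothetical names) that runs 10,000 concurrent tasks on a single OS thread.

```python
import asyncio
import threading


async def handle(request_id: int) -> int:
    # Simulated non-blocking I/O: awaiting yields the thread to other tasks.
    await asyncio.sleep(0.01)
    return threading.get_ident()


async def main(n: int = 10_000) -> set:
    thread_ids = await asyncio.gather(*(handle(i) for i in range(n)))
    return set(thread_ids)


threads_used = asyncio.run(main())
print(len(threads_used))   # 1: every task ran on the event loop's single thread
```

The concurrency comes from tasks yielding at every `await`, not from extra threads. That is the same trade that event-loop and ring-buffer designs on the JVM make.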
disruptor  reactive-programming  reactor  async  libraries  java  jvm  frameworks  spring  netty  fast-data 
november 2013 by jm
HyperLevelDB: A High-Performance LevelDB Fork
'HyperLevelDB improves on LevelDB in two key ways:
Improved parallelism: HyperLevelDB uses more fine-grained locking internally to provide higher throughput for multiple writer threads.
Improved compaction: HyperLevelDB uses a different method of compaction that achieves higher throughput for write-heavy workloads, even as the database grows.'
leveldb  storage  key-value-stores  persistence  unix  libraries  open-source 
june 2013 by jm
Written by Google, this library is a flexible, efficient, and powerful Java client library for accessing any resource on the web via HTTP. It features a pluggable HTTP transport abstraction that allows any low-level library to be used, such as Apache HTTP Client, or URL Fetch on Google App Engine. It also features efficient JSON and XML data models for parsing and serialization of HTTP response and request content. The JSON and XML libraries are also fully pluggable, including support for Jackson and Android's GSON libraries for JSON.

Not quite as simple an API as Python's requests, sadly, but still an improvement on the verbose Apache HttpComponents API. Good support for unit testing via a built-in mock-response class. Still in beta.
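The pluggable-transport design described above, and the mock-response testing it enables, can be sketched in Python. The names here (`Transport`, `MockTransport`, `ApiClient`, `Response`) are hypothetical, not the Google library's Java classes. The client depends only on a small transport interface, so a test can swap in a canned response instead of making network calls.

```python
import json
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Response:
    status: int
    body: str


class Transport(Protocol):
    """Anything that can execute a request; a real one would wrap urllib or similar."""
    def execute(self, method: str, url: str) -> Response: ...


class MockTransport:
    """Hypothetical test double: returns a canned response and records requests."""
    def __init__(self, response: Response):
        self.response = response
        self.requests = []   # (method, url) pairs seen so far

    def execute(self, method: str, url: str) -> Response:
        self.requests.append((method, url))
        return self.response


class ApiClient:
    def __init__(self, transport: Transport, base_url: str):
        self.transport = transport
        self.base_url = base_url

    def get_json(self, path: str) -> dict:
        resp = self.transport.execute("GET", self.base_url + path)
        if resp.status != 200:
            raise RuntimeError(f"HTTP {resp.status}")
        return json.loads(resp.body)


mock = MockTransport(Response(200, '{"name": "example"}'))
client = ApiClient(mock, "https://api.example.com")
print(client.get_json("/items/1"))   # {'name': 'example'}
print(mock.requests)                 # [('GET', 'https://api.example.com/items/1')]
```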
google  beta  software  http  libraries  json  xml  transports  protocols 
april 2013 by jm
Netflix Curator
a high-level API that greatly simplifies using ZooKeeper. It adds many features that build on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations. Some of the features are:

Automatic connection management: There are potential error cases that require ZooKeeper clients to recreate a connection and/or retry operations. Curator automatically and transparently (mostly) handles these cases.

Cleaner API: simplifies the raw ZooKeeper methods, events, etc.; provides a modern, fluent interface

Recipe implementations (see Recipes): Leader election, Shared lock, Path cache and watcher, Distributed Queue, Distributed Priority Queue
zookeeper  java  netflix  distcomp  libraries  oss  open-source  distributed 
march 2013 by jm
Requests: HTTP for Humans
'an elegant and simple HTTP library for Python, built for human beings.' 'Requests is an Apache2 Licensed HTTP library, written in Python, for human beings. Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks. Requests takes all of the work out of Python HTTP/1.1 — making your integration with web services seamless. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is embedded within Requests.'
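For contrast, this is the kind of manual encoding the quote says Requests spares you. The sketch uses only the Python 3 standard library (`urllib.parse` and `urllib.request` took over from urllib2). It only builds the request objects and makes no network call. The URLs are placeholders.

```python
from urllib.parse import urlencode
from urllib.request import Request

base = "https://example.com/search"

# Query string: encode the parameters and append them by hand.
url = base + "?" + urlencode({"q": "pirate libraries", "page": 2})
print(url)   # https://example.com/search?q=pirate+libraries&page=2

# Form-encoded POST body: encode to bytes and set the header yourself.
body = urlencode({"user": "jm", "tag": "libraries"}).encode("ascii")
req = Request("https://example.com/form", data=body,
              headers={"Content-Type": "application/x-www-form-urlencoded"})
print(req.get_method(), req.data)   # POST b'user=jm&tag=libraries'
```

With Requests, the equivalents are `requests.get(base, params={...})` and `requests.post(url, data={...})`, which do the encoding for you.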
python  http  urllib  libraries  requests  via:mikeste 
january 2013 by jm
Marsh's Library
Dublin museum of antiquarian books, open to the public -- well worth a visit, apparently (I suspect I will definitely be making my way there soon), to check out their new "Marvels of Science" exhibit. Not only that, but they have a beautiful website with some great photos -- exemplary.
museum  dublin  ireland  libraries  books  science 
july 2012 by jm
