jm + apache   30

[LEGAL-303] ASF, RocksDB, and Facebook's BSD+patent grant licensing
Facebook's licensing includes a "nuclear option": the patent grant terminates if a user acts in a way Facebook interprets as competing with them. The ASF has marked the license as "Category-X", so code under it may not be included in Apache projects. It looks like RocksDB is going to relicense as dual GPLv2/ASL2 to clear this up, but React.js has not shown any plans to do so yet
react  rocksdb  licensing  asl2  apache  asf  facebook  open-source  patents 
4 weeks ago by jm
Open-sourcing PalDB, a lightweight companion for storing side data
a new LinkedIn open-source data store for write-once/read-mainly side data; Java, Apache-licensed.

RocksDB discussion:
linkedin  open-source  storage  side-data  data  config  paldb  java  apache  databases 
october 2015 by jm
excellent offline mapping app MAPS.ME goes open source
"MAPS.ME is an open source cross-platform offline maps application, built on top of crowd-sourced OpenStreetMap data. It was publicly released for iOS and Android."
mapping  maps  open-source  apache  ios  android  mobile 
september 2015 by jm
Stormpot
Apache-licensed, and extremely fast:

'an object pooling library for Java. Use it to recycle objects that are expensive to create. The library will take care of creating and destroying your objects in the background. Stormpot is very mature, is used in production, and has done over a trillion claim-release cycles in testing. It is faster and scales better than any competing pool.'
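The claim/release lifecycle described above is easy to sketch in miniature (an illustrative toy pool in Python, not Stormpot's actual Java API):

```python
import queue

class Pool:
    """Minimal object pool: pre-creates expensive objects and recycles
    them via claim/release, the lifecycle a library like Stormpot manages."""
    def __init__(self, factory, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(factory())

    def claim(self, timeout=None):
        # Blocks until an object is free to be claimed
        return self._q.get(timeout=timeout)

    def release(self, obj):
        # Return the object for reuse instead of letting it be GC'd
        self._q.put(obj)

pool = Pool(factory=lambda: bytearray(1024), size=2)
buf = pool.claim()
buf[0] = 42          # use the pooled object
pool.release(buf)    # recycle it rather than allocating a new one
```

The point of the pattern is that allocation and teardown cost is paid once up front, and the hot path only moves references around.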
java  stormpot  object-pooling  object-pools  pools  allocation  gc  open-source  apache  performance 
september 2015 by jm
Festina Lente
A lovely eulogy for Nóirín Plunkett, from Rich Bowen. RIP Nóirín :(
noirin-plunkett  memorials  eulogies  rip  asf  apache 
july 2015 by jm
Apache HTrace
a Zipkin-compatible distributed-system tracing framework in Java, in the Apache Incubator
zipkin  tracing  trace  apache  incubator  java  debugging 
may 2015 by jm
Spark 1.2 released
This is the version with the superfast petabyte-sort record:
Spark 1.2 includes several cross-cutting optimizations focused on performance for large scale workloads. Two new features Databricks developed for our world record petabyte sort with Spark are turned on by default in Spark 1.2. The first is a re-architected network transfer subsystem that exploits Netty 4’s zero-copy IO and off heap buffer management. The second is Spark’s sort based shuffle implementation, which we’ve now made the default after significant testing in Spark 1.1. Together, we’ve seen these features give as much as 5X performance improvement for workloads with very large shuffles.
spark  sorting  hadoop  map-reduce  batch  databricks  apache  netty 
december 2014 by jm
Tehuti
Felix says:

'Like I said, I'd like to move it to a more general / non-personal repo in the future, but haven't had the time yet. Anyway, you can still browse the code there for now. It is not a big code base so not that hard to wrap one's mind around it.

It is Apache licensed and both Kafka and Voldemort are using it, so I would say it is pretty self-contained (although Kafka has not moved to Tehuti proper, it is essentially the same code they're using, minus a few small fixes that we added).

Tehuti is a bit lower level than CodaHale (i.e.: you need to choose exactly which stats you want to measure and the boundaries of your histograms), but this is the type of stuff you would build a wrapper for and then re-use within your code base. For example: the Voldemort RequestCounter class.'
asl2  apache  open-source  tehuti  metrics  percentiles  quantiles  statistics  measurement  latency  kafka  voldemort  linkedin 
october 2014 by jm
Spark Streaming
an extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets, and processed using complex algorithms expressed with high-level functions like map, reduce, join and window. Finally, processed data can be pushed out to filesystems, databases, and live dashboards. In fact, you can apply Spark's built-in machine learning and graph processing algorithms to data streams.
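The map/reduce/window style can be mimicked in a few lines of plain Python (a toy sketch of the sliding-window idea only; real Spark Streaming distributes this across a cluster as DStreams):

```python
from collections import Counter, deque

def windowed_word_counts(batches, window=3):
    """Toy stream processor: each batch is a list of lines; yield word
    counts over a sliding window of the last `window` batches."""
    recent = deque(maxlen=window)
    for batch in batches:
        # "map" each line to words, "reduce" by key within the window
        recent.append(Counter(w for line in batch for w in line.split()))
        totals = Counter()
        for c in recent:
            totals.update(c)
        yield totals

stream = [["a b a"], ["b c"], ["a"]]
for counts in windowed_word_counts(stream, window=2):
    pass  # in real use, push each windowed result to a dashboard or DB
```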
spark  streams  stream-processing  cep  scalability  apache  machine-learning  graphs 
may 2014 by jm
Building a large scale CDN with Apache Traffic Server
via Ilya Grigorik: 'Great under-the-hood look at how Comcast built and operates their internal CDN for delivering video (on-demand + live). Some highlights: switched to own (open-source) stack; ~250 servers pushing ~1.5Pb of data/day with ~5Pb of storage capacity.'
cdn  comcast  video  presentations  apache  traffic-server  vod 
may 2014 by jm
SpamAssassin 3.4.0 released
Good to see the guys cracking on without me ;)

'2014-02-11: SpamAssassin 3.4.0 has been released adding native support for IPv6, improved DNS Blocklist technology and support for massively-scalable Bayesian filtering using the Redis backend.'
antispam  open-source  spamassassin  apache 
february 2014 by jm
Apache Curator
Netflix open-source library to make using ZooKeeper from Java less of a PITA. I really wish I'd used this now, having reimplemented some key parts of it after failures in prod ;)
zookeeper  netflix  apache  curator  java  libraries  open-source 
january 2014 by jm
Randomly Failed! The State of Randomness in Current Java Implementations
This would appear to be the paper which sparked off the drama around BitCoin thefts from wallets generated on Android devices:

The SecureRandom PRNG is the primary source of randomness for Java and is used, e.g., by cryptographic operations. This underlines its importance regarding security. Some of [the] fallback solutions of the investigated implementations [are] revealed to be weak and predictable or capable of being influenced. Very alarming are the defects found in Apache Harmony, since it is partly used by Android.
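The danger of a weak or influenceable fallback seed is easy to demonstrate: if the seed space is small, an attacker can enumerate it and reproduce the "random" stream. An illustrative sketch using Python's non-cryptographic `random` module as a stand-in for a badly-seeded SecureRandom:

```python
import random

def weak_keygen(seed):
    """Simulates a PRNG seeded with low entropy (e.g. a timestamp
    truncated to 16 bits) generating a 'secret' 128-bit key."""
    rng = random.Random(seed)
    return rng.getrandbits(128)

# Victim generates a key from a weak 16-bit seed
secret = weak_keygen(seed=12345)

# Attacker brute-forces the entire seed space in well under a second
recovered = next(s for s in range(2**16) if weak_keygen(s) == secret)
assert recovered == 12345  # the "random" key was fully predictable
```

The real fix is an OS entropy source (e.g. Python's `secrets` module or `os.urandom`), which is exactly what the broken fallback paths failed to use.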

More on the BitCoin drama:
android  java  prng  random  security  bugs  apache-harmony  apache  crypto  bitcoin  papers 
august 2013 by jm
Ivan Ristić: Defending against the BREACH attack
One interesting response to this HTTPS compression-based MITM attack:
The award for least-intrusive and entirely painless mitigation proposal goes to Paul Querna who, on the httpd-dev mailing list, proposed to use the HTTP chunked encoding to randomize response length. Chunked encoding is a HTTP feature that is typically used when the size of the response body is not known in advance; only the size of the next chunk is known. Because chunks carry some additional information, they affect the size of the response, but not the content. By forcing more chunks than necessary, for example, you can increase the length of the response. To the attacker, who can see only the size of the response body, but not anything else, the chunks are invisible. (Assuming they're not sent in individual TCP packets or TLS records, of course.) This mitigation technique is very easy to implement at the web server level, which makes it the least expensive option. There is only a question about its effectiveness. No one has done the maths yet, but most seem to agree that response length randomization slows down the attacker, but does not prevent the attack entirely. But, if the attack can be slowed down significantly, perhaps it will be as good as prevented.
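The proposed mitigation is mechanical: split the same body into randomly-sized chunks so the on-the-wire length varies while the content does not. A sketch of HTTP/1.1 chunked framing (real servers would do this inside httpd, not in application code):

```python
import random

def chunked_encode(body: bytes, max_chunk: int = 64) -> bytes:
    """Frame `body` with HTTP/1.1 chunked transfer encoding, using
    randomly-sized chunks so the total wire length is randomized."""
    out, i = bytearray(), 0
    while i < len(body):
        n = random.randint(1, max_chunk)
        chunk = body[i:i + n]
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
        i += len(chunk)
    out += b"0\r\n\r\n"  # terminating zero-length chunk
    return bytes(out)

def chunked_decode(wire: bytes) -> bytes:
    """Strip the chunk framing back off: identical content every time."""
    body, rest = bytearray(), wire
    while True:
        size_line, rest = rest.split(b"\r\n", 1)
        n = int(size_line, 16)
        if n == 0:
            return bytes(body)
        body += rest[:n]
        rest = rest[n + 2:]  # skip chunk data plus its trailing CRLF
```

As the article notes, this only adds noise to the length oracle; it slows the attacker down rather than eliminating the leak.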
mitm  attacks  hacking  security  compression  http  https  protocols  tls  ssl  tcp  chunked-encoding  apache 
august 2013 by jm
Good UI for exploration of HyperLogLog set intersections and unions.
One of the first things that we wanted to do with HyperLogLog when we first started playing with it was to support and expose it natively in the browser. The thought of allowing users to directly interact with these structures -- perform arbitrary unions and intersections on effectively unbounded sets all on the client -- was exhilarating to us. [...] we are pleased to announce the open-source release of AK’s HyperLogLog implementation for JavaScript, js-hll. We are releasing this code under the Apache License, Version 2.0.

We knew that we couldn’t just release a bunch of JavaScript code without allowing you to see it in action — that would be a crime. We passed a few ideas around and the one that kept bubbling to the top was a way to kill two birds with one stone. We wanted something that would showcase what you can do with HLL in the browser and give us a tool for explaining HLLs. It is typical for us to explain how HLL intersections work using a Venn diagram. You draw some overlapping circles with a border that represents the error and you talk about how if that border is close to or larger than the intersection then you can’t say much about the size of that intersection. This works just ok on a whiteboard but what you really want is to just build a visualization that allows you to select from some sets and see the overlap. Maybe even play with the precision a little bit to see how that changes the result. Well, we did just that!
javascript  ui  hll  hyperloglog  algorithms  sketching  js  sets  intersection  union  apache  open-source 
june 2013 by jm
Kafka 0.8 Producer Performance
Great benchmarking from Piotr Kozikowski at the LiveRamp team, into performance of the upcoming Kafka 0.8 release
performance  kafka  apache  benchmarks  ops  queueing 
april 2013 by jm
Riak CS is now ASL2 open source
'Organizations and users can now access the source code on Github and download the latest packages from the downloads page. Also, today, we announced that Riak CS Enterprise is now available as commercial licensed software, featuring multi-datacenter replication technology and 24×7 Basho customer support.'
riak  riak-cs  nosql  storage  basho  open-source  github  apache  asl2 
march 2013 by jm
Cubism.js
'a D3 plugin for visualizing time series. Use Cubism to construct better realtime dashboards.' Apache-licensed; nice realtime update style; overlays multiple data sources well. I think I now have a good use-case for this
javascript  library  visualization  dataviz  tsd  data  apache  open-source 
april 2012 by jm
Apache Kafka
'Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. This kind of activity (page views, searches, and other user actions) are a key ingredient in many of the social features on the modern web. This data is typically handled by "logging" and ad hoc log aggregation solutions due to the throughput requirements. This kind of ad hoc solution is a viable solution to providing logging data to an offline analysis system like Hadoop, but is very limiting for building real-time processing. Kafka aims to unify offline and online processing by providing a mechanism for parallel load into Hadoop as well as the ability to partition real-time consumption over a cluster of machines.' neat
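The structure that makes the offline/online unification work is an append-only log per partition, with messages routed by key so consumption can be split across machines. A toy sketch of that partitioning (illustrative only, not Kafka's API):

```python
class PartitionedLog:
    """Toy Kafka-style topic: N append-only logs, with messages routed
    to a partition by key hash so each consumer owns a disjoint subset."""
    def __init__(self, num_partitions=4):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> int:
        # All messages for one key land in one partition, in order
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p

    def consume(self, partition: int, offset: int):
        # Consumers track their own offsets, which allows both real-time
        # tailing and offline replay (e.g. a parallel load into Hadoop)
        return self.partitions[partition][offset:]

topic = PartitionedLog()
p = topic.produce("user-1", "page_view:/home")
topic.produce("user-1", "search:q=kafka")
```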
kafka  linkedin  apache  distributed  messaging  pubsub  queue  incubator  scaling 
february 2012 by jm
Lucene Utilities and Bloom Filters - Greplin:tech
'Storing 50,000 2.5KB items in a traditional hash set requires over 125MB, but if you're willing to accept a 1-in-10,000 false positive rate on lookups, [this] bloom filter requires under 500KB' - interesting variation on the basic concept.  Java, Apache-licensed
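The space saving follows from the standard sizing formula m = −n·ln(p)/(ln 2)²: for n = 50,000 keys at p = 1/10,000 that is roughly 960K bits (about 120KB), because the filter stores membership only, not the 2.5KB items themselves. A minimal sketch (not Greplin's implementation):

```python
import hashlib, math

class BloomFilter:
    """Minimal Bloom filter: k hash functions over an m-bit array.
    False positives are possible; false negatives are not."""
    def __init__(self, n_items: int, fp_rate: float):
        self.m = math.ceil(-n_items * math.log(fp_rate) / math.log(2) ** 2)
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: bytes):
        # Derive k positions from k salted SHA-256 hashes of the item
        for i in range(self.k):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] >> (pos % 8) & 1
                   for pos in self._positions(item))

bf = BloomFilter(n_items=50_000, fp_rate=1e-4)
bf.add(b"hello")
assert b"hello" in bf   # never a false negative
# len(bf.bits) is ~120KB, vs ~125MB to store the raw 2.5KB items
```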
search  bloom-filters  greplin  open-source  apache  false-positives  from delicious
april 2011 by jm
Akka
'platform for event-driven, scalable, and fault-tolerant architectures on the JVM' .. Actor-based, 'let-it-crash', Apache-licensed, Java and Scala APIs, remote Actors, transactional memory -- looks quite nice
scala  java  concurrency  scalability  apache  akka  actors  erlang  fault-tolerance  events  from delicious
march 2011 by jm
ElasticSearch
nifty; Apache-licensed distributed, RESTful, JSON-over-HTTP, schemaless search server with multi-tenancy
search  distributed  rest  json  apache  elasticsearch  http  from delicious
february 2010 by jm
DDOS mystery involving Linux and mod_ssl
connections to, "GET / HTTP/1.1", massive HTTPS DDOS. no idea what's going on
apache  asf  ddos  https  httpd  mod_ssl  from delicious
october 2009 by jm
glTail.rb - realtime logfile visualization
'View real-time data and statistics from any logfile on any server with SSH, in an intuitive and entertaining way', supporting postfix/spamd/clamd logs among loads of others. very cool if a little silly
dataviz  visualization  tail  gltail  opengl  linux  apache  spamd  spamassassin  logs  statistics  sysadmin  analytics  animation  analysis  server  ruby  monitoring  logging  logfiles 
july 2009 by jm
Launchpad is now open source
Canonical _finally_ open source (under the AGPL) their bug tracker/project hosting platform. yay! here's hoping it's reasonably easy to deploy. maybe it would be viable for the ASF... hmm
canonical  launchpad  open-source  apache  hosting  projects  ubuntu  agpl 
july 2009 by jm

