jm + scala   35

Neutrino Software Load Balancer
eBay's software LB, supporting URL matching, comparable to haproxy, built using Netty and Scala. Used in their QA infrastructure it seems
netty  scala  ebay  load-balancing  load-balancers  url  http  architecture 
february 2016 by jm
The End of Dynamic Languages
This is my bet: the age of dynamic languages is over. There will be no new successful ones. Indeed we have learned a lot from them. We’ve learned that library code should be extendable by the programmer (mixins and meta-programming), that we want to control the structure (macros), that we disdain verbosity. And above all, we’ve learned that we want our languages to be enjoyable.

But it’s time to move on. We will see a flourishing of languages that feel like you’re writing in a Clojure, but typed. Included will be a suite of powerful tools that we’ve never seen before, tools so convincing that only ascetics will ignore.
programming  scala  clojure  coding  types  strong-types  dynamic-languages  languages 
november 2015 by jm
Testing without mocking in Scala
mocks are the sound of your code crying out, "please structure me differently!"


+1
scala  via:jessitron  mocks  mock-objects  testing  testability  coding 
july 2015 by jm
Stu Hood and Brian Degenhardt, Scala at Twitter, SF Scala @Twitter 20150217
'Stu Hood and Brian Degenhardt talk about the history of Scala at Twitter, from inception until today, covering 2.10 migration, the original Alex Payne’s presentation from way back, pants, and more. The first five years of Scala at Twitter and the years ahead!'

Very positive indeed on the monorepo concept.
monorepo  talks  scala  sfscala  stu-hood  twitter  pants  history  repos  build  projects  compilation  gradle  maven  sbt 
march 2015 by jm
JClarity's Illuminate
Performance-diagnosis-as-a-service. Cool.
Users download and install an Illuminate Daemon using a simple installer which starts up a small stand alone Java process. The Daemon sits quietly unless it is asked to start gathering SLA data and/or to trigger a diagnosis. Users can set SLA’s via the dashboard and can opt to collect latency measurements of their transactions manually (using our library) or by asking Illuminate to automatically instrument their code (Servlet and JDBC based transactions are currently supported).

SLA latency data for transactions is collected on a short cycle. When the moving average of latency measurements goes above the SLA value (e.g. 150ms), a diagnosis is triggered. The diagnosis is very quick, gathering key data from O/S, JVM(s), virtualisation and other areas of the system. The data is then run through the machine learned algorithm which will quickly narrow down the possible causes and gather a little extra data if needed.

Once Illuminate has determined the root cause of the performance problem, the diagnosis report is sent back to the dashboard and an alert is sent to the user. That alert contains a link to the result of the diagnosis which the user can share with colleagues. Illuminate has all sorts of backoff strategies to ensure that users don’t get too many alerts of the same type in rapid succession!
illuminate  jclarity  java  jvm  scala  latency  gc  tuning  performance 
february 2015 by jm
OpenJDK: jol
'JOL (Java Object Layout) is the tiny toolbox to analyze object layout schemes in JVMs. These tools are using Unsafe, JVMTI, and Serviceability Agent (SA) heavily to decoder the actual object layout, footprint, and references. This makes JOL much more accurate than other tools relying on heap dumps, specification assumptions, etc.'

Recommended by Nitsan Wakart, looks pretty useful for JVM devs
java  jvm  tools  scala  memory  estimation  ram  object-layout  debugging  via:nitsan 
february 2015 by jm
Functional Programming Patterns (BuildStuff '14)
Good, and very accessible even for FP noobs like myself ;)
clojure  fp  functional  patterns  coding  scala 
january 2015 by jm
Why We Didn’t Use Kafka for a Very Kafka-Shaped Problem
A good story of when Kafka _didn't_ fit the use case:
We came up with a complicated process of app-level replication for our messages into two separate Kafka clusters. We would then do end-to-end checking of the two clusters, detecting dropped messages in each cluster based on messages that weren’t in both.

It was ugly. It was clearly going to be fragile and error-prone. It was going to be a lot of app-level replication and horrible heuristics to see when we were losing messages and at least alert us, even if we couldn’t fix every failure case.

Despite us building a Kafka prototype for our ETL — having an existing investment in it — it just wasn’t going to do what we wanted. And that meant we needed to leave it behind, rewriting the ETL prototype.
cassandra  java  kafka  scala  network-partitions  availability  multi-region  multi-az  aws  replication  onlive 
november 2014 by jm
Jauter
This Java library can route paths to targets and create paths from targets and params (reverse routing). This library is tiny, without additional dependencies, and is intended for use together with an HTTP server side library. If you want to use with Netty, see netty-router.
java  jauter  scala  request-routing  http  netty  open-source 
october 2014 by jm
Tehuti
An embryonic metrics library for Java/Scala from Felix GV at LinkedIn, extracted from Kafka's metric implementation and in the new Voldemort release. It fixes the major known problems with the Meter/Timer implementations in Coda-Hale/Dropwizard/Yammer Metrics.

'Regarding Tehuti: it has been extracted from Kafka's metric implementation. The code was originally written by Jay Kreps, and then maintained improved by some Kafka and Voldemort devs, so it definitely is not the work of just one person. It is in my repo at the moment but I'd like to put it in a more generally available (git and maven) repo in the future. I just haven't had the time yet...

As for comparing with CodaHale/Yammer, there were a few concerns with it, but the main one was that we didn't like the exponentially decaying histogram implementation. While that implementation is very appealing in terms of (low) memory usage, it has several misleading characteristics (a lack of incoming data points makes old measurements linger longer than they should, and there's also a fairly high possiblity of losing interesting outlier data points). This makes the exp decaying implementation robust in high throughput fairly constant workloads, but unreliable in sparse or spiky workloads. The Tehuti implementation provides semantics that we find easier to reason with and with a small code footprint (which we consider a plus in terms of maintainability). Of course, it is still a fairly young project, so it could be improved further.'

More background at the kafka-dev thread: http://mail-archives.apache.org/mod_mbox/kafka-dev/201402.mbox/%3C131A7649-ED57-45CB-B4D6-F34063267664@linkedin.com%3E
kafka  metrics  dropwizard  java  scala  jvm  timers  ewma  statistics  measurement  latency  sampling  tehuti  voldemort  linkedin  jay-kreps 
october 2014 by jm
Alex Payne — Thoughts On Five Years of Emerging Languages
One could read the success of Go as an indictment of contemporary PLT, but I prefer to see it as a reminder of just how much language tooling matters. Perhaps even more critical, Go’s lean syntax, selective semantics, and cautiously-chosen feature set demonstrate the importance of a strong editorial voice in a language’s design and evolution.

Having co-authored a book on Scala, it’s been painful to see systems programmers in my community express frustration with the ambitious hybrid language. I’ve watched them abandon ship and swim back to the familiar shores of Java, or alternately into the uncharted waters of Clojure, Go, and Rust. A pity, but not entirely surprising if we’re being honest with ourselves.

Unlike Go, Scala has struggled with tooling from its inception. More than that, Scala has had a growing editorial problem. Every shop I know that’s been successful with Scala has limited itself to some subset of the language. Meanwhile, in pursuit of enterprise developers, its surface area has expanded in seemingly every direction. The folks behind Scala have, thankfully, taken notice: upcoming releases are promised to focus on simplicity, clarity, and better tooling.
scala  go  coding  languages 
september 2014 by jm
How to take over the computer of any JVM developer
To prove how easy [MITM attacking Mavencentral JARs] is to do, I wrote dilettante, a man-in-the-middle proxy that intercepts JARs from maven central and injects malicious code into them. Proxying HTTP traffic through dilettante will backdoor any JARs downloaded from maven central. The backdoored version will retain their functionality, but display a nice message to the user when they use the library.
jars  dependencies  java  build  clojure  security  mitm  http  proxies  backdoors  scala  maven  gradle 
july 2014 by jm
ScalaTest
Scala's BDD approach -- very similar to Steak in Rubyland I think
scala  testing  bdd  acceptance-testing  steak  coding  scalatest 
june 2014 by jm
Pillar
Manages migrations for your Cassandra data stores. Pillar grew from a desire to automatically manage Cassandra schema as code. Managing schema as code enables automated build and deployment, a foundational practice for an organization striving to achieve Continuous Delivery.

Pillar is to Cassandra what Rails ActiveRecord migrations or Play Evolutions are to relational databases with one key difference: Pillar is completely independent from any application development framework.
migrations  database  ops  pillar  cassandra  activerecord  scala  continuous-delivery  automation  build 
june 2014 by jm
Monitoring Reactive Applications with Kamon
"quality monitoring tools for apps built in Akka, Spray and Play!". Uses Gil Tene's HDRHistogram and dropwizard Metrics under the hood.
metrics  dropwizard  hdrhistogram  gil-tene  kamon  akka  spray  play  reactive  statistics  java  scala  percentiles  latency 
may 2014 by jm
'Pickles & Spores: Improving Support for Distributed Programming in Scala
'Spores are "small units of possibly mobile functional behavior". They're a closure-like abstraction meant for use in distributed or concurrent environments. Spores provide a guarantee that the environment is effectively immutable, and safe to ship over the wire. Spores aim to give library authors some confidence in exposing functions (or, rather, spores) in public APIs for safe consumption in a distributed or concurrent environment.

The first part of the talk covers a simpler variant of spores as they are proposed for inclusion in Scala 2.11. The second part of the talk briefly introduces a current research project ongoing at EPFL which leverages Scala's type system to provide type constraints that give authors finer-grained control over spore capturing semantics. What's more, these type constraints can be composed during spore composition, so library authors are effectively able to propagate expert knowledge via these composable constraints.

The last part of the talk briefly covers Scala/Pickling, a fast new, open serialization framework.'
pickling  scala  presentations  spores  closures  fp  immutability  coding  distributed  distcomp  serialization  formats  network 
april 2014 by jm
Sirius by Comcast
At Comcast, our applications need convenient, low-latency access to important reference datasets. For example, our XfinityTV websites and apps need to use entertainment-related data to serve almost every API or web request to our datacenters: information like what year Casablanca was released, or how many episodes were in Season 7 of Seinfeld, or when the next episode of the Voice will be airing (and on which channel!).

We traditionally managed this information with a combination of relational databases and RESTful web services but yearned for something simpler than the ORM, HTTP client, and cache management code our developers dealt with on a daily basis. As main memory sizes on commodity servers continued to grow, however, we asked ourselves: How can we keep this reference data entirely in RAM, while ensuring it gets updated as needed and is easily accessible to application developers?

The Sirius distributed system library is our answer to that question, and we're happy to announce that we've made it available as an open source project. Sirius is written in Scala and uses the Akka actor system under the covers, but is easily usable by any JVM-based language.

Also includes a Paxos implementation with "fast follower" read-only slave replication. ASL2-licensed open source.

The only thing I can spot to be worried about is speed of startup; they note that apps need to replay a log at startup to rebuild state, which can be slow if unoptimized in my experience.

Update: in a twitter conversation at https://twitter.com/jon_moore/status/459363751893139456 , Jon Moore indicated they haven't had problems with this even with 'datasets consuming 10-20GB of heap', and have 'benchmarked a 5-node Sirius ingest cluster up to 1k updates/sec write throughput.' That's pretty solid!
open-source  comcast  paxos  replication  read-only  datastores  storage  memory  memcached  redis  sirius  scala  akka  jvm  libraries 
april 2014 by jm
Transitioning to Scala
Advice from a developer who helped rebuild Walmart.ca with Scala and Play


This is really good advice.
walmart  scala  java  languages  coding  relearning  play  akka 
april 2014 by jm
NettoSphere demo
'This article will use NettoSphere, a framework build on top of the popular Netty Framework and Atmosphere with support of WebSockets, Server Side Events and Long-Polling. NettoSphere allows [async JVM framework] Atmosphere's applications to run on top of the Netty Framework.'
atmosphere  netty  async  java  scala  websockets  sse  long-polling  http  demos  games 
november 2013 by jm
Streaming MapReduce with Summingbird
Before Summingbird at Twitter, users that wanted to write production streaming aggregations would typically write their logic using a Hadoop DSL like Pig or Scalding. These tools offered nice distributed system abstractions: Pig resembled familiar SQL, while Scalding, like Summingbird, mimics the Scala collections API. By running these jobs on some regular schedule (typically hourly or daily), users could build time series dashboards with very reliable error bounds at the unfortunate cost of high latency.

While using Hadoop for these types of loads is effective, Twitter is about real-time and we needed a general system to deliver data in seconds, not hours. Twitter’s release of Storm made it easy to process data with very low latencies by sacrificing Hadoop’s fault tolerant guarantees. However, we soon realized that running a fully real-time system on Storm was quite difficult for two main reasons:

Recomputation over months of historical logs must be coordinated with Hadoop or streamed through Storm with a custom log loading mechanism;
Storm is focused on message passing and random-write databases are harder to maintain.

The types of aggregations one can perform in Storm are very similar to what’s possible in Hadoop, but the system issues are very different. Summingbird began as an investigation into a hybrid system that could run a streaming aggregation in both Hadoop and Storm, as well as merge automatically without special consideration of the job author. The hybrid model allows most data to be processed by Hadoop and served out of a read-only store. Only data that Hadoop hasn’t yet been able to process (data that falls within the latency window) would be served out of a datastore populated in real-time by Storm. But the error of the real-time layer is bounded, as Hadoop will eventually get around to processing the same data and will smooth out any error introduced. This hybrid model is appealing because you get well understood, transactional behavior from Hadoop, and up to the second additions from Storm. Despite the appeal, the hybrid approach has the following practical problems:

Two sets of aggregation logic have to be kept in sync in two different systems;
Keys and values must be serialized consistently between each system and the client.

The client is responsible for reading from both datastores, performing a final aggregation and serving the combined results
Summingbird was developed to provide a general solution to these problems.


Very interesting stuff. I'm particularly interested in the design constraints they've chosen to impose to achieve this -- data formats which require associative merging in particular.
mapreduce  streaming  big-data  twitter  storm  summingbird  scala  pig  hadoop  aggregation  merging 
september 2013 by jm
On Scala
great, comprehensive review of the language, its pros and misfeatures, from Bill de hOra
scala  languages  coding  fp  reviews 
june 2013 by jm
From a monolithic Ruby on Rails app to the JVM
How Soundcloud have ditched the monolithic Rails for nimbler, small-scale distributed polyglot services running on the JVM
soundcloud  rails  slides  jvm  scalability  ruby  scala  clojure  coding 
march 2013 by jm
Real-time Analytics in Scala [slides, PDF]
some good approximation/streaming algorithms and tips on Scala implementation
streams  algorithms  approximation  coding  scala  slides 
february 2013 by jm
Scala 2.8 Collections API -- Performance Characteristics
wow. Every library vending a set of collection types should have a page like this
collections  scala  performance  reference  complexity  big-o  coding 
january 2013 by jm
Effective Scala
Twitter's Scala style guide. 'While highly effective, Scala is also a large language, and our experiences have taught us to practice great care in its application. What are its pitfalls? Which features do we embrace, which do we eschew? When do we employ “purely functional style”, and when do we avoid it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays, providing a set of best practices. Our use of Scala is mainly for creating high volume services that form distributed systems — and our advice is thus biased — but most of the advice herein should translate naturally to other domains.'
twitter  scala  coding  style 
january 2013 by jm
peak6/scala-ssh-shell - GitHub
'Backdoor that gives you a scala shell over ssh on your jvm. The shell is not sandboxed, anyone access the shell can touch anything in the jvm and do anything the jvm can do including modifying and deleting files, etc.' nifty!
scala  ssh  repl  interactive  debugging  coding  jvm  java 
october 2011 by jm
Scala: The Static Language that Feels Dynamic
a good intro from Bruce Eckel. We need a good excuse to deploy some Scala ;)
scala  actors  java  language  programming  jvm  coding 
june 2011 by jm
Akka
'platform for event-driven, scalable, and fault-tolerant architectures on the JVM' .. Actor-based, 'let-it-crash', Apache-licensed, Java and Scala APIs, remote Actors, transactional memory -- looks quite nice
scala  java  concurrency  scalability  apache  akka  actors  erlang  fault-tolerance  events  from delicious
march 2011 by jm
Lift View First
explaining Lift's code-free "display only" templating system. I like it. Very similar concept to WebMake's "scraped templates": http://webmake.taint.org/doc/scraping.html , nearly 10 years old now!
java  scala  lift  templates  templating  scraping  from delicious
february 2010 by jm

related tags

acceptance-testing  activerecord  actors  aggregation  akka  algorithms  apache  approximation  architecture  async  atmosphere  automation  availability  aws  backdoors  bdd  big-data  big-o  build  cassandra  clojure  closures  cloudera  coding  collections  comcast  compilation  complexity  concurrency  continuous-delivery  database  datastores  debugging  demos  dependencies  dependency-injection  distcomp  distributed  dropwizard  dynamic-languages  ebay  erlang  estimation  events  ewma  fault-tolerance  formats  fp  functional  games  gc  gil-tene  gizzard  go  gradle  guardian  hadoop  hdrhistogram  history  http  illuminate  immutability  inheritance  interactive  jars  jauter  java  jay-kreps  jclarity  jvm  kafka  kamon  language  languages  latency  libraries  lift  linkedin  load-balancers  load-balancing  long-polling  mapreduce  maven  measurement  memcached  memory  merging  metrics  migrations  mitm  mock-objects  mocks  monorepo  multi-az  multi-region  netty  network  network-partitions  nosql  object-layout  onlive  oo  open-source  ops  pants  patterns  paxos  percentiles  performance  pickling  pig  pillar  play  presentations  programming  projects  proxies  rails  ram  reactive  read-only  redis  reference  relearning  repl  replication  repos  request-routing  reviews  ruby  sampling  sbt  scala  scalability  scalatest  scraping  security  serialization  sfscala  sharding  sirius  slides  soundcloud  spark  spores  spray  sse  ssh  statistics  steak  storage  storm  streaming  streams  strong-types  stu-hood  style  summingbird  talks  tehuti  templates  templating  testability  testing  timers  tools  traits  travis-brown  tuning  twitter  types  url  via:jessitron  via:nitsan  voldemort  walmart  websockets 

Copy this bookmark:



description:


tags: