
'Software Engineering at Google'
20 pages on Google's software dev practices, with emphasis on the build system (since it was written by the guy behind Blaze). Naturally, some of the practices don't make a whole lot of sense outside of Google, but there's still some good stuff here
development  engineering  google  papers  software  coding  best-practices 
february 2017 by jm
Open Whisper Systems >> Blog >> Reflections: The ecosystem is moving
Very interesting post on federation vs centralization for new services:
One of the controversial things we did with Signal early on was to build it as an unfederated service. Nothing about any of the protocols we've developed requires centralization; it's entirely possible to build a federated Signal Protocol based messenger, but I no longer believe that it is possible to build a competitive federated messenger at all.
development  encryption  communication  network-effects  federation  signal  ip  protocols  networking  smtp  platforms 
may 2016 by jm
Try Server
Good terminology for this concept:
The try server runs a similar configuration to the continuous integration server, except that it is triggered not on commits but on "try job request", in order to test code pre-commit.

See also https://wiki.mozilla.org/ReleaseEngineering/TryServer for the Moz take on it.
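
Buildbot has this pattern built in as a "try" scheduler. A minimal sketch of the relevant master.cfg fragment (Buildbot configs are Python; the builder name, port and credentials below are placeholders of mine, and the rest of the config is omitted):

# master.cfg fragment: a scheduler that triggers builds on pre-commit
# "try job requests" rather than on commits.
from buildbot.plugins import schedulers

c = BuildmasterConfig = {}  # standard master.cfg setup; workers/builders omitted

c['schedulers'] = [
    schedulers.Try_Userpass(
        name="try",
        builderNames=["runtests"],       # reuse the same builders as post-commit CI
        port=5555,                       # port the `buildbot try` client connects to
        userpass=[("dev", "changeme")],  # placeholder credentials
    ),
]

A developer then submits an uncommitted patch with something like 'buildbot try --connect=pb --master=buildmaster:5555 --username=dev --passwd=changeme' and gets the same build a commit would trigger.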
build  ci  integration  try-server  jenkins  buildbot  chromium  development 
march 2015 by jm
Why we run an open source program - Walmart Labs
This is a great exposition of why it's in a company's interest to engage with open source. Not sure I agree with 'engineers are the artists of our generation', but the rest is spot on
development  open-source  walmart  node  coding  via:hn  hiring 
february 2015 by jm
'Machine Learning: The High-Interest Credit Card of Technical Debt' [PDF]
Oh god yes. This is absolutely spot on, as you would expect from a Google paper -- at this stage they probably have accumulated more real-world ML-at-scale experience than anywhere else.

'Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is to highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns.'

[....]

'In this paper, we focus on the system-level interaction between machine learning code and larger systems as an area where hidden technical debt may rapidly accumulate. At a system-level, a machine learning model may subtly erode abstraction boundaries. It may be tempting to re-use input signals in ways that create unintended tight coupling of otherwise disjoint systems. Machine learning packages may often be treated as black boxes, resulting in large masses of “glue code” or calibration layers that can lock in assumptions. Changes in the external world may make models or input signals change behavior in unintended ways, ratcheting up maintenance cost and the burden of any debt. Even monitoring that the system as a whole is operating as intended may be difficult without careful design.

Indeed, a remarkable portion of real-world “machine learning” work is devoted to tackling issues of this form. Paying down technical debt may initially appear less glamorous than research results usually reported in academic ML conferences. But it is critical for long-term system health and enables algorithmic advances and other cutting-edge improvements.'
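
The "glue code" point maps onto a simple defensive pattern: hide the black-box package behind a small, project-owned interface so its data formats and calibration assumptions live in one place. A rough sketch of my own (not from the paper; the model object's predict() call is a hypothetical stand-in for whatever the packaged API really is):

from typing import Protocol, Sequence

class Scorer(Protocol):
    # The narrow interface the rest of the system is allowed to see.
    def score(self, features: Sequence[float]) -> float: ...

class PackagedModelScorer:
    # Adapter around a third-party model object with a hypothetical
    # predict() method; all glue lives here rather than in callers.
    def __init__(self, model):
        self._model = model

    def score(self, features: Sequence[float]) -> float:
        row = [list(features)]             # input massaging, confined here
        raw = self._model.predict(row)[0]  # hypothetical black-box call
        return float(raw)                  # output calibration, confined here

Swapping in a different package, or an in-house reimplementation, then touches one adapter rather than every consumer.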
machine-learning  ml  systems  ops  tech-debt  maintainance  google  papers  hidden-costs  development 
december 2014 by jm
#AltDevBlog » Parallel Implementations
John Carmack describes this code-evolution approach to adding new code:
The last two times I did this, I got the software rendering code running on the new platform first, so everything could be tested out at low frame rates, then implemented the hardware accelerated version in parallel, setting things up so you could instantly switch between the two at any time.  For a mobile OpenGL ES application being developed on a windows simulator, I opened a completely separate window for the accelerated view, letting me see it simultaneously with the original software implementation.  This was a very significant development win.

If the task you are working on can be expressed as a pure function that simply processes input parameters into a return structure, it is easy to switch it out for different implementations.  If it is a system that maintains internal state or has multiple entry points, you have to be a bit more careful about switching it in and out.  If it is a gnarly mess with lots of internal callouts to other systems to maintain parallel state changes, then you have some cleanup to do before trying a parallel implementation.

There are two general classes of parallel implementations I work with:  The reference implementation, which is much smaller and simpler, but will be maintained continuously, and the experimental implementation, where you expect one version to “win” and consign the other implementation to source control in a couple weeks after you have some confidence that it is both fully functional and a real improvement.

It is completely reasonable to violate some generally good coding rules while building an experimental implementation – copy, paste, and find-replace rename is actually a good way to start.  Code fearlessly on the copy, while the original remains fully functional and unmolested.  It is often tempting to shortcut this by passing in some kind of option flag to existing code, rather than enabling a full parallel implementation.  It is a grey area, but I have been tending to find the extra path complexity with the flag approach often leads to messing up both versions as you work, and you usually compromise both implementations to some degree.


(via Marc)
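
The pure-function case Carmack describes reduces to a very small pattern. A sketch of my own in Python (not his code; the trivial blend function stands in for the real workload):

def blend_reference(a, b, t):
    # Reference implementation: small, simple, maintained continuously.
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]

def blend_experimental(a, b, t):
    # Experimental implementation: expected to "win" or be consigned
    # to source control once proven functional and an improvement.
    return [x + t * (y - x) for x, y in zip(a, b)]

USE_EXPERIMENTAL = True   # the instant switch between the two paths
CROSS_CHECK = True        # run both and compare while confidence builds

def blend(a, b, t):
    impl = blend_experimental if USE_EXPERIMENTAL else blend_reference
    result = impl(a, b, t)
    if CROSS_CHECK:
        ref = blend_reference(a, b, t)
        assert all(abs(r - e) < 1e-9 for r, e in zip(ref, result)), \
            "parallel implementations diverged"
    return result

Note that neither path is an option flag threaded through shared code: each implementation stays whole, and deleting the loser later is a self-contained change.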
via:marc  coding  john-carmack  parallel  development  evolution  lifecycle  project-management 
june 2014 by jm
Dan Kaminsky on Heartbleed
When I said that we expected better of OpenSSL, it’s not merely that there’s some sense that security-driven code should be of higher quality.  (OpenSSL is legendary for being considered a mess, internally.)  It’s that the number of systems that depend on it, and then expose that dependency to the outside world, are considerable.  This is security’s largest contributed dependency, but it’s not necessarily the software ecosystem’s largest dependency.  Many, maybe even more systems depend on web servers like Apache, nginx, and IIS.  We fear vulnerabilities significantly more in libz than libbz2 than libxz, because more servers will decompress untrusted gzip over bzip2 over xz.  Vulnerabilities are not always in obvious places – people underestimate just how exposed things like libxml and libcurl and libjpeg are.  And as HD Moore showed me some time ago, the embedded space is its own universe of pain, with 90’s bugs covering entire countries.

If we accept that a software dependency becomes Critical Infrastructure at some level of economic dependency, the game becomes identifying those dependencies, and delivering direct technical and even financial support.  What are the one million most important lines of code that are reachable by attackers, and least covered by defenders?  (The browsers, for example, are very reachable by attackers but actually defended pretty zealously – FFMPEG public is not FFMPEG in Chrome.)

Note that not all code, even in the same project, is equally exposed.  It's tempting to say it's a needle in a haystack.  But I promise you this:  Anybody patches Linux/net/ipv4/tcp_input.c (which handles inbound network for Linux), a hundred alerts are fired and many of them are not to individuals anyone would call friendly.  One guy, one night, patched OpenSSL.  Not enough defenders noticed, and it took Neel Mehta to do something.
development  openssl  heartbleed  ssl  security  dan-kaminsky  infrastructure  libraries  open-source  dependencies 
april 2014 by jm
The best "why estimation is hard" parable I've read this week
'A tense silence falls between us. The phone call goes unmade. I'll call tomorrow once my comrade regains his senses and is willing to commit to something reasonable.'
agile  development  management  programming  teams  estimation  tasks  software 
february 2012 by jm
Forking is a Feature - Anil Dash
thought-provoking piece about GitHub-style forking applied to other disciplines; Tumblr, Dribbble, Forrst being cases where it's happening now
community  development  forking  github  git  opensource  tumblr  dribbble  forrst  wikipedia  from delicious
september 2010 by jm
