jm + coding   106

Rusty's API Design Manifesto
This classic came up in discussions yesterday...

In the Linux Kernel community Rusty Russell came up with an API rating scheme to help us determine if our API is sensible, or not.  It's a rating from -10 to 10, where 10 is perfect and -10 is hell. Unfortunately there are too many examples at the wrong end of the scale.
rusty-russell  quality  coding  kernel  linux  apis  design  code-reviews  code 
10 days ago by jm
Approximate Heavy Hitters - The SpaceSaving Algorithm
nice, readable intro to SpaceSaving (which I've linked to before) -- a simple stream-processing cardinality top-K estimation algorithm with bounded error.
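For reference, a minimal sketch of the core SpaceSaving idea, assuming a plain dict of counters and a linear scan for the minimum (real implementations typically use a more efficient "stream summary" structure). The class and method names are my own, not from the linked post.

```python
from typing import Dict, Hashable, List, Tuple


class SpaceSaving:
    """Track approximate top-k counts using at most `capacity` counters."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.counts: Dict[Hashable, int] = {}  # estimated count per monitored item
        self.errors: Dict[Hashable, int] = {}  # maximum overestimate per item

    def offer(self, item: Hashable) -> None:
        if item in self.counts:
            self.counts[item] += 1
        elif len(self.counts) < self.capacity:
            self.counts[item] = 1
            self.errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count plus one, which bounds how far it can be overestimated.
            victim = min(self.counts, key=self.counts.get)
            floor = self.counts.pop(victim)
            del self.errors[victim]
            self.counts[item] = floor + 1
            self.errors[item] = floor

    def top(self, k: int) -> List[Tuple[Hashable, int, int]]:
        """Return (item, estimated count, max overestimate), largest first."""
        ranked = sorted(self.counts.items(), key=lambda kv: kv[1], reverse=True)
        return [(item, count, self.errors[item]) for item, count in ranked[:k]]


ss = SpaceSaving(capacity=2)
for x in ["a"] * 5 + ["b"] * 3 + ["c", "d", "a"]:
    ss.offer(x)
```

Each estimate is an upper bound on the true count, and the recorded error says how loose that bound might be.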
algorithms  coding  space-saving  cardinality  streams  stream-processing  estimation 
11 days ago by jm
Older Is Wiser: Study Shows Software Developers’ Skills Improve Over Time
At least in terms of StackOverflow rep:
For the first part of the study, the researchers compared the age of users with their reputation scores. They found that an individual’s reputation increases with age, at least into a user’s 40s. There wasn’t enough data to draw meaningful conclusions for older programmers. The researchers then looked at the number of different subjects that users asked and answered questions about, which reflects the breadth of their programming interests. The researchers found that there is a sharp decline in the number of subjects users weighed in on between the ages of 15 and 30 – but that the range of subjects increased steadily through the programmers’ 30s and into their early 50s.

Finally, the researchers evaluated the knowledge of older programmers (ages 37 and older) compared to younger programmers (younger than 37) in regard to relatively recent technologies – meaning technologies that have been around for less than 10 years. For two smartphone operating systems, iOS and Windows Phone 7, the veteran programmers had a significant edge in knowledge over their younger counterparts. For every other technology, from Django to Silverlight, there was no statistically significant difference between older and younger programmers. “The data doesn’t support the bias against older programmers – if anything, just the opposite,” Murphy-Hill says.


Damn right ;)
coding  age  studies  software  work  stack-overflow  ncsu  knowledge  skills  life 
26 days ago by jm
Lectures in Advanced Data Structures (6.851)
Good lecture notes on the current state of the art in data structure research.
Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures:

TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible.
GEOMETRY When data has more than one dimension (e.g. maps, database tables).
DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close.
MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache.
HASHING Hashing is the most used data structure in computer science. And it's still an active area of research.
INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible.
DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes.
STRINGS Searching for phrases in giant text (think Google or DNA).
SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).


(via Tim Freeman)
data-structures  lectures  mit  video  data  algorithms  coding  csail  strings  integers  hashing  sorting  bst  memory 
26 days ago by jm
jq
like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. [it] is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine, and expect it to work.


Nice tool. Needs to get into the Debian/Ubuntu apt repos pronto ;)
jq  tools  cli  via:peakscale  json  coding  data  sed  unix 
27 days ago by jm
Log4j2 Asynchronous Loggers for Low-Latency Logging - Apache Log4j 2
implemented using the LMAX Disruptor library -- very impressive performance figures. I presume in real-world usage, these latencies are dwarfed by hardware costs, though
disruptor  coding  java  log4j  logging  async  performance 
4 weeks ago by jm
You probably shouldn’t use a spreadsheet for important work
Daniel Lemire comments on the recent cases of bugs in spreadsheets causing major impact:
There are several critical problems with a tool like Excel that need to be widely known:

* Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically;

* Spreadsheets make code reviews impractical. To visually inspect the code, you need to click on each and every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake;

* Spreadsheets encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?


Agreed on all three, particularly on the impossibility of testing. IMO, everyone who may be in a job where automation via spreadsheet is likely needs training in SDE fundamentals: unit testing, the importance of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.
spreadsheets  excel  coding  errors  bugs  testability  unit-testing  testing  quality  sde  sde-fundamentals  dry 
4 weeks ago by jm
Functional Reactive Programming in the Netflix API with RxJava
Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.
Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observables.

The Observable data type can be thought of as a "push" equivalent to Iterable which is "pull". With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
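A rough Python sketch of the pull vs. push distinction described above. The `Observable` class and `from_iterable` helper here are hypothetical toys of my own, not the RxJava API; they only show the producer driving the consumer through a callback.

```python
class Observable:
    """Hypothetical push-based source: it calls the subscriber for each value."""

    def __init__(self, producer):
        # `producer` is a function that takes an on_next callback and feeds it values.
        self._producer = producer

    def subscribe(self, on_next):
        self._producer(on_next)

    def map(self, fn):
        return Observable(lambda on_next: self.subscribe(lambda v: on_next(fn(v))))

    def filter(self, pred):
        return Observable(
            lambda on_next: self.subscribe(lambda v: on_next(v) if pred(v) else None)
        )


def from_iterable(values):
    def producer(on_next):
        for v in values:
            on_next(v)
    return Observable(producer)


# Pull: the consumer drives the loop and asks for each value.
pulled = [v * 10 for v in [1, 2, 3, 4] if v % 2 == 0]

# Push: the producer drives and hands each value to the callback.
pushed = []
(from_iterable([1, 2, 3, 4])
    .filter(lambda v: v % 2 == 0)
    .map(lambda v: v * 10)
    .subscribe(pushed.append))
```

In this toy the producer is synchronous, but nothing in the consumer code would change if the values were delivered later from another thread.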
concurrency  java  jvm  threads  thread-safety  coding  rx  frp  fp  functional-programming  reactive  functional  async  observable 
4 weeks ago by jm
Archiving Gmail to Evernote
Google Drive and GMail have a built-in scripting engine. I had no idea
gmail  evernote  archival  scripting  coding  hacks  google-drive 
4 weeks ago by jm
Microsoft Code Digger extension
Miguel de Icaza says it's witchcraft -- I'm inclined to agree:

Code Digger analyzes possible execution paths through your .NET code. The result is a table where each row shows a unique behavior of your code. The table helps you understand the behavior of the code, and it may also uncover hidden bugs. Through the new context menu item "Generate Inputs / Outputs Table" in the Visual Studio editor, you can invoke Code Digger to analyze your code. Code Digger computes and displays input-output pairs. Code Digger systematically hunts for bugs, exceptions, and assertion failures.
testing  constraint-solving  solver  witchcraft  magic  dot-net  coding  tests  code-digger  microsoft 
4 weeks ago by jm
Lucene 4 - Revisiting Problems For Speed [slides]
A presentation from Simon Willnauer on optimization work performed on Lucene in 2011. The most interesting stuff here is the work done to replace an O(n^2) FuzzyQuery fuzzy-match algorithm with an FSM trie, which is extremely cool -- benchmarked at 214 times faster!
benchmarks  slides  lucene  search  fuzzy-matching  text-matching  strings  algorithms  coding  fsm  tries 
4 weeks ago by jm
The Excel Depression - NYTimes.com
Krugman on the Reinhart-Rogoff Excel-bug fiasco.
What the Reinhart-Rogoff affair shows is the extent to which austerity has been sold on false pretenses. For three years, the turn to austerity has been presented not as a choice but as a necessity. Economic research, austerity advocates insisted, showed that terrible things happen once debt exceeds 90 percent of G.D.P. But “economic research” showed no such thing; a couple of economists made that assertion, while many others disagreed. Policy makers abandoned the unemployed and turned to austerity because they wanted to, not because they had to. So will toppling Reinhart-Rogoff from its pedestal change anything? I’d like to think so. But I predict that the usual suspects will just find another dubious piece of economic analysis to canonize, and the depression will go on and on.
paul-krugman  economics  excel  coding  bugs  software  austerity  debt 
5 weeks ago by jm
Excel, untestability, and the reliability of quants
Wow, this is a great software-quality story -- I knew Excel was the most widely used programming environment out there, but this is a factor I'd overlooked:

In his remarks on the final panel, Frank Partnoy mentioned something I missed when it came out a few weeks ago: the role of Microsoft Excel in the “London Whale” trading debacle. [..] To summarize: JPMorgan’s Chief Investment Office needed a new value-at-risk (VaR) model for the synthetic credit portfolio (the one that blew up) and assigned a quantitative whiz [...] to create it. The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly, “After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR ...”

I write periodically about the perils of bad software in the business world in general and the financial industry in particular, by which I usually mean back-end enterprise software that is poorly designed, insufficiently tested, and dangerously error-prone. But this is something different. [...] While Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets -- badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way.

This is why the JPMorgan VaR model is the rule, not the exception: manual data entry, manual copy-and-paste, and formula errors. This is another important reason why you should pause whenever you hear that banks’ quantitative experts are smarter than Einstein, or that sophisticated risk management technology can protect banks from blowing up. At the end of the day, it’s all software. While all software breaks occasionally, Excel spreadsheets break all the time. But they don’t tell you when they break: they just give you the wrong number.
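To see why the sum-vs-average slip halves the result, here is a small sketch. The function names and the two rates are made up for illustration; only the formulas come from the quoted description.

```python
def change_over_average(old: float, new: float) -> float:
    """Intended formula: difference divided by the average of the two rates."""
    return (new - old) / ((old + new) / 2)


def change_over_sum(old: float, new: float) -> float:
    """Formula as described in the spreadsheet: difference divided by the sum."""
    return (new - old) / (old + new)


old_rate, new_rate = 0.04, 0.05  # hypothetical rates
intended = change_over_average(old_rate, new_rate)
buggy = change_over_sum(old_rate, new_rate)
ratio = intended / buggy  # the sum is twice the average, so the buggy figure is half
```

A unit test asserting the expected change for one known pair of rates would have caught this, which is rather the point of the article.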
excel  reliability  software  coding  ides  jpmorgan  value-at-risk  finance  london-whale  quants  spreadsheets  unit-tests  testability  testing 
5 weeks ago by jm
Reality, Reactivity, Relevance and Repeatability in Java Application Profiling
this product from JInspired appears to support runtime profiling of Java apps with < 5% performance impact
profiling  performance  java  coding  measurement 
5 weeks ago by jm
Austerity policies founded on Excel typo
You've probably heard that countries with a high debt:GDP ratio suffer from slow economic growth. The specific number 90 percent has been invoked frequently. That's all thanks to a study conducted by Carmen Reinhart and Kenneth Rogoff for their book This Time Is Different. But the results have been difficult for other researchers to replicate. Now three scholars at the University of Massachusetts have done so in "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff" and they find that the Reinhart/Rogoff result is based on opportunistic exclusion of Commonwealth data in the late-1940s, a debatable premise about how to weight the data, and most of all a sloppy Excel coding error.

Read Mike Konczal for the whole rundown, but I'll just focus on the spreadsheet part. At one point they set cell L51 equal to AVERAGE(L30:L44) when the correct procedure was AVERAGE(L30:L49). By typing wrong, they accidentally left Denmark, Canada, Belgium, Austria, and Australia out of the average. When you run the math correctly "the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent."
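A toy illustration of how a truncated range silently changes an average. The values below are made up and are not the Reinhart/Rogoff data; only the row ranges (30..44 vs. 30..49) come from the quote.

```python
from statistics import mean

# Made-up figures standing in for spreadsheet rows 30..49 (20 rows).
rows = {row: float(row - 30) for row in range(30, 50)}


def average(first_row: int, last_row: int) -> float:
    """Mimic AVERAGE(Lfirst:Llast) over the toy column, inclusive at both ends."""
    return mean(rows[r] for r in range(first_row, last_row + 1))


truncated = average(30, 44)  # 15 rows, five rows left out
full = average(30, 49)       # all 20 rows
rows_dropped = len(rows) - len(range(30, 45))
```

Nothing in the truncated result hints that rows are missing; an explicit check on the number of rows averaged is the kind of test a spreadsheet makes awkward.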
austerity  politics  excel  coding  errors  bugs  spreadsheets  economics  economy 
5 weeks ago by jm
Ked
To our knowledge, Ked is the first scripting language to emerge from The People's Republic of Cork. Below is an account of what we know so far about the mysterious Corkonian language. Any suggested updates or contributions are encouraged.

Genius.
coding  cork  jokes  funny  like  languages  programming 
5 weeks ago by jm
Cap'n Proto
Cap’n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap’n Proto is INFINITY TIMES faster than Protocol Buffers.


Basically, marshalling is like writing an aligned C struct to the wire, QNX messaging protocol-style. Wasteful on space, but responds to this by suggesting compression (which is a fair point tbh). C++-only for now. I'm not seeing the same kind of support for optional data that protobufs has though. Overall I'm worried there are some useful features being omitted here...
serialization  formats  protobufs  capn-proto  protocols  coding  c++  rpc  qnx  messaging  compression  compatibility  interoperability  i14y 
7 weeks ago by jm
JPL Institutional Coding Standard for the Java Programming Language
From JPL's Laboratory for Reliable Software (LaRS). Great reference; there are some really useful recommendations here, and good explanations of familiar ones like "prefer composition over inheritance". Many are supported by FindBugs, too.

Here's the full list:

compile with checks turned on;
apply static analysis;
document public elements;
write unit tests;
use the standard naming conventions;
do not override field or class names;
make imports explicit;
do not have cyclic package and class dependencies;
obey the contract for equals();
define both equals() and hashCode();
define equals when adding fields;
define equals with parameter type Object;
do not use finalizers;
do not implement the Cloneable interface;
do not call nonfinal methods in constructors;
select composition over inheritance;
make fields private;
do not use static mutable fields;
declare immutable fields final;
initialize fields before use;
use assertions;
use annotations;
restrict method overloading;
do not assign to parameters;
do not return null arrays or collections;
do not call System.exit;
have one concept per line;
use braces in control structures;
do not have empty blocks;
use breaks in switch statements;
end switch statements with default;
terminate if-else-if with else;
restrict side effects in expressions;
use named constants for non-trivial literals;
make operator precedence explicit;
do not use reference equality;
use only short-circuit logic operators;
do not use octal values;
do not use floating point equality;
use one result type in conditional expressions;
do not use string concatenation operator in loops;
do not drop exceptions;
do not abruptly exit a finally block;
use generics;
use interfaces as types when available;
use primitive types;
do not remove literals from collections;
restrict numeric conversions;
program against data races;
program against deadlocks;
do not rely on the scheduler for synchronization;
wait and notify safely;
reduce code complexity
nasa  java  reference  guidelines  coding-standards  jpl  reliability  software  coding  oo  concurrency  findbugs  bugs 
8 weeks ago by jm
SpaceX software dev practices
Metrics rule the roost -- I guess there's been a long history of telemetry in space applications.

To make software more visible, you need to know what it is doing, he said, which means creating "metrics on everything you can think of".... Those metrics should cover areas like performance, network utilization, CPU load, and so on.

The metrics gathered, whether from testing or real-world use, should be stored as it is "incredibly valuable" to be able to go back through them, he said. For his systems, telemetry data is stored with the program metrics, as is the version of all of the code running so that everything can be reproduced if needed.

SpaceX has programs to parse the metrics data and raise an alarm when "something goes bad". It is important to automate that, Rose said, because forcing a human to do it "would suck". The same programs run on the data whether it is generated from a developer's test, from a run on the spacecraft, or from a mission. Any failures should be seen as an opportunity to add new metrics. It takes a while to "get into the rhythm" of doing so, but it is "very useful". He likes to "geek out on error reporting", using tools like libSegFault and ftrace.

Automation is important, and continuous integration is "very valuable", Rose said. He suggested building for every platform all of the time, even for "things you don't use any more". SpaceX does that and has found interesting problems when building unused code. Unit tests are run from the continuous integration system any time the code changes. "Everyone here has 100% unit test coverage", he joked, but running whatever tests are available, and creating new ones is useful. When he worked on video games, they had a test to just "warp" the character to random locations in a level and had it look in the four directions, which regularly found problems.

"Automate process processes", he said. Things like coding standards, static analysis, spaces vs. tabs, or detecting the use of Emacs should be done automatically. SpaceX has a complicated process where changes cannot be made without tickets, code review, signoffs, and so forth, but all of that is checked automatically. If static analysis is part of the workflow, make it such that the code will not build unless it passes that analysis step.

When the build fails, it should "fail loudly" with a "monitor that starts flashing red" and email to everyone on the team. When that happens, you should "respond immediately" to fix the problem. In his team, they have a full-size Justin Bieber cutout that gets placed facing the team member who broke the build. They found that "100% of software engineers don't like Justin Bieber", and will work quickly to fix the build problem.
spacex  dev  coding  metrics  deplyment  production  space  justin-bieber 
8 weeks ago by jm
CS in VN
Neil Fraser visits a school in Vietnam, and investigates their computer science curriculum. They are doing an incredible job, it looks like -- very impressive!
vietnam  programming  education  cs  computer-science  schools  coding  children 
9 weeks ago by jm
Data Corruption To Go: The Perils Of sql_mode = NULL « Code as Craft
bloody hell. A load of cases where MySQL will happily accommodate all sorts of malformed and invalid input -- thankfully with fixes.

Also includes a very nifty example of Etsy tee'ing their production db traffic (30k pps in and out) via tcpdump and pt-query-digest to a test database host. Fantastic hackery
mysql  input  corrupt  invalid  validation  coding  databases  sql  testing  tcpdump  percona  pt-query-digest  tee 
9 weeks ago by jm
Single Producer/Consumer lock free Queue step by step
great dissection of Martin "Disruptor" Thompson's lock-free single-producer/single-consumer queue data structure, with benchmark results showing crazy speedups. This is particularly useful since it's a data structure that can be used to provide good lock-free speedups without adopting the entire Disruptor design pattern.
disruptor  coding  java  jvm  martin-thompson  lock-free  volatile  atomic  queue  data-structures 
9 weeks ago by jm
From a monolithic Ruby on Rails app to the JVM
How Soundcloud have ditched their monolithic Rails app for nimbler, small-scale distributed polyglot services running on the JVM
soundcloud  rails  slides  jvm  scalability  ruby  scala  clojure  coding 
9 weeks ago by jm
4 Things Java Programmers Can Learn from Clojure (without learning Clojure)
'1. Use immutable values; 2. Do no work in the constructor; 3. Program to small interfaces; 4. Represent computation, not the world'. Strongly agreed with #1, and the others look interesting too
clojure  lisp  design  programming  coding  java 
11 weeks ago by jm
Test-Driven Infrastructure with Chef
Interesting idea.
The book introduces “Infrastructure as Code,” test-driven development, Chef, and cucumber-chef, and then proceeds to a simple example using Chef to provision a shared Linux server. The recipes for the server are developed test-first, demonstrating both the technique and the workflow.
tdd  chef  server  provisioning  build  deploy  linux  coding  ops  sysadmin 
11 weeks ago by jm
#AltDevBlogADay » Latency Mitigation Strategies
John Carmack on the low-latency coding techniques used to support head mounted display devices.

Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint. The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience.

Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible. Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached.

A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.
head-mounted-display  display  ui  latency  vision  coding  john-carmack 
february 2013 by jm
Crash-only software
I couldn't remember the name for this design principle, so it's worth a bookmark to remind me in future...

'This refers to computer programs that handle failures by simply restarting, without attempting any sophisticated recovery. Correctly written components of crash-only software can microreboot to a known-good state without the help of a user. Since failure-handling and normal startup use the same methods, this can increase the chance that bugs in failure-handling code will be noticed.'
crashing  crash-only-software  design  architecture  coding  software  fault-tolerance  erlang  let-it-fail  microreboot  recovery  autosave 
february 2013 by jm
clearspring / stream-lib
ASL-licensed open source library of stream-processing/approximation algorithms: count-min sketch, space-saving top-k, cardinality estimation, LogLog, HyperLogLog, MurmurHash, lookup3 hash, Bloom filters, q-digest, stochastic top-k
algorithms  coding  streams  cep  stream-processing  approximation  probabilistic  space-saving  top-k  cardinality  estimation  bloom-filters  q-digest  loglog  hyperloglog  murmurhash  lookup3 
february 2013 by jm
Real-time Analytics in Scala [slides, PDF]
some good approximation/streaming algorithms and tips on Scala implementation
streams  algorithms  approximation  coding  scala  slides 
february 2013 by jm
Jetty-9 goes fast with Mechanical Sympathy
This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: "False sharing" on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build and look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% fewer Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.
jetty  java  mechanical-sympathy  optimization  coding  tries 
february 2013 by jm
Programming Language Checklist
'You appear to be advocating a new:
[ ] functional [ ] imperative [ ] object-oriented [ ] procedural [ ] stack-based
[ ] "multi-paradigm" [ ] lazy [ ] eager [ ] statically-typed [ ] dynamically-typed
[ ] pure [ ] impure [ ] non-hygienic [ ] visual [ ] beginner-friendly
[ ] non-programmer-friendly [ ] completely incomprehensible
programming language. Your language will not work. Here is why it will not work.'
humor  programming  funny  coding  languages 
february 2013 by jm
"Security Engineering" now online in full
Ross Anderson says: 'I’m delighted to announce that my book Security Engineering – A Guide to Building Dependable Distributed Systems is now available free online in its entirety. You may download any or all of the chapters from the book’s web page.'
security  books  reference  coding  software  encryption  ross-anderson 
february 2013 by jm
Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com
Using new Intel Core i7 instructions to speed up string manipulation. Fascinating stuff. SSE ftw
sse  optimization  simd  assembly  intel  i7  intel-core  strstr  strings  string-matching  strchr  strlen  coding 
january 2013 by jm
Scala 2.8 Collections API -- Performance Characteristics
wow. Every library vending a set of collection types should have a page like this
collections  scala  performance  reference  complexity  big-o  coding 
january 2013 by jm
Notes on Distributed Systems for Young Bloods -- Something Similar
'Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial. This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning.' This is a pretty nice list, a little over-stated, but that's the format. I particularly like the following: 'Exploit data-locality'; 'Learn to estimate your capacity'; 'Metrics are the only way to get your job done'; 'Use percentiles, not averages'; 'Extract services'.
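On 'Use percentiles, not averages', a quick sketch of why the mean misleads for skewed latencies. The `percentile` helper uses the simple nearest-rank definition and the latency figures are made up.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample: 98 fast requests and 2 very slow ones.
latencies_ms = [10] * 98 + [2000] * 2

average_ms = mean(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p99_ms = percentile(latencies_ms, 99)
```

The average describes no request anyone actually experienced, while the median and 99th percentile show both the typical case and the tail.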
systems  distributed  distcomp  cap  metrics  coding 
january 2013 by jm
Effective Scala
Twitter's Scala style guide. 'While highly effective, Scala is also a large language, and our experiences have taught us to practice great care in its application. What are its pitfalls? Which features do we embrace, which do we eschew? When do we employ “purely functional style”, and when do we avoid it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays, providing a set of best practices. Our use of Scala is mainly for creating high volume services that form distributed systems — and our advice is thus biased — but most of the advice herein should translate naturally to other domains.'
twitter  scala  coding  style 
january 2013 by jm
"Matters Computational - Ideas, Algorithms, Source Code"
A hefty tome (in PDF format) containing lots of interesting algorithms and computational tricks; code is GPLv3 licensed
coding  algorithms  computation  via:cliffc  pdf  books 
january 2013 by jm
Shell Scripts Are Like Gremlins
Shell Scripts are like Gremlins. You start out with one adorably cute shell script. You commented it and it does one thing really well. It’s easy to read, everyone can use it. It’s awesome! Then you accidentally spill some water on it, or feed it late one night and omgwtf is happening!?


+1. I have to wean myself off the habit of automating with shell scripts where a clean, well-unit-tested piece of code would work better.
shell-scripts  scripting  coding  automation  sysadmin  devops  chef  deployment 
december 2012 by jm
The Mathematical Hacker
'The trouble with the Lisp-hacker tradition is that it is overly focused on the problem of programming -- compilers, abstraction, editors, and so forth -- rather than the problems outside the programmer's cubicle. I conjecture that the Lisp-school essayists -- Raymond, Graham, and Yegge -- have not “needed mathematics” because they spend their time worrying about how to make code more abstract. This kind of thinking may lead to compact, powerful code bases, but in the language of economics, there is an opportunity cost.'
mathematics  coding  maths  essay  hackers  lisp  fortran 
december 2012 by jm
Special encoding of small aggregate data types in Redis
Nice performance trick in Redis on hash storage:

'In theory in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like a hash table. But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains grows too large (you can configure the limit in redis.conf). This does not only work well from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than a hash table).'
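The same trick sketched in Python, assuming a flat list of pairs as the compact form. This is not Redis's actual encoding (which is a packed byte buffer), and the `SmallHash` name and `max_compact_entries` threshold are hypothetical.

```python
class SmallHash:
    """Stores pairs in a flat list while small, then upgrades to a dict."""

    def __init__(self, max_compact_entries: int = 8) -> None:
        self.max_compact_entries = max_compact_entries
        self._pairs = []    # compact form: [(key, value), ...], O(N) lookup
        self._table = None  # real hash table once the threshold is crossed

    def hset(self, key, value) -> None:
        if self._table is not None:
            self._table[key] = value
            return
        for i, (k, _) in enumerate(self._pairs):
            if k == key:
                self._pairs[i] = (key, value)  # overwrite in place
                return
        self._pairs.append((key, value))
        if len(self._pairs) > self.max_compact_entries:
            # Too big for a linear scan to stay cheap: convert once, for good.
            self._table = dict(self._pairs)
            self._pairs = []

    def hget(self, key, default=None):
        if self._table is not None:
            return self._table.get(key, default)
        for k, v in self._pairs:
            if k == key:
                return v
        return default

    @property
    def is_compact(self) -> bool:
        return self._table is None


h = SmallHash(max_compact_entries=2)
h.hset("a", 1)
h.hset("b", 2)
h.hset("a", 3)          # update, still two entries
compact_before = h.is_compact
h.hset("c", 4)          # third entry crosses the threshold
compact_after = h.is_compact
```

Python lists of tuples will not show the cache-locality win the quote describes; the sketch only shows the switch-over logic.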
memory  redis  performance  big-o  hash-tables  storage  coding  cache  arrays 
november 2012 by jm
Does it run Minecraft? Well, since you ask…
Going by the number of Minecraft fans among my friends' sons and daughters in the 8-12 age group, this is a great idea:
We sent a bunch of [Raspberry Pi] boards out to Notch and the guys at Mojang in Stockholm a little while back, and they’ve produced a port of Minecraft: Pocket Edition which they’re calling  Minecraft: Pi Edition. It’ll carry a revised feature set and support for several programming languages, so you can code direct into Minecraft before you start playing. (Or you can just – you know – play.)
minecraft  gaming  programming  coding  raspberry-pi  kids  learning  education 
november 2012 by jm
John Carmack's .plan update from 10/14/98
John Carmack presciently defines the benefits of an event sourcing architecture in 1998, as a key part of Quake 3's design:

"The key point: Journaling of time along with other inputs turns a
realtime application into a batch process, with all the attendant
benefits for quality control and debugging. These problems, and
many more, just go away. With a full input trace, you can accurately
restart the session and play back to any point (conditional
breakpoint on a frame number), or let a session play back at an
arbitrarily degraded speed, but cover exactly the same code paths."

(This was the first time I'd heard of the concept, at least.)
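
A toy Python sketch of the idea, under my own assumptions rather than anything from Quake 3's code: `Game`, `run_live` and `replay` are hypothetical names, and a seeded random generator stands in for the wall clock and the player's input. The point is only that if time and input are both journaled, replaying the journal reaches exactly the same state.

```python
import random


class Game:
    """Toy deterministic simulation: state changes only through step()."""

    def __init__(self):
        self.frame = 0
        self.x = 0.0

    def step(self, dt, command):
        if command == "left":
            self.x -= 10.0 * dt
        elif command == "right":
            self.x += 10.0 * dt
        self.frame += 1


def run_live(game, journal, frames, rng):
    """Run 'live', journaling the frame time along with the input."""
    for _ in range(frames):
        dt = rng.choice([0.016, 0.017, 0.033])          # stand-in for the clock
        command = rng.choice(["left", "right", "none"])  # stand-in for the player
        journal.append((dt, command))
        game.step(dt, command)


def replay(journal, stop_frame=None):
    """Rebuild state from the journal alone, optionally stopping at a frame."""
    game = Game()
    for dt, command in journal[:stop_frame]:
        game.step(dt, command)
    return game
```

Calling `replay(journal, stop_frame=40)` is the "play back to any point" case: the session can be stopped at frame 40 as often as needed, at any speed, and it covers the same code path each time.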
john-carmack  design  software  coding  event-sourcing  events  quake-3 
november 2012 by jm
#AltDevBlogADay » Functional Programming in C++
John Carmack makes a case for writing C++ in an FP style, with wide use of const and pure functions. Something similar can be achieved in pure Java using Guava's Immutable types, to a certain extent. I love his other posts on this site -- he argues persuasively for static code analysis and keeping multiple alternative subsystem implementations, too.
c++  programming  functional-programming  fp  coding  john-carmack  const  immutability 
october 2012 by jm
On Being A Senior Engineer
Encyclopedic post from John Allspaw (of Etsy) on the topic, with an "Obligatory [List Of] Pithy Characteristics"
senior  engineering  career  tech  coding  work 
october 2012 by jm
ElementCostInDataStructures
"The cost per element in major data structures offered by Java and Guava (r11)." A very useful reference!

Ever wondered what's the cost of adding each entry to a HashMap? Or one new element in a TreeSet? Here are the answers: the cost per-entry for each well-known structure in Java and Guava. You can use this to estimate the cost of a structure, like this: if the per-entry cost of a structure is 32 bytes, and your structure contains 1024 elements, the structure's footprint will be around 32 kilobytes. Note that non-tree mutable structures are amortized (adding an element might trigger a resize, and be expensive, otherwise it would be cheap), making the measurement of the "average per element cost" hard, but you can expect that the real answers are close to what is reported below.
java  coding  guava  reference  memory  cost  performance  data-structures 
october 2012 by jm
Estonia introduces coding classes to 8-year-olds
'ProgreTiiger education will start with students in the first grade, which starts around the age of 7 or 8 for Estonians. The compsci education will continue through a student’s final years of public school, around age 16. Teachers are being trained on the new skills, and private sector IT companies are also getting involved, which makes sense, given that these entities will likely end up being the long-term beneficiaries of a technologically literate populace. The ProgreTiiger program is launching at a few pilot schools and will soon be rolling out to all general education schools in Estonia.'
estonia  education  coding  programming  kids  children  students  learning  school 
september 2012 by jm
Martin "Disruptor" Thompson's Single Writer Principle
Contains these estimates, in milliseconds, for highly-contended inter-thread signalling when incrementing a 64-bit counter in Java:
One Thread: 300
One Thread with Memory Barrier: 4,700
One Thread with CAS: 5,700
Two Threads with CAS: 18,000
One Thread with Lock: 10,000
Two Threads with Lock: 118,000

Undoubtedly not realistic for a lot of cases, but it's still useful for order-of-magnitude estimates of locking cost. Bottom line: don't lock if you can avoid it, even with 'volatile' or AtomicFoo types.
java  jvm  performance  coding  concurrency  threading  cas  locking 
september 2012 by jm
HotSpot JVM garbage collection options cheat sheet (v2)
'In this article I have collected a list of options related to GC tuning in JVM. This is not a comprehensive list, I have only collected options which I use in practice (or at least understand why I may want to use them).
Compared to previous version a few useful diagnostic options was added. Additionally section for G1 specific options was introduced.'
hotspot  jvm  coding  gc  java  performance 
september 2012 by jm
Striped (Guava: Google Core Libraries for Java 13.0.1 API)
Nice piece of Guava concurrency infrastructure in the latest release:
A striped Lock/Semaphore/ReadWriteLock. This offers the underlying lock striping similar to that of ConcurrentHashMap in a reusable form, and extends it for semaphores and read-write locks. Conceptually, lock striping is the technique of dividing a lock into many stripes, increasing the granularity of a single lock and allowing independent operations to lock different stripes and proceed concurrently, instead of creating contention for a single lock.

The guarantee provided by this class is that equal keys lead to the same lock (or semaphore), i.e. if (key1.equals(key2)) then striped.get(key1) == striped.get(key2) (assuming Object.hashCode() is correctly implemented for the keys). Note that if key1 is not equal to key2, it is not guaranteed that striped.get(key1) != striped.get(key2); the elements might nevertheless be mapped to the same lock. The lower the number of stripes, the higher the probability of this happening.

Prior to this class, one might be tempted to use Map<K, Lock>, where K represents the task. This maximizes concurrency by having each unique key mapped to a unique lock, but also maximizes memory footprint. On the other extreme, one could use a single lock for all tasks, which minimizes memory footprint but also minimizes concurrency. Instead of choosing either of these extremes, Striped allows the user to trade between required concurrency and memory footprint. For example, if a set of tasks are CPU-bound, one could easily create a very compact Striped<Lock> of availableProcessors() * 4 stripes, instead of possibly thousands of locks which could be created in a Map<K, Lock> structure.
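
The idea is easy to sketch outside Guava. This is a rough Python illustration of lock striping, not Guava's implementation: `StripedLock` and its `get` method are hypothetical names, it covers plain locks only, and it relies on Python's built-in `hash()` being consistent for equal keys.

```python
import threading


class StripedLock:
    """Hypothetical lock-striping sketch: a fixed pool of locks indexed by key hash."""

    def __init__(self, stripes):
        if stripes <= 0:
            raise ValueError("stripes must be positive")
        self._locks = [threading.Lock() for _ in range(stripes)]

    def get(self, key):
        # Equal keys hash equally, so they always map to the same lock.
        # Unequal keys may collide on a stripe; fewer stripes, more collisions.
        return self._locks[hash(key) % len(self._locks)]


# Usage: guard per-key work without allocating one lock per key.
striped = StripedLock(8)
with striped.get("user:42"):
    pass  # critical section for this key's stripe
```

Memory stays fixed at the number of stripes however many distinct keys turn up, which is the trade-off described above.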
locking  concurrency  java  guava  semaphores  coding  via:twitter 
september 2012 by jm
1024cores
Some good algorithms and notes by Dmitry Vyukov on 'lockfree, waitfree, obstruction-free synchronization algorithms and data structures, scalability-oriented architecture, multicore/multiprocessor design patterns, high-performance computing, threading technologies and libraries (OpenMP, TBB, PPL), message-passing systems and related topics.' The catalog of lock-free queue implementations is particularly extensive (via Sergio Bossa)
algorithms  concurrency  articles  dmitry-vyukov  go  c++  coding  via:sergio-bossa 
august 2012 by jm
Rootbeer
The Rootbeer GPU Compiler makes it easy to use Graphics Processing Units from
within Java.

Rootbeer is more advanced than CUDA or OpenCL Java Language Bindings. With
bindings the developer must serialize complex graphs of objects into arrays
of primitive types. With Rootbeer this is done automatically. Also with language
bindings, the developer must write the GPU kernel in CUDA or OpenCL. With
Rootbeer a static analysis of the Java Bytecode is done (using Soot) and CUDA
code is automatically generated.

[...] All of the familiar Java code you have been writing can be
executed on the GPU.
gpu  java  coding  cuda  compiler 
august 2012 by jm
the recruiter honeypot
wow, I thought it was hard hiring in Dublin. Sounds like Silicon Valley is insane.

"Unfortunately, it’s not all about the numbers. Though external recruiters perform well for start-ups, there’s another side to this story. It pains me to write this but I think it’s important to share. Meebo employed lots of external recruiters when we were getting off the ground. We had standard 18-month no-poach restrictions with all of our contractors that specified that those recruiters were not allowed to contact Meebo employees within 18 months of our contract expiring. Most of those contracts expired in 2008-2009.

However, every recruiter and firm we’d worked with who was still in the recruiting business tried to poach [the 'honeypot' employee] Pete London."

(Another lesson: don't build a product in JavaScript, since it's impossible to hire engineers ;)
honeypots  hiring  silicon-valley  recruiting  coding  experts  meebo 
june 2012 by jm
Open Data Structures
A free-as-in-speech as well as -beer textbook of data structures, covering a great range, including some I hadn't heard of before. Here's the full list: ArrayStack, FastArrayStack, ArrayQueue, ArrayDeque, DualArrayDeque, RootishArrayStack, SLList, DLList,
SEList, SkiplistSSet, SkiplistList, ChainedHashTable, LinearHashTable, BinaryTree, BinarySearchTree, Treap, ScapegoatTree, RedBlackTree, BinaryHeap, MeldableHeap, AdjacencyMatrix, AdjacencyLists, BinaryTrie, XFastTrie, and YFastTrie
algorithms  books  data-structures  computer-science  coding  tries  skiplists  arrays  queues  heap  trees  graphs  hashtables 
may 2012 by jm
An IDE is not enough
Very thought-provoking response to that 'Light Table' demo which went round the aggregators a couple of weeks back. 'The fundamental reason IDEs have dead-ended is that they are constrained by the syntax and semantics of our programming languages. Our programming languages were all designed to be used with a text editor. It is therefore not surprising that our IDEs amount to tarted-up text editors. Likewise our programming languages were all designed with an imperative semantics that efficiently matches the hardware but defies static visualization. Indeed it would be a miracle if we could slap a new IDE on top of an old language and magically alter its syntactic and semantic assumptions. I don’t believe in miracles. Languages and IDEs have co-evolved and neither can change without the other also changing. That is why three years ago I put aside my IDE work to focus on language design. Getting rid of imperative semantics is one of the goals. Another is getting rid of source text files (as well as ASTs, which carry all the baggage of a textual encoding minus the readability). This has turned out to be really really hard. And lonely – no one wants to even talk about these crazy ideas. Nevertheless I firmly believe that so long as we are programming in descendants of assembly language we will continue to program in descendants of text editors.' (via Chris Horn)
via:cjhorn  ide  programming  coding  programming-languages  semantics  syntax  source-code  text 
may 2012 by jm
Chronon DVR for Java
"record entire execution of your Java app; play it back on any machine". Other features: time-travelling debugger -- step backwards, jump to any point in execution, designed for long running programs; post-execution logging -- add log statements after the program has run, and see what it would have logged. Looks extremely nifty, but I wonder how big those recording files get...
debugging  via:peakscale  eclipse  chronon  dvr  java  coding  logging  jvm 
may 2012 by jm
Ask For Forgiveness Programming - Or How We'll Program 1000 Cores
Nifty concept from IBM Research's David Ungar -- "race-and-repair". Simply put, allow lock-free lossy/inconsistent calculation, and backfill later, using concepts like "freshener" threads, to reconcile inconsistencies. This is a familiar concept in distributed computing nowadays thanks to CAP, but I hadn't heard it being applied to single-host multicore parallel programming before -- I can already think of an application in our codebase...
race-and-repair  concurrency  coding  ibm  parallelism  parallel  david-ungar  cap  multicore 
april 2012 by jm
Girls and coding: female peer pressure scares them off | Education | The Observer
'Coding and digital prowess is still niche at a young age, self-taught by the studious. It is often considered a bit nerdy in senior school, where it is not currently taught as a part of the curriculum, although this is changing in senior schools from September 2012. Therefore, generally speaking, those who code have taught themselves. Teaching yourself something that should really be covered as a part of lessons is a bit like doing extra homework – why, ask many teens, would anyone do that? There is no way the majority of hormonally challenged, desperate-to-find-their-place-in-the-world teenage girls would risk ridicule or isolation by doing such a thing – let alone be open and proud about it. (Boys of the same age have different social challenges and do not measure their societal worth so much by peer review.)'
girls  coding  education  peer-pressure  software  teaching  kids 
march 2012 by jm
Google Guava BloomFilter
Neat, Guava now has a built-in Bloom filter implementation using the murmur hash. That'll potentially save a little hassle in the future.
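
For anyone who hasn't met the data structure, here's a minimal Python sketch of a Bloom filter. It is not Guava's code: the class and method names are my own, it uses SHA-256 from the standard library in place of murmur hash, and it derives the k bit positions by double hashing, which is one common approach rather than necessarily the one Guava takes.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, expected_items, false_positive_rate):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Split one SHA-256 digest into two 64-bit values and combine them
        # (double hashing) to get num_hashes bit positions.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

Sized for 1,000 items at a 1% false-positive rate, this needs a little under 10,000 bits, against storing every key in a full set.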
guava  coding  java  bloom-filters  data-structures  sets 
march 2012 by jm
The day I tried teaching primary school kids to code (and succeeded)
via Niamh -- 'I learned a bit about teaching at primary level and I learned that it is pretty fun although REALLY hard work! I learned that if you make a complex subject engaging kids will learn it and are probably capable of a great deal more than they are often given credit for. The youngest kids on the day were year four which is aged 8-9 and although they were definitely more able than some of their peers, you can expect that by year 5-6 (aged 9-11) probably a lot of the kids could follow it and indeed learn to code.'
coding  education  kids  programming  teaching  school 
march 2012 by jm
JS1k, 1k demo submission
A speech synthesizer in 1 KB of JavaScript. Truly awesome, nice work by @p01.
js1k  javascript  demos  speech  hacks  coding 
march 2012 by jm
SiliconRepublic story on CoderDojo
'it's both incredible and poignant that a voluntary movement that was born in Ireland during the summer is about to go international. Coder Dojo, the brainchild of 19-year-old entrepreneur and programmer James Whelton from Cork and tech entrepreneur Bill Liao, began as a Saturday morning club for kids to teach each other software programming. It has grown into a national movement up and down Ireland, a place where kids and their parents can go and learn to write software code in a friendly environment. The first UK Coder Dojo was held in London only last week and other countries in Europe are clamouring to get the initiative started there, too.' Good on them!
coderdojo  programming  coding  kids  children  teaching  education  tech  ireland 
december 2011 by jm
peak6/scala-ssh-shell - GitHub
'Backdoor that gives you a scala shell over ssh on your jvm. The shell is not sandboxed, anyone access the shell can touch anything in the jvm and do anything the jvm can do including modifying and deleting files, etc.' nifty!
scala  ssh  repl  interactive  debugging  coding  jvm  java 
october 2011 by jm
A few git tips you didn't know about
'git checkout -t' alone is worth the bookmark
git  tips  coding  unix  reference  tricks  via:proggit 
september 2011 by jm
'What Idiot Wrote The Patent That Might Invalidate Software Patents? Oh, Wait, That Was Me' | Techdirt
'So I was thinking - great they invalidated software patents, lets see what crappy patent written by an idiot they picked to do it - then I realized the idiot in question was me :-)

Not sure how I feel about this.

John - inventor of the patent in question.'
patents  swpats  reform  usa  software-development  coding  funny  techdirt 
august 2011 by jm
Scala: The Static Language that Feels Dynamic
a good intro from Bruce Eckel. We need a good excuse to deploy some Scala ;)
scala  actors  java  language  programming  jvm  coding 
june 2011 by jm
pyflakes.vim - on-the-fly Python code checking in Vim
Vim gets a good IDE feature. 'highlights common Python errors like misspelling a variable name on the fly. It also warns about unused imports, redefined functions, etc.'
ide  vim  python  programming  via:preddit  coding 
april 2011 by jm
Bulletproof Node.js Coding
Lots of patterns for writing safe node.js code. Pretty daunting, to be honest.
javascript  node.js  coding  programming  async  from delicious
march 2011 by jm
Gerrit, Git and Jenkins
This is the future of code review. Commit directly from your git checkout to the Gerrit code-review system; change is immediately web-visible and enters the review workflow; at the same time, Jenkins checks out the proposed change and runs the test suite; once it's approved, it automatically gets checked in. Brilliant!
git  coding  code-review  workflows  jenkins  gerrit  c-i  testing  automation  from delicious
february 2011 by jm
Contracts for Java
'Preconditions, postconditions, and invariants are added as Java boolean expressions inside annotations.'  nice
java  google  coding  open-source  contracts  eiffel  preconditions  invariants  annotations  from delicious
february 2011 by jm
The things make got right (and how to make it better)
jgc provides a good demonstration of how a general-purpose programming language tends to make a crappy DSL -- specifically Rakefiles
dsl  build  make  coding  jgc  languages  configuration  makefiles  rake  ruby  from delicious
january 2011 by jm
good Hacker News thread on djb's "redo"
Yet another make-replacement build system. The thread is better than the linked article, btw.
hacker-news  via:fanf  make  build  djb  redo  compilation  building  coding  open-source  from delicious
january 2011 by jm
Rules of SCRAM
'GOATS just stand around during this phase and stare at each other, rolling their eyes frequently at howlers (such as using serialization to SOAP for storage, or databases as RPC mechanisms). It is often useful for GOATS — or anybody, really — to take notes for the monthly BACKSTABBING drill.'
funny  scrum  software  project-management  coding  work  from delicious
january 2011 by jm
The Day MAME Saved My Ass
'Publishers would have people believe that MAME and the emulation scene is the root of all evil, that it promotes piracy and ultimately hurts the poor, starving developers slaving away on the game. Not only is this claim patently false, it ignores the fact that many developers use things like MAME, mod chips, and homebrew development utilities to help us overcome the day-to-day frustrations caused by the people behind the real problems in our industry.'
mame  games  coding  legal  spy-hunter  emulation  rips  takedowns  from delicious
december 2010 by jm
Andrew Tridgell's pair programming experience
Pair-programming with another Samba developer over the course of a year, using a SIP server and a VNC-shared desktop. A very positive review indeed.
pair-programming  xp  tridge  coding  collaboration  agile  vnc  sip  from delicious
december 2010 by jm
The Effectiveness of Test Driven Development (TDD)
Huh. Test-driven development is slower than traditional write-first-test-at-the-end development, but it results in fewer bugs. Grokcode theorise that its big win is amortising the cost of testing throughout the product iteration, hence reducing the temptation to skip testing when the crunch phase happens.
tdd  programming  testing  qa  coding  from delicious
december 2010 by jm