David MacKay: How the Laws of Physics Constrain Our Sustainable Energy Options
Adding "Sustainable Energy: Without the Hot Air" to the reading list. From TEDxWarwick.
video  talks  energy  poltics  to:twitter  to:linkedin  to:facebook 
16 days ago
"We Have Met The Enemy... And He Is Us"
Lessons from Twenty Years of the Kauffman Foundation's Investments in Venture Capital Funds and The Triumph of Hope over Experience.
investing  funding  vc 
18 days ago
AirPort Express not connecting to iTunes
I haven't been able to stream from iTunes to my AirPort Express (802.11g). Disabling IPv6 fixed the issue.
airplay  airportexpress  itunes  osx 
18 days ago
The Taste of War
Lizzie Collingham soberly argues that the expansionist designs of both Nazi Germany and imperial Japan must be understood within a world political economy in which the single crucial commodity was food.
nytimes  reviews  books  to:twitter 
18 days ago
Bring back the 40-hour work week
This is what work looks like now. It’s been this way for so long that most American workers don’t realize that for most of the 20th century, the broad consensus among American business leaders was that working people more than 40 hours a week was stupid, wasteful, dangerous and expensive — and the most telling sign of dangerously incompetent management to boot.
The most essential thing to know about the 40-hour work-week is that, while it was the unions that pushed it, business leaders ultimately went along with it because their own data convinced them this was a solid, hard-nosed business decision.
In fact, research shows that knowledge workers actually have fewer good hours in a day than manual laborers do — on average, about six hours, as opposed to eight.
And finally: these death marches take a longer-term productivity toll as well. Once the crisis has passed and that 60-hour-a-week team gets to go back to its regular 40, it can take several more weeks before the burnout begins to lift enough for them to resume their typical productivity level. So, for a while, you’ll get significantly less than a full 40 out of them.
salon  jobs  work  productivity  management 
23 days ago
The Jig Is Up: Time to Get Past Facebook and Invent a New Future
Alexis Madrigal raises some great points about the current state of technology innovation, what we build, and the advertising model.
innovation  technology  startup  business  to:twitter  to:facebook  to:linkedin  theatlantic 
24 days ago
Does Pleiades have an API?
Sean Gillies has thoughts on APIs and exporting data.
api  data  engineering  development  to:twitter  to:linkedin 
24 days ago
Publishing Open Data – Do you really need an API?
Peter Krantz suggests using file dumps instead of APIs in some situations.
data  api  development  engineering  to:twitter  to:linkedin 
24 days ago
Dealing With ICD-10
How NLP is helping the move to ICD-10.
health  medicine  coding  nlp  to:linkedin  to:twitter 
24 days ago
How one man escaped from a North Korean prison camp
Edited extract from Escape From Camp 14, by Blaine Harden.
guardian  northkorea  to:twitter  books 
24 days ago
Innovation Starvation by Neal Stephenson
"Still, I worry that our inability to match the achievements of the 1960s space program might be symptomatic of a general failure of our society to get big things done."
innovation  technology  progress  science  research  sciencefiction  to:twitter  to:linkedin  to:facebook 
25 days ago
The Cabal: Valve’s Design Process For Creating Half-Life
The Cabal exists because the mythical person you need does not.

After reaching the end of the original Half-Life schedule, Valve started again:
We set up a small group of people to take every silly idea, every cool trick, everything interesting that existed in any kind of working state somewhere in the game and put them into a single prototype level. ... They all worked together on this one small level for a month while the rest of us basically did nothing. When they were done, we all played it. It was great. It was Die Hard meets Evil Dead. It was the vision. It was going to be our game.

On looking for the game designer:
We looked at hundreds of resumes and interviewed a lot of promising applicants, but no one we looked at had enough of the qualities we wanted for us to seriously consider them the overall godlike “game designer” that we were told we needed. In the end, we came to the conclusion that this ideal person didn’t actually exist. Instead, we would create our own ideal by combining the strengths of a cross section of the company, putting them together in a group we called the “Cabal.”

The components of the Cabal:
The Cabal consisted only of people that had actual shipping components in the game; there were no dedicated designers. Every member of the Cabal was someone with the responsibility of actually doing the work that their design specified, or at least had the ability to do it if need be. ... Internally, once the success of the Cabal process was obvious, mini-Cabals were formed to come up with answers to a variety of design problems. These mini-Cabals would typically include people most affected by the decision, as well as try to include people completely outside the problem being addressed in order to keep a fresh perspective on things.

A tip:
Write down everything. Brainstorming is fine during the meetings, but unless it’s all written down, your best ideas will be forgotten within days. The goal is to end up with a document that captures as much as is reasonable about your game, and more importantly answers questions about what people need to work on.
design  games  to:twitter  to:linkedin  management  development 
25 days ago
As We May Think - Dr. Vannevar Bush (1945)
From The Atlantic, July 1945: "The world has arrived at an age of cheap complex devices of great reliability; and something is bound to come of it."
theatlantic  science  progress  history  technology  research  to:twitter  to:linkedin 
25 days ago
Anatomy Of An Idea by Steven Johnson
"The discovery process is remarkably social, and the social interactions come in amazingly diverse forms."
ideas  information  to:twitter  sharing  to:linkedin  to:facebook 
28 days ago
Shark
Lightning-fast data warehouse system. Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer Hive QL queries up to 30 times faster than Hive without modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions.
database  data  olap  hadoop  hive  spark  to:twitter  to:linkedin 
4 weeks ago
Jets’ Trevor Pryce Is Retired, and Getting Tired of It
With millions of Americans out of work or doing work for which they are overqualified, I consider myself lucky. But starting from scratch can be unsettling. If you’re not prepared for it, retirement can become a form of self-imposed exile from the fulfillment and the exhilaration of knowing you did a good job.

Many people retire around 65. I will turn 37 this summer, yet like all former N.F.L. players, I face greater health risks, both physical and psychological, that compound my fears.
nfl  nytimes  work 
4 weeks ago
Clustering Related Stories
Prismatic's @jrfinkel on feature space and speeding things up
clustering  applications  nlp  machinelearning  to:twitter  to:linkedin 
5 weeks ago
Valve: How I Got Here, What It’s Like, and What I’m Doing
Culture, trust, and informal consensus at Valve. Plus a throwaway quote about NLP.
business  management  games  to:twitter  to:linkedin 
5 weeks ago
Dyson Dog grooming
Pretty awesome, watch the videos. Say goodbye to tranquil grooming though.
dog  to:facebook  to:twitter 
5 weeks ago
What are the keys to operationalizing a machine learning ranking system from an organization and/or engineering management point of view? - Quora
Brandon Ballinger, working on a machine learning startup

I worked on speech recognition and ads at Google, both of which have machine learning at their core. Here are some important lessons I learned about how to operationalize a machine learning system "in the wild":

Make your success metric user happiness. Traditional accuracy measures like precision, squared error, ROC, etc. don't capture what you really care about: how users react to your model. For example, if you're running an ad system, your metrics should be revenue per pageview and click-through rate. It's completely possible to have a model with a lower error rate that lowers revenue, due to Simpson's paradox and a host of other reasons.

A drop in core metrics should generate a page. You should treat a model whose metrics nosedive the same as a server that goes down or a database with bad data. Whoever is on call should try to diagnose the problem quickly, pulling in others if necessary to fix it.
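A minimal sketch of what "generate a page" could mean in practice, assuming a hypothetical hourly click-through-rate series and an arbitrary rule: page when the current hour falls more than 20% below the trailing 24-hour mean. The window and threshold are my assumptions, not Ballinger's.

```python
from statistics import mean

def should_page(history, current, window=24, max_drop=0.20):
    """Return True if `current` fell more than `max_drop` (as a fraction)
    below the mean of the last `window` observations in `history`."""
    if len(history) < window:
        return False  # not enough baseline data to judge
    baseline = mean(history[-window:])
    if baseline <= 0:
        return False
    drop = (baseline - current) / baseline
    return drop > max_drop

# Hypothetical hourly click-through rates for a model in production.
ctr_history = [0.031, 0.030, 0.032, 0.029] * 6  # 24 hours of data
print(should_page(ctr_history, 0.030))  # False: a normal hour
print(should_page(ctr_history, 0.018))  # True: the metric nosedived
```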

A/B test every model launch. You should always be able to run two models in parallel, showing different models to different users and comparing user behavior. That's the only way to really know you're doing the right thing.
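A sketch of the two pieces an A/B launch needs, under my own assumptions: users are assigned to model A or B by hashing their ID (so each user consistently sees one model), and click-through rates are compared with a standard two-proportion z-test. The experiment name `ranker-v2` and the counts are hypothetical.

```python
import hashlib
import math

def assign_arm(user_id: str, experiment: str = "ranker-v2") -> str:
    """Deterministically map a user to arm 'A' or 'B' so they always see the same model."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """z statistic for the difference in click-through rate between arms B and A."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se

# Same user, same arm, every time.
assert assign_arm("user-42") == assign_arm("user-42")

# Hypothetical counts: B's CTR is 3.3% vs A's 3.0%.
z = two_proportion_z(clicks_a=3000, views_a=100000, clicks_b=3300, views_b=100000)
print(round(z, 2))  # |z| > 1.96 suggests a real difference at roughly the 5% level
```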

Beware of ugly duckling effects. Machine learning systems tend to become more accurate when they can learn from their own mistakes. That means new models are at an inherent disadvantage: your historical training data contains mistakes from previous models, but not the new one you just trained. That means your brand spankin' new algorithm may look like an ugly duckling initially, but turn into a beautiful swan when you allow it to decide the results that users actually see. You can partially counteract ugly duckling effects by running the new model on a small percentage of traffic, and then ramping that percentage up over time.
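One way to do the ramp-up, sketched under assumptions: each user gets a stable bucket from 0 to 99 derived from a hash of their ID, and a hypothetical schedule raises the share of buckets served by the new model. Because buckets are stable, anyone who saw the new model at an early stage keeps seeing it later.

```python
import hashlib

RAMP_SCHEDULE = [1, 5, 20, 50, 100]  # hypothetical percent of traffic per stage

def bucket(user_id: str) -> int:
    """Stable bucket in [0, 100) derived from the user ID."""
    return int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100

def serve_new_model(user_id: str, stage: int) -> bool:
    """True if this user should see the new model at the given ramp stage."""
    percent = RAMP_SCHEDULE[min(stage, len(RAMP_SCHEDULE) - 1)]
    return bucket(user_id) < percent

users = [f"user-{i}" for i in range(10000)]
for stage in range(len(RAMP_SCHEDULE)):
    share = sum(serve_new_model(u, stage) for u in users) / len(users)
    print(f"stage {stage}: {share:.1%} of users on the new model")
```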

Make algorithms robust to noisy features. Techniques like L1 regularization let your machine learning algorithm prune features that contribute relatively little to prediction accuracy. This lets you nicely separate your team: some people focus on the algorithm, and others write features that feed into it. The people inventing the features can just "throw them into the pot" and let the algorithm figure out what's good and bad. (Similarly, you should try to pick an algorithm with reasonable convergence guarantees.)
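To make the pruning concrete, here is a small lasso (L1-regularized least squares) fitted by cyclic coordinate descent in plain Python. Coordinate descent is one common way to fit a lasso, not necessarily what was used at Google. The three orthogonal ±1 features are made up; the third contributes only 0.1 to the target, and the L1 penalty (`lam=0.5`, an arbitrary choice) drives its weight to exactly zero while shrinking the other two.

```python
def soft_threshold(value, lam):
    """Shrink `value` toward zero by `lam`; anything inside [-lam, lam] becomes exactly 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iters=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 by cyclic coordinate descent.
    X is a list of rows, y a list of targets."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iters):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, lam) / (z / n)
    return w

# Three orthogonal +/-1 features; the third barely affects the target.
a = [1, 1, 1, 1, -1, -1, -1, -1]
b = [1, 1, -1, -1, 1, 1, -1, -1]
c = [1, -1, 1, -1, 1, -1, 1, -1]
X = [list(row) for row in zip(a, b, c)]
y = [3 * ai + 2 * bi + 0.1 * ci for ai, bi, ci in zip(a, b, c)]

weights = lasso_coordinate_descent(X, y, lam=0.5)
print([round(w, 6) for w in weights])  # [2.5, 1.5, 0.0]: the weak feature is pruned
```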

You can partially decouple the infrastructure and algorithm teams. Fundamentally, most machine learning algorithms accumulate a distributed hash table of statistics, and then combine those statistics into a score. Developing the distributed hash table is a different task than developing the algorithms that do the accumulation, and can be done by separate sub-teams. However, it's still important that those people work closely together. For example, some algorithms will oscillate if there's a delay between writes and those writes becoming visible, as happens in many "eventually consistent" systems like Dynamo. So this is only a partial decoupling, and you still need people on the team to be the "bridge" between infrastructure and algorithms.
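A toy version of that split, assuming a plain Python dict stands in for the distributed hash table. `accumulate` is the infrastructure side (counters keyed by feature) and `score` is the algorithm side; the particular scoring rule, an average of smoothed per-feature click-through rates with a hypothetical prior, is only an illustration.

```python
from collections import defaultdict

# A plain dict stands in for the distributed hash table: feature -> [clicks, impressions].
stats = defaultdict(lambda: [0, 0])

def accumulate(features, clicked):
    """Infrastructure side: update per-feature counters for one logged event."""
    for f in features:
        stats[f][0] += int(clicked)
        stats[f][1] += 1

def score(features, prior_ctr=0.02, prior_weight=100):
    """Algorithm side: combine stored statistics into one score.
    Here: the average of per-feature CTRs smoothed toward a prior (a hypothetical choice)."""
    if not features:
        return prior_ctr
    total = 0.0
    for f in features:
        clicks, imps = stats.get(f, (0, 0))  # .get does not insert into the defaultdict
        total += (clicks + prior_ctr * prior_weight) / (imps + prior_weight)
    return total / len(features)

for _ in range(900):
    accumulate(["query:shoes", "ad:42"], clicked=False)
for _ in range(100):
    accumulate(["query:shoes", "ad:42"], clicked=True)

print(round(score(["query:shoes", "ad:42"]), 4))  # 0.0927
print(score(["query:unseen"]))                     # 0.02, falls back to the prior
```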

Choose online vs. batch carefully. An online system learns in real time, reacting to new user behavior minutes after it happens. But it comes at a huge cost: an online system takes 2-3x as long to develop and maintain, and it's much more sensitive to transient changes. For example, if you get a burst of spam, that noise will immediately be incorporated into your model and start degrading the user experience. The same happens if a machine runs out of memory, the network becomes disconnected, a particular machine is slow (introducing feature "skew"), or one of the inputs into the model starts generating garbage.
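A small illustration of that sensitivity, assuming a hypothetical click stream with a steady 5% click rate followed by a burst of 50 bot clicks. The online estimate here is an exponentially weighted moving average updated on every event (`alpha=0.05` is arbitrary); the batch estimate comes from a model trained before the burst, which only sees the spam at the next training run, when there is time to filter it.

```python
def online_estimates(events, alpha=0.05):
    """Exponentially weighted running estimate, updated after every event."""
    est = 0.0
    history = []
    for e in events:
        est = (1 - alpha) * est + alpha * e
        history.append(est)
    return history

normal = [1 if i % 20 == 0 else 0 for i in range(2000)]  # 5% click rate
spam_burst = [1] * 50                                    # bot clicks
events = normal + spam_burst

online = online_estimates(events)
batch = sum(normal) / len(normal)  # model trained before the burst

print(f"online estimate before burst: {online[len(normal) - 1]:.3f}")
print(f"online estimate after burst:  {online[-1]:.3f}")
print(f"batch estimate:               {batch:.3f}")
```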

Version everything. A trained model depends on having stable identifiers; if you change an identifier, the model instantly goes stale. For example, let's say you include the user's language ("en-us") as a feature. Then somebody submits a one-line change to use underscores rather than hyphens ("en_us"). That effectively forces the model to instantly "forget" everything it learned about "en-us", making every language look the same. So version any changes to code/data that generates identifiers in your model.
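One possible way to version identifiers, sketched with hypothetical names: the feature key carries the version of the code that formatted it, so switching from hyphens to underscores produces new `lang/v2=` keys instead of silently replacing the `en-us` keys the current model learned. A model trained on v1 keys can keep requesting v1 until it is retrained.

```python
def normalize_lang_v1(code: str) -> str:
    return code.lower()                    # "en-US" -> "en-us"

def normalize_lang_v2(code: str) -> str:
    return code.lower().replace("-", "_")  # "en-US" -> "en_us"

NORMALIZERS = {1: normalize_lang_v1, 2: normalize_lang_v2}
CURRENT_VERSION = {"lang": 2}  # bump whenever the identifier-formatting code changes

def lang_feature(raw_code: str, version=None) -> str:
    """Build a feature key that records its extractor version, e.g. 'lang/v1=en-us'."""
    v = CURRENT_VERSION["lang"] if version is None else version
    return f"lang/v{v}={NORMALIZERS[v](raw_code)}"

# The model trained against v1 keeps asking for v1 features until it is retrained.
print(lang_feature("en-US", version=1))  # lang/v1=en-us
print(lang_feature("en-US"))             # lang/v2=en_us
```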

You need to crunch data at 100x-1000x realtime. If your model is trained on a year of historical data and your learning process runs at 10x realtime, it will take over a month to test out a new feature. Write everything as a MapReduce job or Storm topology so that you can scale when your data gets big enough.
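The arithmetic behind that claim: replaying 365 days of history at 10x realtime takes 36.5 days, at 100x about 3.7 days, and at 1000x under 9 hours. A trivial helper for checking other combinations (the function name is mine):

```python
def days_to_retrain(history_days: float, speedup: float) -> float:
    """Wall-clock days needed to replay `history_days` of data at `speedup` x realtime."""
    return history_days / speedup

for speedup in (10, 100, 1000):
    days = days_to_retrain(365, speedup)
    print(f"{speedup:>5}x realtime: {days:6.2f} days ({days * 24:7.1f} hours)")
```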
machinelearning  advice  experience  engineering 
6 weeks ago
cassovary
JVM graph processing library from Twitter. Efficiently handles graphs with billions of nodes and edges.
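Graphs at that scale are usually stored as flat arrays rather than one object per node or edge. Below is a sketch of one standard compact layout, compressed sparse row (CSR), in Python; it illustrates the general idea and is not Cassovary's Scala API or necessarily its internal representation.

```python
from array import array

class CSRGraph:
    """Directed graph in compressed sparse row form: one offsets array plus one
    flat array of neighbor IDs, instead of a Python object per node or edge."""

    def __init__(self, num_nodes, edges):
        counts = [0] * num_nodes  # out-degree of each node
        for src, _ in edges:
            counts[src] += 1
        # offsets[i] is where node i's neighbors start in `targets`.
        self.offsets = array("q", [0] * (num_nodes + 1))
        for i in range(num_nodes):
            self.offsets[i + 1] = self.offsets[i] + counts[i]
        self.targets = array("q", [0] * len(edges))
        fill = list(self.offsets[:-1])  # next free slot for each node
        for src, dst in edges:
            self.targets[fill[src]] = dst
            fill[src] += 1

    def out_neighbors(self, node):
        return self.targets[self.offsets[node]:self.offsets[node + 1]]

    def out_degree(self, node):
        return self.offsets[node + 1] - self.offsets[node]

g = CSRGraph(4, [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)])
print(list(g.out_neighbors(2)))  # [0, 3]
print(g.out_degree(3))           # 0
```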
data  graph  twitter  scala  to:twitter  to:linkedin 
6 weeks ago
Probabilistic Soft Logic
Another declarative language for combining first-order logic and probabilistic graphical models. From Lise Getoor's group at the University of Maryland.
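As I understand PSL, rules are relaxed with Łukasiewicz logic: a ground rule `body -> head` incurs a hinge-loss penalty of max(0, truth(body) - truth(head)), scaled by the rule's weight, and inference minimizes the total penalty over the unknown truth values. A minimal sketch of that penalty for one hypothetical rule, not the PSL library's API:

```python
def luk_and(*truths):
    """Lukasiewicz t-norm: soft conjunction of truth values in [0, 1]."""
    return max(0.0, sum(truths) - (len(truths) - 1))

def distance_to_satisfaction(body_truths, head_truth):
    """How far a ground rule body -> head is from being satisfied (0 means satisfied)."""
    return max(0.0, luk_and(*body_truths) - head_truth)

def weighted_penalty(rules):
    """Total hinge-loss penalty: sum of weight * distance over (weight, body, head) rules."""
    return sum(w * distance_to_satisfaction(body, head) for w, body, head in rules)

# Hypothetical ground rule: Friends(a, b) & Votes(a, p) -> Votes(b, p), weight 2.0.
rules = [(2.0, [0.9, 0.8], 0.3)]
print(round(weighted_penalty(rules), 3))  # 0.8
```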
research  programming  machinelearning  to:twitter  to:linkedin 
6 weeks ago
BIOLAP
Open source OLAP for biology data
biology  olap  database  to:twitter  to:linkedin 
7 weeks ago
Cognition, Computers, and Car Bombs: How Yale Prepared Me for the 90’s | Wendy G. Lehnert 1994
In Beliefs, Reasoning, and Decision Making: Psycho-logic in Honor of Bob Abelson (eds: Schank & Langer), Lawrence Erlbaum Associates, Hillsdale, NJ. pp. 143-173.

Alan Perlis: With computers, everything is possible and nothing is easy.

"Problem-driven researchers start with a problem and look for a technology that can handle the problem. Sometimes nothing works very well and a new technology has to be invented. Technology driven researchers start with a technology and look for a problem that the technology can handle. Sometimes nothing works very well and a new problem has to be invented. Both camps are equally dedicated and passionate about their principal alliance. Some of us fall in love with problems and some of us fall in love with technologies. Does a chicken lay eggs to get more chickens or do eggs make chickens to get more eggs?"

"When it comes to AI, systems are somehow expected to amaze us by doing something smart that was never anticipated by the programmers. Other areas of computer science do not generally look for this element of surprise."

"The difference is that research within an R&D framework is always directed toward some hopeful application or product. Basic research, on the other hand, is conducted only to expand the boundaries of human knowledge. Basic research produces knowledge for the sake of knowledge. R&D research produces knowledge from which we expect to derive some concrete benefits."

"The graduate students who implemented the UMass/MUC-3 system had no desire to ever build anything like it again. Their labor was time-consuming and tedious. They established the viability of the UMass approach relative to other approaches, but with a human labor factor that threw into question the practicality of the technology."
papers  research  nlp 
8 weeks ago
Simple Made Easy
Rich Hickey on the differences between "simple" and "easy" in software design.
programming  design  to:linkedin  to:twitter 
8 weeks ago
cc0 and git for data
Issues regarding licensing and versioning metadata and documents
data  git  to:linkedin  to:twitter 
8 weeks ago
The Free Universal Construction Kit
A matrix of nearly 80 adapter bricks that enable complete interoperability between ten popular children’s construction toys.
opensource  toys  to:twitter  to:facebook 
8 weeks ago
For High Tech Companies, Going Public Sucks
But it [the goal of IPO] has created a series of perverse incentives, in which investors’ interests conflict with—and usually trump—those of the companies they fund.
startup  funding  business  to:linkedin  to:twitter 
9 weeks ago
Ruff Wear Performance Dog Gear
Looks like @ruffwear dog beds are good for hiking.
dog  shopping  to:twitter  to:facebook 
9 weeks ago
How We Know by Freeman Dyson
This is news to me: "Thanks to the discoveries of astronomers in the twentieth century, we now know that the heat death is a myth. The heat death can never happen, and there is no paradox."
books  history  science  nybooks  to:facebook  to:twitter  to:linkedin 
9 weeks ago
Lessons from a 40 year old
Matt Haughey's Webstock '12 talk. On startups, funding, work-life balance, and taking the long-term view.
advice  business  startup  talk  video  to:linkedin  to:twitter  funding 
9 weeks ago
GNU Parallel
Build and execute command lines from standard input in parallel on local or remote machines.
gnu  unix  tool  to:linkedin  to:twitter 
9 weeks ago
Crossfilter
Crossfilter is a JavaScript library for filtering large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records.
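One way to get range filters that fast is to keep each dimension's records sorted by value and binary-search them, then intersect the selections across dimensions to update the linked views. The sketch below shows only that idea, using hypothetical flight records; Crossfilter's actual implementation is more elaborate.

```python
from bisect import bisect_left

class Dimension:
    """One filterable column: record indices sorted by value, so a range filter
    is two binary searches instead of a full scan."""

    def __init__(self, records, key):
        self.order = sorted(range(len(records)), key=lambda i: key(records[i]))
        self.values = [key(records[i]) for i in self.order]

    def filter_range(self, lo, hi):
        """Indices of records with lo <= value < hi."""
        start = bisect_left(self.values, lo)
        end = bisect_left(self.values, hi)
        return set(self.order[start:end])

def apply_filters(n, selections):
    """Records passing every active dimension filter (what each linked view would show)."""
    result = set(range(n))
    for sel in selections:
        result &= sel
    return result

flights = [
    {"hour": 6, "delay": 5}, {"hour": 9, "delay": 45},
    {"hour": 13, "delay": -3}, {"hour": 18, "delay": 30},
    {"hour": 21, "delay": 60},
]
by_hour = Dimension(flights, key=lambda r: r["hour"])
by_delay = Dimension(flights, key=lambda r: r["delay"])

evening = by_hour.filter_range(17, 24)
late = by_delay.filter_range(20, 1000)
print(sorted(apply_filters(len(flights), [evening, late])))  # [3, 4]
```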
javascript  visualization  data 
9 weeks ago
The Way We Read Now
Dwight Garner asks: "Are some reading materials better suited to one platform than another?"
books  reading  nytimes  to:twitter 
9 weeks ago
Video! The search quality meeting, uncut (annotated) - Inside Search
A look inside Google's Search Quality Meetings, in this case a discussion of spelling correction for long queries.
google  video  nlp  spellingcorrection 
10 weeks ago
zohmg
Zohmg is a data store for aggregation of multi-dimensional time series data, built on top of Hadoop, Dumbo and HBase.
database  olap  hadoop  lastfm 
11 weeks ago
Mondrian
Open source OLAP server written in Java. It enables interactive analysis of very large datasets stored in SQL databases without writing SQL.
olap  java 
11 weeks ago
What A CEO Does
A CEO does only three things. Sets the overall vision and strategy of the company and communicates it to all stakeholders. Recruits, hires, and retains the very best talent for the company. Makes sure there is always enough cash in the bank.
business  entrepreneurship  management 
11 weeks ago
coal mining of the information age
extracting, transforming, and loading (ETL): dirty, exhausting, absolutely necessary
etl  data 
11 weeks ago
FT’s understatement on Newssift.com
Pete Bell (co-founder of Endeca) analyzing the end of Newssift.com
newssift  business 
11 weeks ago
How similar are faceted search and OLAP? See CIO Mag: 20 to Watch in 2010
Adam Ferrari (CTO of Endeca) on the relationship between faceted search and OLAP
olap  facet  search  endeca 
11 weeks ago
Data Visualization with ElasticSearch and Protovis
As it happens, we can use facets as a pretty powerful analytical engine for our data, without writing any OLAP implementations.
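The underlying operation is just counting field values over the documents that match a query. A minimal in-memory sketch with made-up documents (ElasticSearch does this server-side over its index; this is only the group-by-count idea, not its API):

```python
from collections import Counter

def facet_counts(docs, field, query=None):
    """Count values of `field` among docs matching `query` (a dict of exact field matches)."""
    query = query or {}
    matches = (d for d in docs if all(d.get(k) == v for k, v in query.items()))
    return Counter(d[field] for d in matches if field in d)

docs = [
    {"author": "alice", "tag": "olap", "year": 2011},
    {"author": "bob", "tag": "search", "year": 2011},
    {"author": "alice", "tag": "search", "year": 2012},
    {"author": "carol", "tag": "olap", "year": 2012},
]
print(facet_counts(docs, "tag"))                        # overall breakdown by tag
print(facet_counts(docs, "author", query={"tag": "olap"}))  # slice, then break down
```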
search  olap  visualization  elasticsearch  facet 
11 weeks ago
Introducing Druid: Real-Time Analytics at a Billion Rows Per Second
Over the last twelve months, we tried and failed to achieve scale and speed with relational databases (Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase). So instead we did something crazy: we rolled our own database. Druid is the distributed, in-memory OLAP data store that resulted.
olap  database 
11 weeks ago
csvkit
csvkit is a suite of utilities for converting to and working with CSV, the king of tabular file formats.
csv  python  to:twitter 
11 weeks ago
Basic Oral Language Documentation
Preserving the world's languages using smartphones and wireless technology in remote areas.
linguistics  language  to:twitter 
11 weeks ago
Management Debt // ben's blog
Every really good, really experienced CEO that I know shares one important characteristic: they tend to opt for the hard answer to organizational issues. If faced with giving everyone the same bonus to make things easy or sharply rewarding performance and ruffling many feathers, they’ll ruffle the feathers.
12 weeks ago