Nicolas Jaar - Essential Mix (05-19-2012)
yesterday
Listen; get work done.
music
to:twitter
to:facebook
to:linkedin
yesterday
Baselines and Bigrams: Simple, Good Sentiment and Text Classification | Wang and Manning | ACL 2012
3 days ago
A nice short paper reinforcing that we should do the simple things first.
research
papers
nlp
machinelearning
sentiment
acl
to:twitter
to:linkedin
3 days ago
Patagonia's Founder is America's Most Unlikely Business Guru
8 days ago
Profile of Yvon Chouinard
business
environment
8 days ago
Raptor Codes | Amin Shokrollahi | Information Theory 2006
16 days ago
Fountain codes with linear-time encoding and decoding.
research
ieee
informationtheory
papers
erasure
to:twitter
to:linkedin
16 days ago
David MacKay: How the Laws of Physics Constrain Our Sustainable Energy Options
16 days ago
Adding "Sustainable Energy: Without the Hot Air" to my reading list. From TEDxWarwick.
video
talks
energy
politics
to:twitter
to:linkedin
to:facebook
16 days ago
"We Have Met The Enemy... And He Is Us"
18 days ago
Lessons from Twenty Years of the Kauffman Foundation's Investments in Venture Capital Funds and The Triumph of Hope over Experience.
investing
funding
vc
18 days ago
Airport express not connecting to iTunes
18 days ago
I haven't been able to stream from iTunes to my AirPort Express (802.11g). Disabling IPv6 fixed the issue.
airplay
airportexpress
itunes
osx
18 days ago
The Taste of War
18 days ago
Lizzie Collingham soberly argues that the expansionist designs of both Nazi Germany and imperial Japan must be understood within a world political economy in which the single crucial commodity was food.
nytimes
reviews
books
to:twitter
18 days ago
John Langford on Microsoft Research, New York City
22 days ago
Many researchers from Yahoo! Research are moving to MSR NYC.
microsoft
yahoo
research
nyc
to:twitter
22 days ago
Bring back the 40-hour work week
salon
jobs
work
productivity
management
23 days ago
This is what work looks like now. It’s been this way for so long that most American workers don’t realize that for most of the 20th century, the broad consensus among American business leaders was that working people more than 40 hours a week was stupid, wasteful, dangerous and expensive — and the most telling sign of dangerously incompetent management to boot.
The most essential thing to know about the 40-hour work-week is that, while it was the unions that pushed it, business leaders ultimately went along with it because their own data convinced them this was a solid, hard-nosed business decision.
In fact, research shows that knowledge workers actually have fewer good hours in a day than manual laborers do — on average, about six hours, as opposed to eight.
And finally: these death marches take a longer-term productivity toll as well. Once the crisis has passed and that 60-hour-a-week team gets to go back to its regular 40, it can take several more weeks before the burnout begins to lift enough for them to resume their typical productivity level. So, for a while, you’ll get significantly less than a full 40 out of them.
23 days ago
The Jig Is Up: Time to Get Past Facebook and Invent a New Future
24 days ago
Alexis Madrigal raises some great points about the current state of technology innovation, what we build, and the advertising model.
innovation
technology
startup
business
to:twitter
to:facebook
to:linkedin
theatlantic
24 days ago
Does Pleiades have an API?
24 days ago
Sean Gillies on APIs and exporting data.
api
data
engineering
development
to:twitter
to:linkedin
24 days ago
Publishing Open Data – Do you really need an API?
24 days ago
Peter Krantz suggests using file dumps over APIs in some situations.
data
api
development
engineering
to:twitter
to:linkedin
24 days ago
Dealing With ICD-10
24 days ago
How NLP is helping the move to ICD-10.
health
medicine
coding
nlp
to:linkedin
to:twitter
24 days ago
How one man escaped from a North Korean prison camp
24 days ago
Edited extract from Escape From Camp 14, by Blaine Harden.
guardian
northkorea
to:twitter
books
24 days ago
Innovation Starvation by Neal Stephenson
25 days ago
"Still, I worry that our inability to match the achievements of the 1960s space program might be symptomatic of a general failure of our society to get big things done."
innovation
technology
progress
science
research
sciencefiction
to:twitter
to:linkedin
to:facebook
25 days ago
The Cabal: Valve’s Design Process For Creating Half-Life
25 days ago
The Cabal exists because the mythical person you need does not.
design
games
to:twitter
to:linkedin
management
development
After reaching the end of the original Half-Life schedule Valve started again:
We set up a small group of people to take every silly idea, every cool trick, everything interesting that existed in any kind of working state somewhere in the game and put them into a single prototype level. ... They all worked together on this one small level for a month while the rest of us basically did nothing. When they were done, we all played it. It was great. It was Die Hard meets Evil Dead. It was the vision. It was going to be our game.
On looking for the game designer:
We looked at hundreds of resumes and interviewed a lot of promising applicants, but no one we looked at had enough of the qualities we wanted for us to seriously consider them the overall godlike “game designer” that we were told we needed. In the end, we came to the conclusion that this ideal person didn’t actually exist. Instead, we would create our own ideal by combining the strengths of a cross section of the company, putting them together in a group we called the “Cabal.”
The components of the Cabal:
The Cabal consisted only of people that had actual shipping components in the game; there were no dedicated designers. Every member of the Cabal was someone with the responsibility of actually doing the work that their design specified, or at least had the ability to do it if need be. ... Internally, once the success of the Cabal process was obvious, mini-Cabals were formed to come up with answers to a variety of design problems. These mini-Cabals would typically include people most affected by the decision, as well as try to include people completely outside the problem being addressed in order to keep a fresh perspective on things.
A tip:
Write down everything. Brainstorming is fine during the meetings, but unless it’s all written down, your best ideas will be forgotten within days. The goal is to end up with a document that captures as much as is reasonable about your game, and more importantly answers questions about what people need to work on.
25 days ago
As We May Think - Dr. Vannevar Bush (1945)
25 days ago
From The Atlantic Monthly, July 1945. "The world has arrived at an age of cheap complex devices of great reliability; and something is bound to come of it."
theatlantic
science
progress
history
technology
research
to:twitter
to:linkedin
25 days ago
Anatomy Of An Idea by Steven Johnson
28 days ago
"The discovery process is remarkably social, and the social interactions come in amazingly diverse forms."
ideas
information
to:twitter
sharing
to:linkedin
to:facebook
28 days ago
Shark
4 weeks ago
Lightning-Fast Data Warehouse System. Shark is a large-scale data warehouse system for Spark designed to be compatible with Apache Hive. It can answer HiveQL queries up to 30 times faster than Hive without modification to the existing data or queries. Shark supports Hive's query language, metastore, serialization formats, and user-defined functions.
database
data
olap
hadoop
hive
spark
to:twitter
to:linkedin
4 weeks ago
Jets’ Trevor Pryce Is Retired, and Getting Tired of It
4 weeks ago
With millions of Americans out of work or doing work for which they are overqualified, I consider myself lucky. But starting from scratch can be unsettling. If you’re not prepared for it, retirement can become a form of self-imposed exile from the fulfillment and the exhilaration of knowing you did a good job.
Many people retire around 65. I will turn 37 this summer, yet like all former N.F.L. players, I face greater health risks, both physical and psychological, that compound my fears.
nfl
nytimes
work
4 weeks ago
Clustering Related Stories
5 weeks ago
Prismatic's @jrfinkel on feature space and speeding things up
clustering
applications
nlp
machinelearning
to:twitter
to:linkedin
5 weeks ago
Valve: How I Got Here, What It’s Like, and What I’m Doing
5 weeks ago
Culture, trust, and informal consensus at Valve, plus a throwaway quote about NLP.
business
management
games
to:twitter
to:linkedin
5 weeks ago
Dyson Dog grooming
5 weeks ago
Pretty awesome; watch the videos. Say goodbye to tranquil grooming, though.
dog
to:facebook
to:twitter
5 weeks ago
What are the keys to operationalizing a machine learning ranking system from an organization and/or engineering management point of view? - Quora
machinelearning
advice
experience
engineering
6 weeks ago
Brandon Ballinger, working on a machine learning startup
I worked on speech recognition and ads at Google, both of which have machine learning at their core. Here are some important lessons I learned about how to operationalize a machine learning system "in the wild":
Make your success metric user happiness. Traditional accuracy measures like precision, squared error, ROC, etc. don't capture what you really care about--how users react to your model. For example, if you're running an ad system, your metrics should be revenue per pageview and click through rate. It's completely possible to have a model with a reduced error rate which lowers revenue, due to Simpson's paradox and a host of other reasons.
A drop in core metrics should generate a page. You should treat a model whose metrics nosedive the same as a server that goes down or a database with bad data. Whoever is on call should try to diagnose the problem quickly, pulling in others if necessary to fix the problem.
A/B test every model launch. You should always be able to run two models in parallel, showing different models to different users and comparing user behavior. That's the only way to really know you're doing the right thing.
Beware of ugly duckling effects. Machine learning systems tend to become more accurate when they can learn from their own mistakes. That means new models are at an inherent disadvantage: your historical training data contains mistakes from previous models, but not the new one you just trained. That means your brand spankin' new algorithm may look like an ugly duckling initially, but turn into a beautiful swan when you allow it to decide the results that users actually see. You can partially counteract ugly duckling effects by running the new model on a small percentage of traffic, and then ramping that percentage up over time.
Make algorithms robust to noisy features. Techniques like L1 regularization let your machine learning algorithm prune features that contribute relatively little to prediction accuracy. This lets you nicely separate your team: some people focus on the algorithm, and others write features that feed into it. The people inventing the features can just "throw them into the pot" and let the algorithm figure out what's good and bad. (Similarly, you should try to pick an algorithm with reasonable convergence guarantees.)
You can partially de-couple the infrastructure and algorithm teams. Fundamentally, most machine learning algorithms accumulate a distributed hash table of statistics, and then combine those statistics into a score. Developing the distributed hash table is a different task than developing the algorithms that do the accumulation, and can be done by separate sub-teams. However, it's still important that those people work closely together. For example, some algorithms will oscillate if there's a delay between writes and those writes becoming available, which happens in many "eventual consistency" systems like Dynamo. So this is only a partial de-coupling and you still need people on the team to be the "bridge" between infrastructure and algorithms.
Choose online vs. batch carefully. An online system learns in real time, reacting to new user behavior minutes after it happens. But it comes at a huge cost: an online system takes 2-3x as long to develop and maintain, and it's much more sensitive to transient changes. For example, if you get a burst of spam, that noise will immediately be incorporated into your model and start degrading the user experience. Likewise if a machine runs out of memory, the network becomes disconnected, a particular machine is slow (introducing feature "skew"), one of the inputs into the model starts generating garbage, etc.
Version everything. A trained model depends on having stable identifiers; if you change an identifier, the model instantly goes stale. For example, let's say you include the user's language ("en-us") as a feature. Then somebody submits a one-line change to use underscores rather than hyphens ("en_us"). That effectively forces the model to instantly "forget" everything it learned about "en-us", making every language look the same. So version any changes to code/data that generates identifiers in your model.
You need to crunch data at 100x-1000x realtime. If your model is trained on a year of historical data, and your learning process is 10x realtime, then it will take over a month to test out a new feature. Write everything as a Map-Reduce or Storm topology so that you can scale when your data gets big enough.
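The arithmetic in the last point can be checked with a few lines of Python. This is a minimal sketch assuming a 365-day training history and a pipeline whose speed is a constant multiple of real time; the helper name `days_to_replay` is hypothetical.

```python
def days_to_replay(history_days: float, speedup: float) -> float:
    """Wall-clock days needed to reprocess `history_days` of logged data
    with a pipeline that runs `speedup` times faster than real time.

    Assumes constant throughput (no queueing, restarts, or contention).
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return history_days / speedup


if __name__ == "__main__":
    # One year of history at the speedups mentioned above.
    for speedup in (10, 100, 1000):
        days = days_to_replay(365, speedup)
        print(f"{speedup:>4}x realtime: {days:.3f} days ({days * 24:.1f} hours)")
```

At 10x a year of history takes 36.5 days to replay, at 100x about 3.65 days, and at 1000x under 9 hours, which is why the answer argues for 100x-1000x.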
6 weeks ago
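The A/B testing and ugly-duckling advice in the answer above both depend on splitting traffic between models so that each user consistently sees the same one. A common way to do that, sketched here as an assumption rather than anything the answer prescribes, is to hash a user ID into a fixed number of buckets and serve the new model to the buckets below a threshold. The function names, the 10,000-bucket granularity, and the experiment-name salt are all hypothetical.

```python
import hashlib

BUCKETS = 10_000  # granularity: one bucket = 0.01% of traffic


def bucket(user_id: str, experiment: str) -> int:
    """Map a user to a stable bucket in [0, BUCKETS) for one experiment.

    Hashing (rather than a random draw per request) keeps each user in the
    same arm on every request; salting with the experiment name keeps the
    assignment independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def choose_model(user_id: str, experiment: str, new_model_pct: float) -> str:
    """Return "new" for roughly `new_model_pct` percent of users, else "old"."""
    if not 0.0 <= new_model_pct <= 100.0:
        raise ValueError("new_model_pct must be between 0 and 100")
    threshold = round(new_model_pct * BUCKETS / 100)  # percent -> bucket count
    return "new" if bucket(user_id, experiment) < threshold else "old"


if __name__ == "__main__":
    users = [f"user{i}" for i in range(20_000)]
    # Ramp the new model up gradually; users already on it stay on it.
    for pct in (1, 5, 20, 50):
        share = sum(choose_model(u, "ranker-v2", pct) == "new" for u in users) / len(users)
        print(f"target {pct:>2}% -> observed {share:.1%}")
```

Because the threshold only grows as the percentage is ramped up, every user who was already on the new model stays on it, so the new model keeps collecting consistent feedback from the same users while its share of traffic increases.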
cassovary
6 weeks ago
JVM graph processing library from Twitter. Efficiently handle graphs with billions of nodes and edges.
data
graph
twitter
scala
to:twitter
to:linkedin
6 weeks ago
Probabilistic Soft Logic
6 weeks ago
Another declarative language for combining first-order logic and probabilistic graphical models. From Lise Getoor's group at the University of Maryland.
research
programming
machinelearning
to:twitter
to:linkedin
6 weeks ago
Data-Intensive Text Processing with MapReduce
6 weeks ago
Jimmy Lin and Chris Dyer's book now on github.
books
research
nlp
data
to:twitter
to:linkedin
6 weeks ago
A Universal Part-of-Speech Tagset | Petrov et al. | LREC 2012
7 weeks ago
Twelve universal POS tags, plus mappings to them from 25 treebanks.
nlp
papers
research
to:twitter
to:linkedin
7 weeks ago
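The core of the tagset is a many-to-one mapping from each treebank's fine-grained tags onto 12 coarse categories (NOUN, VERB, ADJ, ADV, PRON, DET, ADP, NUM, CONJ, PRT, ".", X). The dictionary below is a hand-written excerpt for the English Penn Treebank, for illustration only; the authoritative mappings are the ones released with the paper. Mapping every unlisted tag to X is a simplification of this sketch that the real mapping does not make (for example, WDT and WRB have proper categories there).

```python
# Hand-picked excerpt of a Penn Treebank -> universal tag mapping (not the full table).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN", "NNPS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBG": "VERB", "VBN": "VERB",
    "VBP": "VERB", "VBZ": "VERB", "MD": "VERB",
    "JJ": "ADJ", "JJR": "ADJ", "JJS": "ADJ",
    "RB": "ADV", "RBR": "ADV", "RBS": "ADV",
    "PRP": "PRON", "PRP$": "PRON",
    "DT": "DET",
    "IN": "ADP",
    "CD": "NUM",
    "CC": "CONJ",
    "RP": "PRT", "TO": "PRT",
    ".": ".", ",": ".", ":": ".",
}

# The 12 coarse categories of the universal tagset.
UNIVERSAL_TAGS = {"NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
                  "ADP", "NUM", "CONJ", "PRT", ".", "X"}


def to_universal(tagged):
    """Convert (word, ptb_tag) pairs to (word, universal_tag).

    Tags outside the excerpt fall back to "X" (a simplification of this sketch).
    """
    return [(word, PTB_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged]
```

For example, `to_universal([("The", "DT"), ("dog", "NN"), ("barked", "VBD"), (".", ".")])` returns `[("The", "DET"), ("dog", "NOUN"), ("barked", "VERB"), (".", ".")]`. Because every treebank maps onto the same 12 labels, taggers and parsers can be compared across languages.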
SimCity Insider's Look GlassBox Game Engine
7 weeks ago
Interesting look at the game dynamics.
youtube
video
games
simcity
to:linkedin
to:twitter
to:facebook
7 weeks ago
Cognition, Computers, and Car Bombs: How Yale Prepared Me for the 90’s | Wendy G. Lehnert 1994
8 weeks ago
In Beliefs, Reasoning, and Decision Making: Psycho-logic in Honor of Bob Abelson (eds: Schank & Langer), Lawrence Erlbaum Associates, Hillsdale, NJ. pp. 143-173.
Alan Perlis: With computers, everything is possible and nothing is easy.
"Problem-driven researchers start with a problem and look for a technology that can handle the problem. Sometimes nothing works very well and a new technology has to be invented. Technology driven researchers start with a technology and look for a problem that the technology can handle. Sometimes nothing works very well and a new problem has to be invented. Both camps are equally dedicated and passionate about their principal alliance. Some of us fall in love with problems and some of us fall in love with technologies. Does a chicken lay eggs to get more chickens or do eggs make chickens to get more eggs?"
"When it comes to AI, systems are somehow expected to amaze us by doing something smart that was never anticipated by the programmers. Other areas of computer science do not generally look for this element of surprise."
"The difference is that research within an R&D framework is always directed toward some hopeful application or product. Basic research, on the other hand, is conducted only to expand the boundaries of human knowledge. Basic research produces knowledge for the sake of knowledge. R&D research produces knowledge from which we expect to derive some concrete benefits."
"The graduate students who implemented the UMass/MUC-3 system had no desire to ever build anything like it again. Their labor was time-consuming and tedious. They established the viability of the UMass approach relative to other approaches, but with a human labor factor that threw into question the practicality of the technology."
papers
research
nlp
8 weeks ago
Simple Made Easy
8 weeks ago
Rich Hickey on the differences between "simple" and "easy" in software design.
programming
design
to:linkedin
to:twitter
8 weeks ago
cc0 and git for data
8 weeks ago
Issues in licensing and versioning metadata and documents.
data
git
to:linkedin
to:twitter
8 weeks ago
The Free Universal Construction Kit
8 weeks ago
A matrix of nearly 80 adapter bricks that enable complete interoperability between ten popular children’s construction toys.
opensource
toys
to:twitter
to:facebook
8 weeks ago
For High Tech Companies, Going Public Sucks
9 weeks ago
But it [the goal of IPO] has created a series of perverse incentives, in which investors’ interests conflict with—and usually trump—those of the companies they fund.
startup
funding
business
to:linkedin
to:twitter
9 weeks ago
Ruff Wear Performance Dog Gear
9 weeks ago
Looks like @ruffwear dog beds are good for hiking.
dog
shopping
to:twitter
to:facebook
9 weeks ago
How We Know by Freeman Dyson
9 weeks ago
This is news to me: "Thanks to the discoveries of astronomers in the twentieth century, we now know that the heat death is a myth. The heat death can never happen, and there is no paradox."
books
history
science
nybooks
to:facebook
to:twitter
to:linkedin
9 weeks ago
Lessons from a 40 year old
9 weeks ago
Matt Haughey's Webstock '12 talk. On startups, funding, work-life balance, and taking the long-term view.
advice
business
startup
talk
video
to:linkedin
to:twitter
funding
9 weeks ago
GNU Parallel
9 weeks ago
Build and execute command lines from standard input in parallel on local or remote machines.
gnu
unix
tool
to:linkedin
to:twitter
9 weeks ago
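The core idea, reading command lines from standard input and running several at once, can be sketched in Python with a thread pool driving `subprocess`. This is not GNU Parallel and covers only the local case; the function names and the default of 4 concurrent jobs are assumptions. The `jobs` argument plays roughly the role of GNU Parallel's `-j` option, and returning output in input order mirrors its `-k` (keep order) option.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd: str) -> str:
    """Run one shell command line and return its standard output."""
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout


def run_parallel(command_lines, jobs: int = 4):
    """Run command lines with at most `jobs` of them executing at once.

    Threads suffice because each worker just waits on a child process;
    pool.map yields results in input order, not completion order.
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_command, command_lines))


def main() -> None:
    # Read one command line per non-blank line of standard input.
    lines = [line.strip() for line in sys.stdin if line.strip()]
    for output in run_parallel(lines):
        sys.stdout.write(output)
```

Calling `main()` from a script and piping in one command per line runs them four at a time and prints their output in the order the commands were given, even if later commands finish first.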
Crossfilter
9 weeks ago
Crossfilter is a JavaScript library for filtering large multivariate datasets in the browser. Crossfilter supports extremely fast (<30ms) interaction with coordinated views, even with datasets containing a million or more records.
javascript
visualization
data
9 weeks ago
The Way We Read Now
9 weeks ago
Dwight Garner explores the question: "Are some reading materials better suited to one platform than another?"
books
reading
nytimes
to:twitter
9 weeks ago
Video! The search quality meeting, uncut (annotated) - Inside Search
10 weeks ago
A look inside Google's search quality meetings; this one discusses spelling correction for long queries.
google
video
nlp
spellingcorrection
10 weeks ago
Mondrian
11 weeks ago
Open-source OLAP server written in Java. It enables interactive analysis of very large datasets stored in SQL databases without writing SQL.
olap
java
11 weeks ago
What A CEO Does
11 weeks ago
A CEO does only three things. Sets the overall vision and strategy of the company and communicates it to all stakeholders. Recruits, hires, and retains the very best talent for the company. Makes sure there is always enough cash in the bank.
business
entrepreneurship
management
11 weeks ago
coal mining of the information age
11 weeks ago
extracting, transforming, and loading (ETL): dirty, exhausting, absolutely necessary
etl
data
11 weeks ago
FT’s understatement on Newssift.com
11 weeks ago
Pete Bell (co-founder of Endeca) analyzes the end of Newssift.com.
newssift
business
11 weeks ago
How similar are faceted search and OLAP? See CIO Mag: 20 to Watch in 2010
11 weeks ago
Adam Ferrari (CTO of Endeca) on the relationship between faceted search and OLAP.
olap
facet
search
endeca
11 weeks ago
Data Visualization with ElasticSearch and Protovis
11 weeks ago
As it happens, we can use facets as a pretty powerful analytical engine for our data, without writing any OLAP implementations.
search
olap
visualization
elasticsearch
facet
11 weeks ago
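The claim that facets double as an analytical engine comes down to this: a facet is a filtered group-by computed over whatever documents match the current query. A minimal sketch in plain Python follows; the function names, field names, and toy documents are hypothetical and unrelated to Elasticsearch's actual API.

```python
from collections import Counter, defaultdict


def terms_facet(docs, field, query=None):
    """Count matching documents per distinct value of `field`."""
    return Counter(
        doc[field]
        for doc in docs
        if field in doc and (query is None or query(doc))
    )


def stats_facet(docs, key_field, value_field, query=None):
    """Per-key count, total, and mean of a numeric field (a small GROUP BY)."""
    acc = defaultdict(lambda: [0, 0.0])  # key -> [count, total]
    for doc in docs:
        if query is not None and not query(doc):
            continue
        if key_field in doc and value_field in doc:
            acc[doc[key_field]][0] += 1
            acc[doc[key_field]][1] += doc[value_field]
    return {
        key: {"count": count, "total": total, "mean": total / count}
        for key, (count, total) in acc.items()
    }


if __name__ == "__main__":
    docs = [
        {"tag": "olap", "year": 2012, "views": 10},
        {"tag": "olap", "year": 2011, "views": 30},
        {"tag": "nlp", "year": 2012, "views": 5},
    ]
    print(terms_facet(docs, "tag"))
    print(stats_facet(docs, "tag", "views", query=lambda d: d["year"] == 2012))
```

Swapping the `query` predicate is the equivalent of drilling down in an OLAP cube: the same facet definitions are re-evaluated over a narrower slice of the documents, with no separate OLAP layer.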
Introducing Druid: Real-Time Analytics at a Billion Rows Per Second
11 weeks ago
Over the last twelve months, we tried and failed to achieve scale and speed with relational databases (Greenplum, InfoBright, MySQL) and NoSQL offerings (HBase). So instead we did something crazy: we rolled our own database. Druid is the distributed, in-memory OLAP data store that resulted.
olap
database
11 weeks ago
csvkit
11 weeks ago
csvkit is a suite of utilities for converting to and working with CSV, the king of tabular file formats.
csv
python
to:twitter
11 weeks ago
Basic Oral Language Documentation
11 weeks ago
Preserving the world's languages using smartphones and wireless technology in remote areas.
linguistics
language
to:twitter
11 weeks ago
Management Debt // ben's blog
Every really good, really experienced CEO that I know shares one important characteristic: they tend to opt for the hard answer to organizational issues. If faced with giving everyone the same bonus to make things easy or sharply rewarding performance and ruffling many feathers, they’ll ruffle the feathers.
12 weeks ago
*todo
advertising
aex
ai
airportexpress
algorithms
amazon
amplifier
analysis
api
apple
appletv
arcadefire
architecture
art
arts
audio
backup
bag
banking
bbc
biology
blog
books
boston
brain
business
camera
cc
chambana
climate
clock
clothing
cms
code
coffee
colour
comedy
comment
community
company
consumer
cooking
copyright
cs
culture
cycling
data
database
debug
delicious
design
dev
development
discussion
DIY
documentary
dog
drm
economics
economist
edinburgh
education
election
emacs
encryption
engineering
environment
essay
evolution
exercise
extension
film
finance
firefox
fitbit
flickr
food
forums
framework
frightenedrabbit
future
gallery
games
git
github
globalwarming
gmail
google
gps
graph
graphics
greatsmokymountains
gtd
guardian
guide
hacks
hardware
headphones
health
hiking
history
hosting
howto
html
ide
illinois
ilp
internet
interview
iphone
ipod
itunes
java
javascript
jobs
journalism
kindle
language
lastfm
latex
law
library
life
lightroom
linguistics
linux
london
lrb
mac
machinelearning
magazine
mail
management
map
math
media
medicine
microsoft
mobile
money
monitor
movies
mp3
music
nationalpark
news
newyorker
nikon
nlp
nybooks
nytimes
observer
olap
olympics
openlibrary
opensource
optimization
osx
papers
parser
pasta
people
perl
phone
photo
photographer
photography
photos
php
plugin
podcast
politics
pork
privacy
productivity
profiles
programming
prolog
psychology
publichealth
python
radio
recipe
reference
relationextraction
research
restaurant
reviews
running
scanning
science
scm
scotland
search
security
sentiment
shop
shopping
socialnetworks
software
soup
space
sport
startup
statistics
storage
style
svn
swimming
talks
technology
text
thrift
time
to:facebook
to:linkedin
to:twitter
tool
tourdefrance
tracking
training
travel
tutorial
tv
twitter
uk
unix
usa
useful
vegetarian
via:blech
video
visualization
war
water
web
wiki
wikipedia
windows
wordpress
workflow
writing
xml
yahoo