The Life of a South Central Statistic | The New Yorker
What sets the course of a life? Three years before my beloved cousin’s murder—before the weeping, before the raging, before the heated self-recriminations and icy reckonings—I awoke with the most glorious sense of anticipation I’ve ever felt. It was June 29, 2006, the day that Michael was going to be freed. Outside my vacation condo in Hollywood, I climbed into the old white BMW I’d bought from my mother and headed to my aunt’s small stucco home, in South Central. On the corner, a fortified drug house stood like a sentry, but her pale cottage seemed serene, aglow in the morning sun. Poverty never looks quite as bad in the City of Angels as it does elsewhere.
crime  judicial-system 
4 days ago
Worldbuilding - Atomic Rockets

There is a grand tradition of scientifically minded science fiction authors creating not just the characters in their novels but also the brass tacks scientific details of the planets they reside on. This is the art and science of Worldbuilding.
10 days ago
She Just Won 3 Gold Medals for Her Swimming. She’s Only 73. - The New York Times
“Our bodies are made for being used,” she said. “Physical fitness and activity improves brain function. Anyone who is keeping up physical activity — both the aerobic part, which is really important, and the strength and balance and flexibility — is reducing the risks and buffering the decline that is going on.”

For Mr. Cheek, the nation’s fastest 100-meter sprinter in his age group, there is “a pride and a mental discipline that carries over into your whole lifestyle,” he said. Consistent exercise, said Mr. Cheek, who is a part-time professor of social psychology at California State University, Fresno, allows you to have “a body that can perform for you any time you want.”
10 days ago
zeeshanu/learn-regex: Learn regex the easy way
A regular expression is a pattern that is matched against a subject string from left to right. The term "regular expression" is a mouthful, so you will usually find it abbreviated as "regex" or "regexp". Regular expressions are used for replacing text within a string, validating forms, extracting a substring from a string based on a pattern match, and much more.
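The three uses named above (replacing, validating, extracting) can be sketched with Python's standard `re` module. The sample text, the username rule, and the simplified email pattern are assumptions chosen for illustration and are not taken from the tutorial.

```python
import re

# Hypothetical sample text, used only for illustration.
text = "Contact: alice@example.com, phone 555-0142"

# 1. Replace text within a string: mask every digit.
masked = re.sub(r"\d", "#", text)

# 2. Validate a form field with a deliberately simple username rule:
#    3 to 16 lowercase letters, digits, underscores, or hyphens.
USERNAME = re.compile(r"[a-z0-9_-]{3,16}")

def is_valid_username(value: str) -> bool:
    # fullmatch requires the whole string to fit the pattern.
    return USERNAME.fullmatch(value) is not None

# 3. Extract a substring based on a pattern match.
#    This pattern is intentionally loose and is not a full email validator.
match = re.search(r"[\w.]+@[\w.]+", text)
email = match.group(0) if match else None
```

`re.search` scans the subject from left to right and returns the first match, which is why the extraction step stops at the first address-like substring.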
regex  tutorial 
12 days ago
We Trained A Computer To Search For Hidden Spy Planes. This Is What It Found.
From planes tracking drug traffickers to those testing new spying technology, US airspace is buzzing with surveillance aircraft operated for law enforcement and the military.
compciv  machine-learning  data-journalism 
15 days ago
Economic diversity and student outcomes at Stanford University
The median family income of a student from Stanford is $167,500, and 66% come from the top 20 percent. About 2.2% of students at Stanford came from a poor family but became a rich adult.
19 days ago
Troy Hunt: Passwords Evolved: Authentication Guidance for the Modern Era

In the beginning, things were simple: you had two strings (a username and a password) and if someone knew both of them, they could log in. Easy.
But the ecosystem in which they were used was simple too, for example in MIT's Time-Sharing Computer, considered to be the first computer system to use passwords:
security  password 
27 days ago
What happened to Trump's war on data?
Straightforward as “data collection” may sound, in practice there’s often a strong political component to government data. What information to collect, about whom and how it’s collected are critical questions that don’t always have one objective answer. “Data is inherently political,” said Wonderlich. “And how it’s used depends on who’s collecting it and what they’re representing about the world.”

Sometimes, those questions fall on Congress, such as when lawmakers created the unemployment rate—as unbiased a statistic as exists today—in the 1930’s. According to a history of the U.S. Census, the unemployment rate was the subject of a fierce political fight upon its creation, including how often to collect data on unemployment and who would be counted as unemployed. President Herbert Hoover and his allies thought that the crude unemployment figures, which came from limited Bureau of Labor Statistics surveys and business reports at the time, were adequate measures during the Great Depression. Democrats and many labor economists disagreed and called for additional surveys to determine the true extent of unemployment—and the federal response necessary to alleviate it.
data  padjo 
28 days ago
Startup Engineers and Our Mistakes with MongoDB
MongoDB got rave reviews for its usability. But other features mattered too when choosing a database for a growing startup.
mongodb  databases 
4 weeks ago
Improving the Realism of Synthetic Images - Apple Machine Learning Journal
Most successful examples of neural nets today are trained with supervision. However, to achieve high accuracy, the training sets need to be large, diverse, and accurately annotated, which is costly. An alternative to labelling huge amounts of data is to use synthetic images from a simulator. This is cheap as there is no labeling cost, but the synthetic images may not be realistic enough, resulting in poor generalization on real test images. To help close this performance gap, we’ve developed a method for refining synthetic images to make them look more realistic. We show that training models on these refined images leads to significant improvements in accuracy on various machine learning tasks.
4 weeks ago
The limitations of deep learning
The most surprising thing about deep learning is how simple it is. Ten years ago, no one expected that we would achieve such amazing results on machine perception problems by using simple parametric models trained with gradient descent. Now, it turns out that all you need is sufficiently large parametric models trained with gradient descent on sufficiently many examples. As Feynman once said about the universe, "It's not complicated, it's just a lot of it".

In deep learning, everything is a vector, i.e. everything is a point in a geometric space. Model inputs (it could be text, images, etc.) and targets are first "vectorized", i.e. turned into some initial input vector space and target vector space. Each layer in a deep learning model operates one simple geometric transformation on the data that goes through it. Together, the chain of layers of the model forms one very complex geometric transformation, broken down into a series of simple ones. This complex transformation attempts to map the input space to the target space, one point at a time. This transformation is parametrized by the weights of the layers, which are iteratively updated based on how well the model is currently performing. A key characteristic of this geometric transformation is that it must be differentiable, which is required in order for us to be able to learn its parameters via gradient descent. Intuitively, this means that the geometric morphing from inputs to outputs must be smooth and continuous, a significant constraint.

The whole process of applying this complex geometric transformation to the input data can be visualized in 3D by imagining a person trying to uncrumple a paper ball: the crumpled paper ball is the manifold of the input data that the model starts with. Each movement operated by the person on the paper ball is similar to a simple geometric transformation operated by one layer. The full uncrumpling gesture sequence is the complex transformation of the entire model. Deep learning models are mathematical machines for uncrumpling complicated manifolds of high-dimensional data.
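The "chain of simple transformations" idea can be sketched in plain Python. Each layer here is an affine map followed by `tanh`, which is smooth and differentiable. The weights are hand-picked, hypothetical values; a real model would learn them by gradient descent, which this sketch does not implement.

```python
import math
from functools import reduce

def make_layer(weights, bias):
    """Build one layer: an affine map followed by tanh (smooth and differentiable)."""
    def layer(point):
        return [
            math.tanh(sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, bias)
        ]
    return layer

# Hypothetical, hand-picked parameters for illustration only.
layers = [
    make_layer([[1.0, -0.5], [0.5, 1.0]], [0.0, 0.1]),   # 2D -> 2D
    make_layer([[0.8, 0.2], [-0.3, 0.9]], [0.1, 0.0]),   # 2D -> 2D
    make_layer([[1.0, 1.0]], [0.0]),                     # 2D -> 1D
]

def model(point):
    """Apply the layers in order; their composition is the complex transformation."""
    return reduce(lambda p, layer: layer(p), layers, point)

# Map one input point to the target space.
output = model([0.5, -1.0])
```

Each call to a layer is one small "uncrumpling" movement; `model` is the full gesture sequence applied to a single point.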
deep-learning  python 
5 weeks ago
Automatically generate beautiful visualizations from your data
And other bad ideas

I work in data visualization, a loosely defined and rapidly evolving field that is generally about taking data and turning it into something people can understand. There are a lot of different tools and disciplines involved in doing so, and it is impacting almost every field of study, industry and business. I believe it is essentially a new medium for communication, and we are still very much in the formative stages of its development.
5 weeks ago
Cafe Cracks: Attacks on Unsecured Wireless Networks
Mobile users demand high connectivity in today's world, often at the price of security. Requiring Internet access at the airport, public buildings, and restaurants, users will easily sacrifice a secure connection for a fast and reliable one. By broadcasting rogue access points at these compromising locations, crackers can launch effective Man-in-the-Middle attacks. Our developed crack, Cafe Crack, provides a platform built from open source software for deploying rogue access points and sophisticated Man-in-the-Middle attacks. Built around the Untangle Server software, Cafe Crack allows the hacker to dynamically measure, monitor and redirect network traffic. This paper will provide an example of DNS spoofing using the Cafe Crack platform and then provide simple and effective protection techniques against harmful rogue AP attacks.
5 weeks ago
Tracking Campaign Cash in Colorado - Columbia Journalism Review
It was a nightmare. You have to print out every single (independent expenditure) committee filing, and go through it by hand. You might have a committee that spent $800,000, and a lot of the money is in $200 increments spread over 20 races, and you have to add those with your little hand held calculator. There are 24 races in total across the state, so I had to add up for each committee what they spent on each race and then add all that up on both sides. It was a really time-consuming and tedious job.

We’re downloading the data onto our website, and one reason we’re doing that is because of the issues some people have had accessing the Secretary of State’s website—we think it’s a public service. We have the 2010 contributions up there, but hopefully in the next two months we’ll have up the 2010 expenditures, then the 2008 contributions and expenditures, and hopefully we’re going to be able to keep this going for the year.
campaign-finance  journalism 
5 weeks ago
18F Content Guide - Introduction
How to plan, write, and manage content at 18F.
6 weeks ago
In TV Ratings Game, Networks Try to Dissguys Bad Newz from Nielsen - WSJ
Walt Disney Co.’s ABC declined to comment. The network, though, groused last month when NBC News intentionally misspelled an entire week of “Nightly News” broadcasts. Altogether, NBC, which is ranked second behind ABC in ratings, has played the misspell card 14 times since the start of the 2016-17 television season last fall.

NBC News said it broke no rules. “As is standard industry practice, our broadcast is retitled when there are pre-emptions and inconsistencies or irregularities in the schedule, which can include holiday weekends and special sporting events,” a show spokesman said.
6 weeks ago
Welcome to the Hardware Forensic Database! — Hardware Forensic Database 1.0 documentation
The Hardware Forensic Database (or HFDB) is a project of CERT-UBIK aiming at providing a collaborative knowledge base related to IoT Forensic methodologies and tools.

This database provides multiple guides to collect valuable information from various smart/connected devices, as well as dedicated tools. These guides allow quick information extraction and provide all the required material to perform a forensic analysis on specific devices.
6 weeks ago
Sharing as the Future of Medicine | Shared Decision Making and Communication | JAMA Internal Medicine | The JAMA Network
Of all the sciences, medicine has the most direct connection with human life. It encompasses everything from molecular mechanisms to the personal narratives, beliefs, and feelings of each individual. Sharing Medicine, starting with a series of 7 Viewpoints, tackles the challenge of this vast diversity head-on. The articles in this series summarize how physicians currently share knowledge, skills, and experiences among ourselves as a professional community and with the patients we serve; and in each article the authors suggest some ways in which we might do this better.
6 weeks ago
Running feature specs with Capybara and Chrome headless | Drivy Engineering
At Drivy, we’ve been using Capybara and PhantomJS to run our feature specs for years. Even with its issues, PhantomJS is a great way to interact with a browser without starting a graphical interface. Recently, Chrome added support for a headless flag so it could be started without any GUI. Following this announcement, the creator of PhantomJS even announced that he would be stepping down as a maintainer.
phantomjs  headless-browser 
6 weeks ago
President Trump’s Lies, the Definitive List - The New York Times
Many Americans have become accustomed to President Trump’s lies. But as regular as they have become, the country should not allow itself to become numb to them. So we have catalogued nearly every outright lie he has told publicly since taking the oath of office.
data-visualization  listicle 
8 weeks ago
Preferential Rents in New York City - ProPublica Data Store
In 2003, lawmakers in New York State passed a law that in effect allowed landlords to bypass annual limits on rent increases for their rent-stabilized apartments. Owners could raise rents by more than the annual limits if they registered a high rent, often well above existing market rates, but charged tenants a lower, “preferential” rent. Preferential rents are not regulated and can be raised up to the registered rate upon lease renewal. Today, more than 250,000 New York City apartments feature these rents.
8 weeks ago
The Platinum Patients - The Atlantic
Each year, 1 in every 20 Americans racks up just as much in medical bills as another 19 combined. This critical five percent of the U.S. population is key to solving the nation's health care spending crisis.
data-visualization  padjo 
8 weeks ago
Laurence Tratt: What Challenges and Trade-Offs do Optimising Compilers Face?
After immersing oneself in a research area for several years, it's easy to feel like the heir to a way of looking at the world that is criminally underrated: why don't more people want what I want? This can be a useful motivation, but it can also lead to ignoring the changing context in which most research sits. So, from time to time, I like to ask myself the question: “what if everything I believe about this area is wrong?” It’s a simple question, but hard to answer honestly. Mostly, of course, the only answer my ego will allow me is “I can’t see anything I could be wrong about”. Sometimes I spot a small problem that can be easily fixed. Much more rarely, I've been forced to acknowledge that there’s a deep flaw in my thinking, and that the direction I’m heading in is a dead-end. I’ve twice had to face this realisation, each time after a gradual accumulation of unpleasant evidence that I've tried to ignore. While abandoning several years' work wasn’t easy, it felt a lot easier than trying to flog a horse I now knew to be dead.
8 weeks ago
Gene name errors are widespread in the scientific literature | Genome Biology | Full Text
The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.
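A scan of this kind can be sketched as a pattern check over a column of gene names. This is not the paper's script. The two patterns below are assumptions about what a converted cell looks like: a gene symbol such as SEPT2 or MARCH1 turned into a date ("2-Sep", "1-Mar"), or an identifier such as 2310009E13 turned into a floating-point number ("2.31E+13").

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Date-like cells, e.g. "2-Sep" or "1-Mar".
DATE_LIKE = re.compile(rf"^\d{{1,2}}-(?:{MONTHS})$", re.IGNORECASE)

# Scientific-notation cells, e.g. "2.31E+13".
FLOAT_LIKE = re.compile(r"^\d+(?:\.\d+)?E\+\d+$")

def suspicious_cells(column):
    """Return the cells that look like date or float conversions of gene names."""
    flagged = []
    for cell in column:
        value = cell.strip()
        if DATE_LIKE.match(value) or FLOAT_LIKE.match(value):
            flagged.append(cell)
    return flagged

# Hypothetical gene-list column, as it might appear after export from a spreadsheet.
sample = ["TP53", "2-Sep", "1-Mar", "BRCA1", "2.31E+13"]
flagged = suspicious_cells(sample)
```

A check like this only flags candidates. Deciding whether a flagged cell really was a gene name still needs the original, unconverted list.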
8 weeks ago
Human error costs TransAlta $24-million on contract bids - The Globe and Mail
A slip of the hand in a computer spreadsheet for bidding on electricity transmission contracts in New York will cost TransAlta Corp. $24-million (U.S.), wiping out 10 per cent of the company's profit this year.

The error, although costly, was bizarrely simple. Someone preparing the electronic file of bids that TransAlta submitted at the end of April misaligned the rows of information in the spreadsheet. High bids intended for certain transmission paths were instead made for lower-demand routes -- meaning that TransAlta overpaid for transmission contracts, as well as buying more capacity than it intended in certain cases.

"It was literally a cut-and-paste error in an Excel spreadsheet that we did not detect when we did our final sorting and ranking of bids prior to submission," said Steve Snyder, TransAlta's president.
8 weeks ago
Excel Error by a Cleary Gottlieb Associate Alters Lehman Asset Deal
A first-year associate at Cleary Gottlieb Steen & Hamilton made an Excel reformatting error that mistakenly added 179 contracts to an agreement to buy Lehman Brothers assets, according to a motion filed with a New York bankruptcy court on Friday.

Cleary Gottlieb Steen & Hamilton represents Barclays Capital Inc. in its agreement to buy the assets. Its motion (PDF posted by Above the Law) says the law firm was working under a tight deadline when the mistake was made, Computerworld reports.

The motion seeks to exclude the 179 contracts from the purchase agreement.

Above the Law was the first to report the error. Its story said the mistake happened at around 11:30 p.m. on Sept. 18 when a second-year associate asked a first-year associate to reformat an Excel document of critical Lehman contracts to be assumed by Barclays.

The associate resized the rows when converting the spreadsheet into a PDF document, causing “hidden” contracts in the spreadsheet to be exposed. The spreadsheet, which had been e-mailed to the law firm by a Lehman representative, contained nearly 1,000 rows and more than 24,000 individual cells.
8 weeks ago
Researchers Finally Replicated Reinhart-Rogoff, and There Are Serious Problems. - Roosevelt Institute
In 2010, economists Carmen Reinhart and Kenneth Rogoff released a paper, “Growth in a Time of Debt.” Their “main result is that…median growth rates for countries with public debt over 90 percent of GDP are roughly one percent lower than otherwise; average (mean) growth rates are several percent lower.” Countries with debt-to-GDP ratios above 90 percent have a slightly negative average growth rate, in fact.
8 weeks ago
models/object_detection at master · tensorflow/models
Creating accurate machine learning models capable of localizing and identifying multiple objects in a single image remains a core challenge in computer vision. The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models. At Google we’ve certainly found this codebase to be useful for our computer vision needs, and we hope that you will as well.

tensorflow  computer-vision 
9 weeks ago
Inside the Algorithm That Tries to Predict Gun Violence in Chicago - The New York Times

Gun violence in Chicago has surged since late 2015, and much of the news media attention on how the city plans to address this problem has focused on the Strategic Subject List, or S.S.L.

The list is made by an algorithm that tries to predict who is most likely to be involved in a shooting, either as perpetrator or victim. The algorithm is not public, but the city has now placed a version of the list — without names — online through its open data portal, making it possible for the first time to see how Chicago evaluates risk.

We analyzed that information and found that the assigned risk scores — and what characteristics go into them — are sometimes at odds with the Chicago Police Department’s public statements and cut against some common perceptions.
datasets  compciv  socrata 
10 weeks ago
Making the Internet Archive’s full text search faster.
This article describes how we made the full-text organic search faster at the Internet Archive, without scaling horizontally, allowing our users to search in just a few seconds across our collection of 35 million documents containing books, magazines, newspapers, scientific papers, patents and much more.

By organic search we mean the “method for entering one or a plurality of search items in a single data string into a search engine. Organic search results are listings on search engine results pages that appear because of their relevance to the search terms.” (1).

The relevance should be scored on the search query matches for every document. In other words, if the search query has a perfect match in some of our documents, we want to return these documents; otherwise, we want to return the documents containing a part of the query or one of its subsets.
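The ranking rule described above (perfect matches first, then documents that match part of the query) can be sketched with a toy scorer. It does not represent the Internet Archive's implementation. The scoring formula, the function names, and the sample documents are all assumptions made for illustration.

```python
def score(query: str, document: str) -> float:
    """Fraction of query terms present in the document, plus 1.0 for an exact phrase match."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(document.lower().split())
    matched = sum(1 for term in terms if term in words)
    partial = matched / len(terms)
    exact_bonus = 1.0 if query.lower() in document.lower() else 0.0
    return partial + exact_bonus

def search(query, documents):
    """Return matching documents, best score first; non-matching documents are dropped."""
    scored = [(score(query, doc), doc) for doc in documents]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc for value, doc in ranked if value > 0]

# Hypothetical three-document collection.
docs = [
    "an archive of old newspapers",
    "the internet archive stores books",
    "patents and papers",
]
results = search("internet archive", docs)
```

Here the document containing the whole phrase ranks first, the one containing only "archive" ranks second, and the unrelated one is not returned.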
10 weeks ago
Comey hearing – Cable news coverage, chyrons - Washington Post
Coverage of former FBI director James B. Comey’s testimony looked about the same across cable news channels. A closeup of a senator forming a question, a wide shot of the room: there’s just not much to show on TV. But the ALL CAPS text in the bar at screen bottom differentiates networks, exposing what they want viewers to take away from the hearing.
journalism  news  politics  best 
10 weeks ago
Crime and Punishment in Chicago
Crime and Punishment in Chicago is an index of data sources surrounding the criminal justice system as it operates in Chicago. We track data sources from the commission of the crime all the way to prison. We aggregate sources of data, provide insight into how this data is generated, discuss how to get it, and expose what data is unavailable.
crime  data  reference  padjo  compciv 
10 weeks ago
Testing from the ground up
Tests are pieces of code that check if your main code works. I write tests to catch bugs when I refactor. I write tests to force myself to think through and handle edge cases. I write tests to show the users of my project that my code does what I say it does.
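A small example of what such tests can look like, using Python's standard `unittest` module. The `mean` function stands in for "main code" and is hypothetical; the post itself may use a different function or test framework.

```python
import unittest

def mean(values):
    """Hypothetical 'main code': arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("mean() requires at least one value")
    return sum(values) / len(values)

class MeanTests(unittest.TestCase):
    def test_typical_input(self):
        # Documents the expected behaviour for users of the function.
        self.assertEqual(mean([1, 2, 3]), 2)

    def test_single_value(self):
        # Edge case: a one-element sequence.
        self.assertEqual(mean([5]), 5)

    def test_empty_input_is_rejected(self):
        # Edge case: writing this test forces a decision about empty input.
        with self.assertRaises(ValueError):
            mean([])
```

Saved as a module, the tests can be run with `python -m unittest`. If a later refactor of `mean` changes any of these behaviours, the corresponding test fails.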
testing  READTODO 
10 weeks ago
Failure to warn: Hundreds died while taking an arthritis drug, but nobody alerted patients
In a review of millions of reports to the FDA involving more than 100 drugs approved since 2010, Actemra stood out. It showed that Actemra patients had experienced an unusually large number of serious side effects that didn’t appear on the drug’s warning label.

The initial review was performed for STAT by Advera Health Analytics in Santa Rosa, Calif., a company that collects and curates drug-related complaints to the FDA Adverse Events Reporting System, known as FAERS. The company then provided comparison data for all major rheumatoid arthritis drugs.

STAT’s analysis of that data, including more than 13,500 FAERS reports on Actemra, showed higher than expected numbers of several serious problems when compared to competing drugs. These included the blockbusters Humira and Remicade, which have many more users.
fda  padjo  drugs  investigations 
10 weeks ago
Comma Separated Vulnerabilities
This post introduces Formula Injection, a technique for exploiting ‘Export to Spreadsheet’ functionality in web applications to attack users and steal spreadsheet contents. It also details a command injection exploit for Apache OpenOffice and LibreOffice that can be delivered using this technique.
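On the defensive side, one commonly suggested mitigation is to neutralize cells that a spreadsheet could interpret as formulas before writing the export. The sketch below assumes that prefixing such cells with an apostrophe is acceptable for the application. The function names are hypothetical, and this is a partial safeguard and not a complete defense.

```python
import csv
import io

# Leading characters that spreadsheet applications may treat as the start of a formula.
FORMULA_PREFIXES = ("=", "+", "-", "@")

def neutralize(cell: str) -> str:
    """Prefix a cell with an apostrophe if it could be read as a formula."""
    if cell.startswith(FORMULA_PREFIXES):
        return "'" + cell
    return cell

def export_csv(rows) -> str:
    """Write rows to CSV text, neutralizing every cell first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in rows:
        writer.writerow([neutralize(str(cell)) for cell in row])
    return buffer.getvalue()

# Hypothetical export containing one user-supplied, formula-like value.
exported = export_csv([["name", "note"], ["alice", "=1+1"]])
```

The trade-off is that legitimate values starting with these characters, such as the negative number `-5`, are also prefixed, so numeric columns may need separate handling.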
security  programming  excel 
11 weeks ago