California Regulators Require Auto Insurers to Adjust Rates
The state changed its approach in response to ProPublica’s finding that minority neighborhoods were paying higher premiums than white areas with the same risk.
compciv  algorithms  transparency  best 
3 days ago
A Brief History of Religion and the U.S. Census | Pew Research Center
The U.S. Census Bureau has not asked questions about religion since the 1950s, but the federal government did gather some information about religion for about a century before that. Starting in 1850, census takers began asking a few questions about religious organizations as part of the decennial census that collected demographic and social statistics from the general population as well as economic data from business establishments. Federal marshals and assistant marshals, who acted as census takers until after the Civil War, collected information from members of the clergy and other religious leaders on the number of houses of worship in the U.S. and their respective denominations, seating capacities and property values. Although the census takers did not interview individual worshipers or ask about the religious affiliations of the general population, they did ask members of the clergy to identify their denomination – such as Methodist, Roman Catholic or Old School Presbyterian. The 1850 census found that there were 18 principal denominations in the U.S.
census  data-analysis 
4 days ago
The Politics of Last Names - The Atlantic
Last names are deeply personal, a kind of shorthand for expressing family bonds. But they’re also profoundly political, reflecting the machinations of governments in the countries that family has passed through over time. The latest example comes courtesy of Afghanistan, where officials are conducting the first nationwide census in three and a half decades—and confronting a major obstacle: names in the country are malleable, and many Afghans use only one. The government’s solution is to urge its people to take on surnames. “The remote, tribal nature of Afghan villages may have had something to do with the lack of surnames,” The New York Times recently noted. “So perhaps did the historic weakness of national governments, which have tended to require fixed names in the interest of keeping track of people, to draft them or tax them.”
10 days ago
Unemployed lumber worker goes with his wife to the bean harvest. Note social security number tattooed on his arm, Oregon, 1939 by Dorothea Lange. [1600x1195] : HistoryPorn
"Oregon, August 1939. "Unemployed lumber worker goes with his wife to the bean harvest. Note Social Security number tattooed on his arm." (And now a bit of Shorpy scholarship/detective work. A public records search shows that 535-07-5248 belonged to one Thomas Cave, born July 1912, died in 1980 in Portland. Which would make him 27 years old when this picture was taken.) Medium format safety negative by Dorothea Lange."

A search of the 1940 census finds his wife's name was Vivian (his first wife, it appears, since his wife at the time of his death was Ann Kathryn Bloom). It also looks like he was employed in 1940, as the census indicates that he worked 48 hours the week of Mar 24-30, 1940.

10 days ago
The Myth Of The Actuary: Life Insurance And Frederick L. Hoffman's Race Traits And Tendencies Of The American Negro
In May 1896, Frederick L. Hoffman, a statistician at the Prudential Life Insurance Company, published a 330-page article in the prestigious Publications of the American Economic Association intended to prove—with statistical reliability—that the American Negro was uninsurable. Race Traits and Tendencies of the American Negro was a compilation of statistics, eugenic theory, observation, and speculation, solicited by the Prudential in response to a wave of state legislation banning discrimination against African Americans.

Race Traits immediately became a key text in one of the central social preoccupations of the turn of the century: the supposed Negro Problem. Numerous turn-of-the-century tracts (including Hoffman's) stipulated that minority racial groups were not only biologically inferior but also barriers to progress. Hoffman, a German immigrant, was one of the leading statisticians of his time and also a strong proponent of racial hierarchy and white supremacy.1 His application of mathematical tools to a social debate set a precedent for the use of statistics and actuarial science—two fields then in their infancies, which absorbed the biases and errors of their early participants. Though Race Traits was hailed by many as a work of genius, even in its own day critics attacked its racist premise and suppositions, noting that Hoffman's sources were problematical and his mathematical analysis flawed. Hoffman's work embedded racial ideologies within its approach to actuarial data, a legacy that remains with the field today.
data-analysis  dirty-data 
10 days ago
How Python does Unicode
As we all (hopefully) know by now, Python 3 made a significant change to how strings work in the language. I’m on the record as being strongly in favor of this change, and I’ve written at length about why I think it was the right thing to do. But for those who’ve been living under a rock the past ten years or so, here’s a brief summary, because it’s relevant to what I want to go into today:

In Python 2, two types could be used to represent strings. One of them, str, was a “byte string” type; it represented a sequence of bytes in some particular text encoding, and defaulted to ASCII. The other, unicode, was (as the name implies) a Unicode string type. Thus it did not represent any particular encoding (or did it? Keep reading to find out!). In Python 2, many operations allowed you to use either type, many comparisons worked even on strings of different types, and str and unicode were both subclasses of a common base class, basestring. To create a str in Python 2, you can use the str() built-in, or string-literal syntax, like so: my_string = 'This is my string.'. To create an instance of unicode, you can use the unicode() built-in, or prefix a string literal with a u, like so: my_unicode = u'This is my Unicode string.'.
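For contrast with the Python 2 behavior described above, here is a minimal Python 3 sketch, where `str` is the Unicode text type and `bytes` is a separate type for encoded data. The variable names are illustrative only, and UTF-8 is an assumed choice of encoding.

```python
# Python 3 sketch: str holds Unicode text, bytes holds encoded data.
text = 'This is my Unicode string.'   # str: a sequence of Unicode code points
data = text.encode('utf-8')           # bytes: the same text in one specific encoding

# Unlike Python 2, a str and a bytes object never compare equal.
same = (text == data)

# Mixing the two types in one operation raises TypeError instead of
# silently converting between them.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Decoding with the same encoding recovers the original text.
round_trip = data.decode('utf-8')
```

The explicit `encode` and `decode` calls are the point of the sketch: in Python 3 the boundary between text and bytes has to be crossed deliberately.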
unicode  python 
18 days ago
Billion-Dollar Weather and Climate Disasters: Table of Events | National Centers for Environmental Information (NCEI)

Below is a historical table of U.S. Billion-dollar disaster events, summaries, report links and statistics for the 1980–2017 period of record. In 2017 (as of July 7), there have been 9 weather and climate disaster events with losses exceeding $1 billion each across the United States. These events included 2 flooding events, 1 freeze event, and 6 severe storm events. Overall, these events resulted in the deaths of 57 people and had significant economic effects on the areas impacted.
19 days ago
Google's "Director of Engineering" Hiring Test

Recently, I was interviewed over the phone by a Google recruiter. As I qualified for the (unsolicited) interview but failed to pass the test, this blog post lists the questions and the expected answers. That might be handy if Google calls you one day.
For the sake of the discussion, I started coding 37 years ago (I was 11 years old) and have never stopped since. Beyond having been appointed R&D Director 24 years ago (I was 24 years old), among (many) other works, I have since designed and implemented the most demanding parts of TWD's R&D projects* – all of them delivering commercial products:
google  interview-questions 
25 days ago
The Life of a South Central Statistic | The New Yorker
What sets the course of a life? Three years before my beloved cousin’s murder—before the weeping, before the raging, before the heated self-recriminations and icy reckonings—I awoke with the most glorious sense of anticipation I’ve ever felt. It was June 29, 2006, the day that Michael was going to be freed. Outside my vacation condo in Hollywood, I climbed into the old white BMW I’d bought from my mother and headed to my aunt’s small stucco home, in South Central. On the corner, a fortified drug house stood like a sentry, but her pale cottage seemed serene, aglow in the morning sun. Poverty never looks quite as bad in the City of Angels as it does elsewhere.
crime  judicial-system 
5 weeks ago
Worldbuilding - Atomic Rockets

There is a grand tradition of scientifically minded science fiction authors creating not just the characters in their novels but also the brass tacks scientific details of the planets they reside on. This is the art and science of Worldbuilding.
6 weeks ago
She Just Won 3 Gold Medals for Her Swimming. She’s Only 73. - The New York Times
“Our bodies are made for being used,” she said. “Physical fitness and activity improves brain function. Anyone who is keeping up physical activity — both the aerobic part, which is really important, and the strength and balance and flexibility — is reducing the risks and buffering the decline that is going on.”

For Mr. Cheek, the nation’s fastest 100-meter sprinter in his age group, there is “a pride and a mental discipline that carries over into your whole lifestyle,” he said. Consistent exercise, said Mr. Cheek, who is a part-time professor of social psychology at California State University, Fresno, allows you to have “a body that can perform for you any time you want.”
6 weeks ago
zeeshanu/learn-regex: Learn regex the easy way
A regular expression is a pattern that is matched against a subject string from left to right. "Regular expression" is a mouthful, so you will usually find the term abbreviated as "regex" or "regexp". Regular expressions are used for replacing text within a string, validating forms, extracting a substring from a string based on a pattern match, and much more.
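The three uses named above (replacing, validating, extracting) can be sketched with Python's standard `re` module. The function names and the username rule are hypothetical choices made for illustration, not taken from the linked tutorial.

```python
import re

def squeeze_spaces(s):
    """Replace: collapse every run of whitespace into a single space."""
    return re.sub(r'\s+', ' ', s).strip()

# Validate: a deliberately loose, hypothetical username rule
# (3 to 16 lowercase letters, digits, underscores or hyphens).
USERNAME = re.compile(r'[a-z0-9_-]{3,16}')

def is_valid_username(s):
    """Return True only if the whole string matches the pattern."""
    return USERNAME.fullmatch(s) is not None

def first_year(s):
    """Extract: return the first standalone four-digit number, or None."""
    match = re.search(r'\b(\d{4})\b', s)
    return match.group(1) if match else None
```

For instance, `first_year('Census of 1850 and 1940')` returns `'1850'`, because `re.search` scans from left to right and stops at the first match.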
regex  tutorial 
6 weeks ago
We Trained A Computer To Search For Hidden Spy Planes. This Is What It Found.
From planes tracking drug traffickers to those testing new spying technology, US airspace is buzzing with surveillance aircraft operated for law enforcement and the military.
compciv  machine-learning  data-journalism 
6 weeks ago
Economic diversity and student outcomes at Stanford University
The median family income of a student from Stanford is $167,500, and 66% come from the top 20 percent. About 2.2% of students at Stanford came from a poor family but became a rich adult.
7 weeks ago
Troy Hunt: Passwords Evolved: Authentication Guidance for the Modern Era

In the beginning, things were simple: you had two strings (a username and a password) and if someone knew both of them, they could log in. Easy.
But the ecosystem in which they were used was simple too, for example in MIT's Time-Sharing Computer, considered to be the first computer system to use passwords:
security  password 
8 weeks ago
What happened to Trump's war on data?
Straightforward as “data collection” may sound, in practice there’s often a strong political component to government data. What information to collect, about whom and how it’s collected are critical questions that don’t always have one objective answer. “Data is inherently political,” said Wonderlich. “And how it’s used depends on who’s collecting it and what they’re representing about the world.”

Sometimes, those questions fall on Congress, such as when lawmakers created the unemployment rate—as unbiased a statistic as exists today—in the 1930s. According to a history of the U.S. Census, the unemployment rate was the subject of a fierce political fight upon its creation, including how often to collect data on unemployment and who would be counted as unemployed. President Herbert Hoover and his allies thought that the crude unemployment figures, which came from limited Bureau of Labor Statistics surveys and business reports at the time, were adequate measures during the Great Depression. Democrats and many labor economists disagreed and called for additional surveys to determine the true extent of unemployment—and the federal response necessary to alleviate it.
data  padjo 
8 weeks ago
Startup Engineers and Our Mistakes with MongoDB
MongoDB got rave reviews for its usability. But other features mattered too when choosing a database for a growing startup.
mongodb  databases 
9 weeks ago
Improving the Realism of Synthetic Images - Apple Machine Learning Journal
Most successful examples of neural nets today are trained with supervision. However, to achieve high accuracy, the training sets need to be large, diverse, and accurately annotated, which is costly. An alternative to labelling huge amounts of data is to use synthetic images from a simulator. This is cheap as there is no labeling cost, but the synthetic images may not be realistic enough, resulting in poor generalization on real test images. To help close this performance gap, we’ve developed a method for refining synthetic images to make them look more realistic. We show that training models on these refined images leads to significant improvements in accuracy on various machine learning tasks.
9 weeks ago
The limitations of deep learning
The most surprising thing about deep learning is how simple it is. Ten years ago, no one expected that we would achieve such amazing results on machine perception problems by using simple parametric models trained with gradient descent. Now, it turns out that all you need is sufficiently large parametric models trained with gradient descent on sufficiently many examples. As Feynman once said about the universe, "It's not complicated, it's just a lot of it".

In deep learning, everything is a vector, i.e. everything is a point in a geometric space. Model inputs (it could be text, images, etc) and targets are first "vectorized", i.e. turned into some initial input vector space and target vector space. Each layer in a deep learning model operates one simple geometric transformation on the data that goes through it. Together, the chain of layers of the model forms one very complex geometric transformation, broken down into a series of simple ones. This complex transformation attempts to map the input space to the target space, one point at a time. This transformation is parametrized by the weights of the layers, which are iteratively updated based on how well the model is currently performing. A key characteristic of this geometric transformation is that it must be differentiable, which is required in order for us to be able to learn its parameters via gradient descent. Intuitively, this means that the geometric morphing from inputs to outputs must be smooth and continuous—a significant constraint.

The whole process of applying this complex geometric transformation to the input data can be visualized in 3D by imagining a person trying to uncrumple a paper ball: the crumpled paper ball is the manifold of the input data that the model starts with. Each movement operated by the person on the paper ball is similar to a simple geometric transformation operated by one layer. The full uncrumpling gesture sequence is the complex transformation of the entire model. Deep learning models are mathematical machines for uncrumpling complicated manifolds of high-dimensional data.
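The idea of a model as a chain of simple, smooth geometric transformations can be sketched in plain Python. This is only an illustration under stated assumptions: the weights are hand-picked rather than learned, `dense_layer` and `chain` are hypothetical helper names, and `tanh` stands in for any differentiable nonlinearity.

```python
import math

def dense_layer(weights, biases):
    """Build one layer: an affine map followed by tanh, which is smooth."""
    def apply(point):
        return [
            math.tanh(sum(w * x for w, x in zip(row, point)) + b)
            for row, b in zip(weights, biases)
        ]
    return apply

def chain(layers):
    """Compose layers so the output of each one feeds the next."""
    def model(point):
        for layer in layers:
            point = layer(point)
        return point
    return model

# Hypothetical hand-picked parameters; a real model would learn these
# weights iteratively via gradient descent.
layer1 = dense_layer([[1.0, -1.0], [0.5, 0.5], [0.0, 2.0]], [0.0, 0.1, -0.1])  # 2D -> 3D
layer2 = dense_layer([[1.0, 0.0, -1.0]], [0.0])                                # 3D -> 1D

model = chain([layer1, layer2])
output = model([0.0, 0.0])  # one input point mapped to one target-space point
```

Each layer alone does very little; the composed `model` is the "complex transformation" the excerpt describes, and it stays differentiable because every step in the chain is.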
deep-learning  python 
9 weeks ago
Automatically generate beautiful visualizations from your data
And other bad ideas

I work in data visualization, a loosely defined and rapidly evolving field that is generally about taking data and turning it into something people can understand. There are a lot of different tools and disciplines involved in doing so, and it is impacting almost every field of study, industry and business. I believe it is essentially a new medium for communication, and we are still very much in the formative stages of its development.
10 weeks ago
Cafe Cracks: Attacks on Unsecured Wireless Networks
Mobile users demand high connectivity in today's world, often at the price of security. Requiring Internet access at the airport, public buildings, and restaurants, users will easily sacrifice a secure connection for a fast and reliable one. By broadcasting rogue access points at these compromising locations, crackers can launch effective Man-in-the-Middle attacks. Our developed crack, Cafe Crack, provides a platform built from open source software for deploying rogue access points and sophisticated Man-in-the-Middle attacks. Built around the Untangle Server software, Cafe Crack allows the hacker to dynamically measure, monitor and redirect network traffic. This paper will provide an example of DNS spoofing using the Cafe Crack platform and then provide simple and effective protection techniques against harmful rogue AP attacks.
10 weeks ago
Tracking Campaign Cash in Colorado - Columbia Journalism Review
It was a nightmare. You have to print out every single (independent expenditure) committee filing, and go through it by hand. You might have a committee that spent $800,000, and a lot of the money is in $200 increments spread over 20 races, and you have to add those with your little hand held calculator. There are 24 races in total across the state, so I had to add up for each committee what they spent on each race and then add all that up on both sides. It was a really time-consuming and tedious job.
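Once filings are available as structured data, the hand tally described above reduces to a grouped sum. The sketch below assumes each filing has already been parsed into a `(committee, race, amount)` tuple; that record layout, the function name, and the sample rows are all made up for illustration and do not reflect the Secretary of State's actual data format.

```python
from collections import defaultdict

def tally_spending(filings):
    """Sum amounts per (committee, race) pair and per race.

    `filings` is an iterable of (committee, race, amount) tuples.
    """
    by_committee_race = defaultdict(int)
    by_race = defaultdict(int)
    for committee, race, amount in filings:
        by_committee_race[(committee, race)] += amount
        by_race[race] += amount
    return dict(by_committee_race), dict(by_race)

# Made-up sample rows, used only to exercise the function.
sample_filings = [
    ("Committee A", "Senate District 1", 200),
    ("Committee A", "Senate District 1", 200),
    ("Committee A", "House District 7", 200),
    ("Committee B", "Senate District 1", 500),
]

per_committee_race, per_race = tally_spending(sample_filings)
```

Splitting the totals by which side each committee supported would need one more field per record, which this sketch leaves out.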

We’re downloading the data onto our website, and one reason we’re doing that is because of the issues some people have had accessing the Secretary of State’s website—we think it’s a public service. We have the 2010 contributions up there, but hopefully in the next two months we’ll have up the 2010 expenditures, then the 2008 contributions and expenditures, and hopefully we’re going to be able to keep this going for the year.
campaign-finance  journalism 
10 weeks ago
18F Content Guide - Introduction
How to plan, write, and manage content at 18F.
10 weeks ago
In TV Ratings Game, Networks Try to Dissguys Bad Newz from Nielsen - WSJ
Walt Disney Co.’s ABC declined to comment. The network, though, groused last month when NBC News intentionally misspelled an entire week of “Nightly News” broadcasts. Altogether, NBC, which is ranked second behind ABC in ratings, has played the misspell card 14 times since the start of the 2016-17 television season last fall.

NBC News said it broke no rules. “As is standard industry practice, our broadcast is retitled when there are pre-emptions and inconsistencies or irregularities in the schedule, which can include holiday weekends and special sporting events,” a show spokesman said.
11 weeks ago
Welcome to the Hardware Forensic Database! — Hardware Forensic Database 1.0 documentation
The Hardware Forensic Database (or HFDB) is a project of CERT-UBIK aiming at providing a collaborative knowledge base related to IoT Forensic methodologies and tools.

This database provides multiple guides to collect valuable information from various smart/connected devices, as well as dedicated tools. These guides allow quick information extraction and provide all the required material to perform a forensic analysis on specific devices.
11 weeks ago
Sharing as the Future of Medicine | Shared Decision Making and Communication | JAMA Internal Medicine | The JAMA Network
Of all the sciences, medicine has the most direct connection with human life. It encompasses everything from molecular mechanisms to the personal narratives, beliefs, and feelings of each individual. Sharing Medicine, starting with a series of 7 Viewpoints, tackles the challenge of this vast diversity head-on. The articles in this series summarize how physicians currently share knowledge, skills, and experiences among ourselves as a professional community and with the patients we serve; and in each article the authors suggest some ways in which we might do this better.
11 weeks ago
Running feature specs with Capybara and Chrome headless | Drivy Engineering
At Drivy, we’ve been using Capybara and PhantomJS to run our feature specs for years. Even with its issues, PhantomJS is a great way to interact with a browser without starting a graphical interface. Recently, Chrome added support for a headless flag so it could be started without any GUI. Following this announcement, the creator of PhantomJS even announced that he would be stepping down as a maintainer.
phantomjs  headless-browser 
11 weeks ago
President Trump’s Lies, the Definitive List - The New York Times
Many Americans have become accustomed to President Trump’s lies. But as regular as they have become, the country should not allow itself to become numb to them. So we have catalogued nearly every outright lie he has told publicly since taking the oath of office.
data-visualization  listicle 
june 2017
Preferential Rents in New York City - ProPublica Data Store
In 2003, lawmakers in New York State passed a law that in effect allowed landlords to bypass annual limits on rent increases for their rent-stabilized apartments. Owners could raise rents by more than the annual limits if they registered a high rent, often well above existing market rates, but charged tenants a lower, “preferential” rent. Preferential rents are not regulated and can be raised up to the registered rate upon lease renewal. Today, more than 250,000 New York City apartments feature these rents.
june 2017
The Platinum Patients - The Atlantic
Each year, 1 in every 20 Americans racks up just as much in medical bills as another 19 combined. This critical five percent of the U.S. population is key to solving the nation's health care spending crisis.
data-visualization  padjo 
june 2017
Laurence Tratt: What Challenges and Trade-Offs do Optimising Compilers Face?
After immersing oneself in a research area for several years, it's easy to feel like the heir to a way of looking at the world that is criminally underrated: why don't more people want what I want? This can be a useful motivation, but it can also lead to ignoring the changing context in which most research sits. So, from time to time, I like to ask myself the question: “what if everything I believe about this area is wrong?” It’s a simple question, but hard to answer honestly. Mostly, of course, the only answer my ego will allow me is “I can’t see anything I could be wrong about”. Sometimes I spot a small problem that can be easily fixed. Much more rarely, I've been forced to acknowledge that there’s a deep flaw in my thinking, and that the direction I’m heading in is a dead-end. I’ve twice had to face this realisation, each time after a gradual accumulation of unpleasant evidence that I've tried to ignore. While abandoning several years' work wasn’t easy, it felt a lot easier than trying to flog a horse I now knew to be dead.
june 2017
Gene name errors are widespread in the scientific literature | Genome Biology | Full Text
The spreadsheet software Microsoft Excel, when used with default settings, is known to convert gene names to dates and floating-point numbers. A programmatic scan of leading genomics journals reveals that approximately one-fifth of papers with supplementary Excel gene lists contain erroneous gene name conversions.
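A programmatic scan of the kind mentioned above could look roughly like the following. This is a simplified sketch and not the paper's actual method: the two patterns are assumed heuristics for values such as `2-Sep` (what Excel makes of SEPT2) or `2.31E+13` (what it makes of the identifier 2310009E13), and the function names are hypothetical.

```python
import re

MONTHS = 'Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec'

# Heuristic 1: day-month values such as "2-Sep" or "1-Mar".
DATE_LIKE = re.compile(rf'\d{{1,2}}-({MONTHS})', re.IGNORECASE)

# Heuristic 2: scientific notation such as "2.31E+13".
SCI_NOTATION = re.compile(r'\d+(\.\d+)?E\+\d+')

def looks_converted(value):
    """Return True if a cell looks like a date or float, not a gene name."""
    value = value.strip()
    return bool(DATE_LIKE.fullmatch(value) or SCI_NOTATION.fullmatch(value))

def flag_suspect_names(names):
    """Keep only the entries that match one of the heuristics."""
    return [name for name in names if looks_converted(name)]
```

A scan like this can only flag suspicious cells; recovering the original gene name still requires going back to the source data.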
june 2017
Human error costs TransAlta $24-million on contract bids - The Globe and Mail
A slip of the hand in a computer spreadsheet for bidding on electricity transmission contracts in New York will cost TransAlta Corp. $24-million (U.S.), wiping out 10 per cent of the company's profit this year.

The error, although costly, was bizarrely simple. Someone preparing the electronic file of bids that TransAlta submitted at the end of April misaligned the rows of information in the spreadsheet. High bids intended for certain transmission paths were instead made for lower-demand routes -- meaning that TransAlta overpaid for transmission contracts, as well as buying more capacity than it intended in certain cases.

"It was literally a cut-and-paste error in an Excel spreadsheet that we did not detect when we did our final sorting and ranking of bids prior to submission," said Steve Snyder, TransAlta's president.
june 2017
Excel Error by a Cleary Gottlieb Associate Alters Lehman Asset Deal
A first-year associate at Cleary Gottlieb Steen & Hamilton made an Excel reformatting error that mistakenly added 179 contracts to an agreement to buy Lehman Brothers assets, according to a motion filed with a New York bankruptcy court on Friday.

Cleary Gottlieb Steen & Hamilton represents Barclays Capital Inc. in its agreement to buy the assets. Its motion (PDF posted by Above the Law) says the law firm was working under a tight deadline when the mistake was made, Computerworld reports.

The motion seeks to exclude the 179 contracts from the purchase agreement.

Above the Law was the first to report the error. Its story said the mistake happened at around 11:30 p.m. on Sept. 18 when a second-year associate asked a first-year associate to reformat an Excel document of critical Lehman contracts to be assumed by Barclays.

The associate resized the rows when converting the spreadsheet into a PDF document, causing “hidden” contracts in the spreadsheet to be exposed. The spreadsheet, which had been e-mailed to the law firm by a Lehman representative, contained nearly 1,000 rows and more than 24,000 individual cells.
june 2017
Researchers Finally Replicated Reinhart-Rogoff, and There Are Serious Problems. - Roosevelt Institute
In 2010, economists Carmen Reinhart and Kenneth Rogoff released a paper, “Growth in a Time of Debt.” Their “main result is that…median growth rates for countries with public debt over 90 percent of GDP are roughly one percent lower than otherwise; average (mean) growth rates are several percent lower.” Countries with debt-to-GDP ratios above 90 percent have a slightly negative average growth rate, in fact.
june 2017
models/object_detection at master · tensorflow/models
Creating accurate machine learning models capable of localizing and identifying multiple objects in a single image remains a core challenge in computer vision. The TensorFlow Object Detection API is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models. At Google we’ve certainly found this codebase to be useful for our computer vision needs, and we hope that you will as well.

tensorflow  computer-vision 
june 2017
Inside the Algorithm That Tries to Predict Gun Violence in Chicago - The New York Times

Gun violence in Chicago has surged since late 2015, and much of the news media attention on how the city plans to address this problem has focused on the Strategic Subject List, or S.S.L.

The list is made by an algorithm that tries to predict who is most likely to be involved in a shooting, either as perpetrator or victim. The algorithm is not public, but the city has now placed a version of the list — without names — online through its open data portal, making it possible for the first time to see how Chicago evaluates risk.

We analyzed that information and found that the assigned risk scores — and what characteristics go into them — are sometimes at odds with the Chicago Police Department’s public statements and cut against some common perceptions.
datasets  compciv  socrata 
june 2017