Why journalists should cover local jails | Poynter
While the nation's attention is focused on immigration detention centers along the U.S. border, more than 11 million people will spend time in local jails. They are caught in a complex and expensive system that treats poor people and minorities more severely. Most people in American jails have not been convicted of a crime. Many cannot afford even a few hundred dollars in bail to get out while awaiting trial.
journalism  justice  crime 
2 hours ago
A Beginner's Guide to Firewalling with pf
This guide is written for the person very new to firewalling. Please realize that the sample firewall we build should not be considered appropriate for actual use; I just try to cover a few basics that took me a while to grasp from the better-known (and more detailed) documentation referenced below.

It's my hope that this guide will not only get you started, but give you enough of a grasp of using pf so that you will then be able to go to those more advanced guides and perfect your firewalling skills.

The pf packet filter was developed for OpenBSD but is now included in FreeBSD, which is where I've used it. Having it run at boot and the like is covered in the various documents; however, I'll quickly run through the steps for FreeBSD, sketched below.
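As a quick reference for those FreeBSD steps, here is roughly what they look like. Treat it as a minimal sketch: the interface name em0 is an assumption, and like the guide's own sample, this ruleset is not appropriate for real use.

```
# /etc/rc.conf -- start pf at boot and point it at a ruleset
pf_enable="YES"
pf_rules="/etc/pf.conf"
pflog_enable="YES"

# /etc/pf.conf -- a deliberately tiny ruleset, NOT suitable for real use
ext_if = "em0"              # assumption: your external interface
set skip on lo0             # don't filter loopback traffic
block in all                # default deny inbound
pass out all keep state     # allow outbound, keep state for replies
pass in on $ext_if proto tcp from any to any port 22 keep state  # allow SSH in
```

After editing both files, `service pf start` loads the ruleset.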
security  netsec  linux  guide 
yesterday
I discovered a browser bug - JakeArchibald.com
I accidentally discovered a huge browser bug a few months ago and I'm pretty excited about it. Security engineers always seem like the "cool kids" to me, so I'm hoping that now I can be part of the club, and y'know, get into the special parties or whatever.

I've noticed that a lot of these security disclosure things are only available as PDFs. Personally, I prefer the web, but if you're a SecOps PDF addict, check out the PDF version of this post.

Oh, I guess the vulnerability needs an extremely tenuous name and logo, right? Here goes:
security  http  chrome 
yesterday
Twitter as Data
The rise of the internet and mobile telecommunications has created the possibility of using large datasets to understand behavior at unprecedented levels of temporal and geographic resolution. Online social networks attract the most users, though users of these new technologies provide their data through multiple sources, e.g. call detail records, blog posts, web forums, and content aggregation sites. These data allow scholars to adjudicate between competing theories as well as develop new ones, much as the microscope facilitated the development of the germ theory of disease. Of those networks, Twitter presents an ideal combination of size, international reach, and data accessibility that makes it the preferred platform in academic studies. Acquiring, cleaning, and analyzing these data, however, require new tools and processes. This Element introduces these methods to social scientists and provides scripts and examples for downloading, processing, and analyzing Twitter data. All data and code for this Element are available at www.cambridge.org/twitter-as-data
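For a taste of the "downloading" step, here is a minimal sketch using the tweepy library against the standard search API. The query and credential strings are placeholders, and this is not the Element's own code.

```python
import tweepy

# Placeholder credentials from a Twitter developer account.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

# Pull 100 recent tweets matching a query and keep a few fields for analysis.
rows = [
    (t.id, t.created_at, t.user.screen_name, t.text)
    for t in tweepy.Cursor(api.search, q="#politics", lang="en").items(100)
]
```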
book  twitter  data-mining 
5 days ago
David Eads
Hi, I'm David Eads. My work connects journalism, data, and social issues. I build and teach simple, direct solutions that help journalists effectively tell their stories on the web. I contribute to and organize projects that strive for democracy, diversity, and sustainability.

I make Internet journalism, most recently for ProPublica Illinois. I speak and teach about technology. I developed the Tarbell publishing platform. When I lived in Chicago, I organized a community data journalism workshop, and helped start and build FreeGeek Chicago.
portfolio 
5 days ago
Walt Hickey
I’m down to work with groups big and small on all sorts of topics related to my work, whether it’s walking undergrads in a stats course through how an article was written with the very techniques they’re learning, or speaking in a corporate setting about how to effectively communicate compelling numbers.
portfolio 
5 days ago
Most Maps of the New Ebola Outbreak Are Wrong - The Atlantic
On Thursday, the World Health Organization released a map showing parts of the Democratic Republic of the Congo that are currently being affected by Ebola. The map showed four cases in Wangata, one of three “health zones” in the large city of Mbandaka. Wangata, according to the map, lies north of the main city, in a forested area on the other side of a river.

That is not where Wangata is.

#DRC #Ebola cases per Health Zone in Equateur province as of 15 May 2018 https://t.co/Rvh3QCso7J pic.twitter.com/zl88TqG53i

— Peter Salama (@PeteSalama) May 17, 2018
“It’s actually here, in the middle of Mbandaka city,” says Cyrus Sinai, indicating a region about 8 miles farther south, on a screen that he shares with me over Skype.

Almost all the maps of the outbreak zone that have thus far been released contain mistakes of this kind. Different health organizations all seem to use their own maps, most of which contain significant discrepancies. Things are roughly in the right place, but their exact positions can be off by miles, as can the boundaries between different regions.
mapping  maps  compciv  messy-data 
5 days ago
Lost in Migration: The American Chinese Menu
This essay is an analysis of 693 restaurant menus from seven American Chinatowns, exploring what the words “Chinese food” really mean and represent.

For Chinatown, food is complicated. Chinese restaurants were at first considered “pest holes” by white America, plagued with disease and rats. Slowly, however, dining establishments became fascinating to non-Chinese Americans, especially as they began touring Chinese settlements, and ethnic dishes grew to be central to the food businesses that support Chinatown’s economy.
data-journalism 
5 days ago
Methods of Comparison, Compared / Observable
I know it’s disappointing, but: none of them. No method is better universally, and none of them is “the best” even in the context of the dataset. (There are also a number of methods I did not cover, such as the relative difference.) What’s best depends on what you are trying to show. I’d favor absolute difference here as the simplest option, but log ratio might work if you want to show rate of growth.

There’s another important variable here which we’re ignoring, but which might influence our understanding of the data: population counts. This data is per capita (deaths per 100,000 people per year), which is helpful for understanding how likely any individual is to die, but not the number of people affected. Populations vary widely from county to county, and populations move over time. This makes it especially hard to understand trends that vary both geographically and temporally.
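For concreteness, the comparison methods named here reduce to one-liners. A sketch with hypothetical rates:

```python
import math

# Hypothetical per-capita death rates for one county in two periods
# (deaths per 100,000 people per year).
a, b = 38.2, 45.9

print(b - a)             # absolute difference: 7.7 more deaths per 100k
print((b - a) / a)       # relative difference: growth as a fraction of the old rate
print(math.log2(b / a))  # log ratio: symmetric measure of rate of growth
```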
maps  visualizations 
5 days ago
[1805.12002] Why Is My Classifier Discriminatory?
Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy. In sensitive applications such as healthcare or criminal justice, this trade-off is often undesirable as any increase in prediction error could have devastating consequences. In this work, we argue that the fairness of predictions should be evaluated in the context of the data, and that unfairness induced by inadequate sample sizes or unmeasured predictive variables should be addressed through data collection, rather than by constraining the model. We decompose cost-based metrics of discrimination into bias, variance, and noise, and propose actions aimed at estimating and reducing each term. Finally, we perform case studies on prediction of income, mortality, and review ratings, confirming the value of this analysis. We find that data collection is often a means to reduce discrimination without sacrificing accuracy.
machine-learning 
5 days ago
Invisible asymptotes — Remains of the Day
Great ideas are only obvious in retrospect. Amazon Prime, the subscription that ensures you don’t pay for shipping, was a stroke of marketing genius. Former employee Eugene Wei’s blog tells the story of how it came about.

My first job at Amazon was as the first analyst in strategic planning, the forward-looking counterpart to accounting, which records what already happened. We maintained several time horizons for our forward forecasts, from granular monthly forecasts to quarterly and annual forecasts to even five and ten year forecasts for the purposes of fund-raising and, well, strategic planning.
amazon  business 
5 days ago
[JDK-8203360] Release Note: Japanese New Era Implementation - Java Bug System
https://twitter.com/tagir_valeev/status/1007419414260486144

Emperor of Japan is the only governor in the world whose enthronement requires changes in #Java core library.

https://www.japantimes.co.jp/news/2018/05/17/national/japan-likely-announce-name-next-imperial-era-around-april-1-2019-suga/

The government is likely to announce the name of the next Imperial era around April 1, 2019, a month before Crown Prince Naruhito becomes the next emperor, Chief Cabinet Secretary Yoshihide Suga said Thursday.

The government will begin preparations for the change of gengō (era name) on the assumption that the new one will be announced about a month ahead of Naruhito’s ascension to the Chrysanthemum Throne on May 1, according to Suga.

“It takes roughly one month to adjust information systems to the new name in the public and private sectors,” Suga said, adding that they are working under an assumed timeline, and that the government has not decided the date when the name will be released.
datetime  naming-things  java 
5 days ago
Predicting Gender Using Historical Data
A common problem for researchers who work with data, especially historians, is that a dataset has a list of people with names but does not identify the gender of the person. Since first names often indicate gender, it should be possible to predict gender using names. However, the gender associated with names can change over time. To illustrate, take the names Madison, Hillary, Jordan, and Monroe. For babies born in the United States, the predominant gender associated with those names has changed over time.
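A minimal sketch of that idea, using the U.S. Social Security Administration's public baby-name files (yobYYYY.txt, one file per year, with rows of name,sex,count). This is an illustration, not the article's own code:

```python
import csv
from collections import defaultdict

def load_year(path):
    """Parse one SSA baby-name file (rows of name,sex,count)."""
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    with open(path, newline="") as f:
        for name, sex, count in csv.reader(f):
            counts[name][sex] += int(count)
    return counts

def prob_female(counts, name):
    c = counts.get(name)
    if c is None:
        return None  # name not observed in that birth year
    return c["F"] / (c["F"] + c["M"])

# Compare the same name across two birth cohorts:
# counts_1940 = load_year("yob1940.txt")
# counts_1990 = load_year("yob1990.txt")
# print(prob_female(counts_1940, "Madison"), prob_female(counts_1990, "Madison"))
```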
data  statistics 
6 days ago
Giorgia Lupi: How we can find ourselves in data | TED Talk
Giorgia Lupi uses data to tell human stories, adding nuance to numbers. In this charming talk, she shares how we can bring personality to data, visualizing even the mundane details of our daily lives and transforming the abstract and uncountable into something that can be seen, felt and directly reconnected to our lives.
data-journalism 
6 days ago
TensorFlow.js — a practical guide – YellowAnt
Recently, Google introduced its most popular machine learning library, TensorFlow, in JavaScript. With the help of TensorFlow.js, one can train and deploy ML models in the browser.

Goodbye to spending eons on complicated steps…
Before you start, I would recommend going through the TensorFlow.js docs to get a basic understanding of the context required for this article.
tensorflow  machine-learning 
9 days ago
Ahmed BESBES - Data Science Portfolio – Overview and benchmark of traditional and deep learning models in text classification
This article is an extension of a previous one I wrote when I was experimenting with sentiment analysis on Twitter data. Back then, I explored a simple model: a two-layer feed-forward neural network trained with Keras. The input tweets were represented as document vectors resulting from a weighted average of the embeddings of the words composing the tweet.

The embedding I used was a word2vec model I trained from scratch on the corpus using gensim. The task was binary classification, and with this setup I was able to achieve 79% accuracy.

The goal of this post is to explore other NLP models trained on the same dataset and then benchmark their respective performance on a given test set.

We'll go through different models, from simple ones relying on a bag-of-words representation to heavier machinery deploying convolutional and recurrent networks. We'll see if we can score more than 79% accuracy!
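For a sense of the simple end of that spectrum, here is a minimal bag-of-words baseline with scikit-learn. The four example tweets are placeholders standing in for the article's labeled corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder data; in practice this would be a large labeled tweet corpus.
tweets = ["great movie, loved it", "worst service ever",
          "so happy today", "this is awful"]
labels = [1, 0, 1, 0]  # 1 = positive sentiment, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    tweets, labels, test_size=0.5, random_state=42
)

# Bag-of-words baseline: tf-idf over unigrams and bigrams, linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(),
)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```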
NLP  text-mining  deep-learning 
9 days ago
Suicide is desperate. It is hostile. It is tragic. But mostly, it is a bloody mess.
The blood was like Jell-O. That is what blood gets like, after you die, before they tidy up.

Somehow, I had expected it would be gone. The police and coroner spent more than an hour behind the closed door; surely it was someone’s job to clean it up. But when they left, it still covered the kitchen floor like the glazing on a candy apple.

You couldn’t mop it. You needed a dustpan and a bucket.

I got on my knees, slid the pan against the linoleum and lifted chunks to the bucket. It took hours to clean it all up, and even after that we found pools I had missed under the stove and sink.

It wasn’t until I finally stood up that I noticed the pictures from his wallet. The wooden breadboard had been pulled out slightly, and four photographs were spilled across it. “Now what?” I thought with annoyance. “What were the police looking for?”

But then it hit me. The police hadn’t done it. These snapshots — one of my mother, one of our dog and two of my brother and me — had been carefully set out in a row, by my father.

It was his penultimate act, just before he knelt on the floor, put the barrel of a .22 rifle in his mouth, and squeezed the trigger.

He was 46 years old. I was 21. This week marks the 20th anniversary of his death. And I am still cleaning up.
best  longform  depression 
9 days ago
🚀 100 Times Faster Natural Language Processing in Python
I also published a Jupyter notebook with the examples I describe in this post.
When we published our Python coreference resolution package ✨ last year, we got amazing feedback from the community, and people started to use it for many applications 📚, some very different from our original dialog use case 👥.

And we discovered that, while the speed was totally fine for dialog messages, it could be really slow 🐌 on larger news articles.

I decided to investigate this in detail, and the result is NeuralCoref v3.0, which is about 100 times faster 🚀 than the previous version (several thousand words per second) while retaining the same accuracy, along with the ease of use and ecosystem of a Python library.

In this post I wanted to share a few lessons learned on this project, and in particular:
python  NLP 
10 days ago
This Is America’s Richest Zip Code - Bloomberg
The richest zip code in America is just as exclusive and elite as the people who live there. Fisher Island, located just off the coast of Miami, is accessible only by ferry or water taxi and is a haven for the world’s richest.

The 216-acre island has diverse residents, representing over 50 nationalities and professions ranging from professional athletes and supermodels to executives and lawyers.

The average income in Fisher Island, zip code 33109, was $2.5 million in 2015, according to a Bloomberg analysis of 2015 Internal Revenue Service data. That’s $1 million more than the second-place spot, held by zip code 94027 in Silicon Valley, also known as the City of Atherton on the San Francisco Peninsula. The area’s neighbors include Stanford University and Menlo Park, home to Facebook and various tech companies. While the IRS data only provide the averages of tax returns, which can be skewed by outliers, Fisher Island is the only zip code in the Bloomberg analysis where more than half of all tax returns showed an income of over $200,000.
census 
13 days ago
‘Anything would be better:’ Critics warn Ottawa’s family-reunification lottery is flawed, open to manipulation - The Globe and Mail
Excel’s method for generating random numbers is “very bad,” according to Université de Montréal computer-science professor Pierre L’Ecuyer, an expert in random-number generation. “It’s a very old generator, and it’s really not state-of-the-art.” Prof. L’Ecuyer’s research has shown that Excel’s random-number generator doesn’t pass certain statistical tests, meaning it’s less random than it appears. Under the current system, “it may be that not everybody has exactly the same chance,” Prof. L’Ecuyer said.

Excel uses pseudo-random number generators, a class of algorithms that rely on formulas to generate numbers. These generators have a key flaw – they rely on a “seed” number to kick off the mathematical process. In the case of Excel, this seed is generated automatically by the application. “If you know one number at one step,” Prof. L’Ecuyer explained, “you can compute all the numbers that will follow.”

This means the process could be exploited by someone with the right skills. It’s happened before: In 1994, IT consultant Daniel Corriveau discovered a pattern in a keno game – which uses a random numbering system – at the Casino de Montréal and won $620,000 in a single evening. An investigation later determined the game was using the same seed number at the start of each day.
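The predictability Prof. L'Ecuyer describes is easy to demonstrate with any simple seeded generator. A sketch using a textbook linear congruential generator (an illustration only, not Excel's actual algorithm):

```python
# Minimal linear congruential generator (glibc-style constants) showing
# why a seeded PRNG is predictable: each output determines the next one.
M, A, C = 2**31, 1103515245, 12345

def lcg(seed):
    while True:
        seed = (A * seed + C) % M
        yield seed

gen = lcg(seed=42)
observed = next(gen)                 # suppose an attacker sees one draw...
predicted = (A * observed + C) % M   # ...they can now compute the next one
assert predicted == next(gen)
```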
randomness  excel  spreadsheets 
13 days ago
An Investigative Arsenal: Power Chargers, Document Analysis Tools and More - The New York Times
I have a simple but low-tech trick for keeping track of documents. As I extract individual emails, other documents or audio files, I name them this way: “2017_04_24 Pruitt NMA Naples Fla Calendar Entry.” That date format means that if you have, say, 20 documents in a folder, they will automatically line up chronologically. It is a super fast way to have a timeline of all your primary source documents, and makes it easy to find them instantly in chronological order. Try it out. It is pretty cool.
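The trick works because a zero-padded YYYY_MM_DD prefix makes plain lexicographic order coincide with chronological order. A quick illustration (all but the quoted filename are made up):

```python
# Filenames that lead with a zero-padded YYYY_MM_DD prefix sort
# chronologically under ordinary alphabetical sorting.
docs = [
    "2017_04_24 Pruitt NMA Naples Fla Calendar Entry.pdf",
    "2016_11_02 Pruitt Fundraiser Email.pdf",
    "2017_01_15 Pruitt Confirmation Prep Memo.pdf",
]
for name in sorted(docs):  # lexicographic == chronological here
    print(name)
```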
ISO8601  time 
14 days ago
Open Data Policing
Open Data Policing is a project of the Southern Coalition for Social Justice. The site’s North Carolina platform was launched in December 2015. The North Carolina development team consisted of attorney Ian Mance of the Southern Coalition and volunteer developers Colin Copeland, Andy Shapiro, and Dylan Young, all of Durham, NC. The Maryland and Illinois platforms were launched in October 2016 and were developed by Southern Coalition and Caktus Group, with generous support from the Open Society Foundations’ Democracy Fund.
Traffic  open-data  policing 
15 days ago
washingtonpost/data-homicides: The Washington Post collected data on more than 52,000 criminal homicides over the past decade in 50 of the largest American cities.
datasets  mapping 
15 days ago
Washington sues Facebook and Google over failure to disclose political ad spending | TechCrunch
(Note that these don’t add up to the totals mentioned above; these are the numbers filed with the state’s Public Disclosure Commission. 2018 amounts are listed but are necessarily incomplete, so I omitted them.)

At least some of the many payments making up these results are not properly documented, and from the looks of it, this could amount to willful negligence. If a company is operating in a state and taking millions for political ads, it really can’t be unaware of that state’s disclosure laws. Yet according to the lawsuits, even basic data like names and addresses of advertisers and the amounts paid were not collected systematically, let alone made available publicly.
lobbying  facebook  google  politics 
16 days ago
The Mouse Vs. The Python | Python Programming from the Frontlines
My name is Mike Driscoll. I am a computer programmer by trade and use Python almost exclusively to make my living. I’m on the wxPython mailing list and their IRC channel (#wxpython) on freenode a lot, so if you’d like to find me, you can do so there.
python  resource 
17 days ago
“Behave More Sexually:” How Big Pharma Used Strippers, Guns, and Cash to Push Opioids – Mother Jones
Around 2015, just before overdoses sweeping the country started making national news, a pharmaceutical sales representative in New Jersey faced a dilemma: She wanted to increase her sales but worried that the opioid painkiller she was selling was addictive and dangerous. The medication was called Subsys, and its key ingredient, fentanyl, is a synthetic opioid 100 times stronger than morphine.

When the rep, who requested to go by her initials, M.S., voiced her concerns to her manager, she was told that Subsys patients were “already addicts and their prospects were therefore essentially rock-bottom,” according to a recently unsealed whistleblower lawsuit that M.S. filed after leaving Insys in 2016. To boost her numbers, the manager allegedly advised M.S. to “behave more sexually toward pain-management physicians, to stroke their hands while literally begging for prescriptions,” and to ask for the prescriptions as a “favor.”
data-journalism  pharmalot  best 
19 days ago
Study purged voters and felons
Florida's purge matched voters to felons based on only an 80 percent match of people's names.
20 days ago
Hispanics missing from voter purge list - News - Sarasota Herald-Tribune - Sarasota, FL
A data quirk in the state’s controversial effort to purge convicted felons from the voter rolls appears to have excluded Hispanics in greater numbers than other races.

Only 61 of the 47,763 names on the potential purge list are classified as Hispanic. Hispanics make up 17 percent of the state population, but a little more than one-tenth of 1 percent of the names on the list.

The missing Hispanics could feed into the Democratic Party’s contention that the purge is Jeb Bush’s plan to help his brother win Florida in the November presidential election.

All but one of the state’s Hispanic legislators are Republicans. And Cubans, who make up the largest single segment of the state’s Hispanic population, have traditionally supported the GOP.

“It’s sloppy work to say the least,” said Allie Merzer, spokeswoman for the Florida Democratic Party. “Is it intent? I don’t know. But something doesn’t smell right.”
census  demographics  voting  joins  bad-data 
20 days ago
Handling Data about Race and Ethnicity - Learning - Source: An OpenNews project
Here’s what happened. The state—led by Republican superstar Jeb Bush—decided it should purge felons who were legally ineligible to vote from the voter rolls. I could spend the rest of this case study going into the nuances here, but the tl;dr is this: The state developed a list of 47,000 people it said were felons, and said local elections officials should remove them. Because the list leaned heavily to the left, Democrats cried politics and disenfranchisement. We reporters set out to find out what was what. Were people improperly being stripped of their voting rights? Were felons illegally voting? And remember: the 2000 presidential election in Florida was determined by 537 votes, so removing 47,000 voters could tip an election.

So we have a big list of names and the fate of the democracy in the balance (cough). No problem. This is what data journalism is all about! Take dataset A, compare to dataset B and voila! Magic happens.


Being a competitive guy at a Florida news org, I wanted to do this big. I wanted to show how accurate or inaccurate this felons list was, with statistical validity. I wanted to use actual social science to investigate it. A couple of anecdotes and a bunch of quotes weren’t good enough for the state’s largest newspaper, the St. Petersburg Times (which is now the Tampa Bay Times). So I devised a method that would give us percentages of accuracy with a margin of error. In short, we were going to take a representative sample of names on the list—359 of them—and background check them, all in a day. Each background check cost the paper between $50 and $100, depending on how much information we needed to verify. At a minimum, we needed full names, dates of birth, previous addresses, and a criminal history from the state.

I had an army of incredibly talented news researchers working with me, and by the end of the day, we found that 59 percent of the list was easily correct, 37 percent was murky, and 4 percent, or 220 people, were falsely being targeted for purging. We even talked to a man who faced losing his voting rights because he had the same first and last name and date of birth as another man with a Florida criminal conviction. With a massive amount of work, and in less than a day, we proved that the list the state had released was flawed.
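For reference, the margin of error for proportions estimated from a sample of 359 can be computed with the standard normal approximation; this is the generic formula, not necessarily the exact method the newsroom used:

```python
import math

n = 359   # sample size from the case study
p = 0.04  # observed share falsely targeted
z = 1.96  # multiplier for 95% confidence

moe = z * math.sqrt(p * (1 - p) / n)
print(f"{p:.0%} +/- {moe:.1%}")  # roughly 4% +/- 2%
```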
data  racial  statistics  data-anecdote  bad-data  census  demographics  data-journalism  best 
20 days ago
Drawing Conclusions from Data - Learning - Source: An OpenNews project
Data doesn’t just come out of thin air. It’s collected by specific people—or machines—for a specific purpose. There may also be people who have a financial or political interest in the numbers. For example, a police department wants to see crime statistics go down, and this may affect how crimes are recorded. You must understand the data generation process and the types of errors it’s likely to introduce. Many data journalists call this process “interviewing the data.” Here are some questions you can ask:
statistics  data 
20 days ago
The First Wired President - The New York Times
Mr. Lincoln’s T-Mails: How Abraham Lincoln Used the Telegraph to Win the Civil War
21 days ago
Seymour Hersh's Memoir Is Full of Useful Reporting Secrets - Rolling Stone
Late in his new memoir, Reporter, muckraking legend Seymour Hersh recounts an episode from a story he wrote for the New Yorker in 1999, about the Israeli spy Jonathan Pollard.

Bill Clinton was believed to be preparing a pardon for Pollard. This infuriated the rank and file of the intelligence community, who now wanted the press to know just what Pollard had stolen and why letting him free would be, in their eyes, an outrage.
journalism 
21 days ago
New disclosures show Atlanta companies that profit from tax breaks
They are among metro Atlanta’s defining landmarks and marquee companies, from the upscale Avalon development in Alpharetta to Bank of America Plaza in Midtown, from Town Brookhaven to Coca-Cola.

They’re also among the biggest beneficiaries of tax breaks doled out by local governments in 2016. Together, in the name of economic development, governments in Atlanta’s four core counties — Fulton, DeKalb, Gwinnett and Cobb — ceded tens of millions in taxes last year, an amount that now can be tallied for the first time because of more rigorous national auditing requirements.

Among the businesses receiving public financial aid are Walmart, Costco, Home Depot, State Farm, SunTrust and more than a dozen other Fortune 500 companies.

Small businesses rarely qualify for similar treatment, but Atlanta’s skyline is dominated by construction projects underwritten by tax dollars, including office towers, mixed-use developments and factories, the figures show.

In all, companies in the four counties received $30.7 million in property tax discounts in 2016, and still owed $72.6 million, according to an examination of records obtained by the AJC under the Georgia Open Records Act. Not included in that amount are hundreds of millions of dollars distributed across the state in film tax credits, jobs credits and sports stadium construction deals.
foia  investigations 
21 days ago
How an arcane, new accounting standard is helping reporters follow the money - Columbia Journalism Review
Niesse made a verbal FOIA request to the public relations officials in attendance. “They weren’t counting on that,” he recalls. Back in the newsroom, he followed up with a written request, and by late June, the spreadsheet—with its 56 columns and 77 rows of data—was open on his computer. “It was a lot of good information,” he recalls. “I would have had a hard time doing that myself.”

Niesse pored over the data that summer and, by mid-September, published an explosive cover story with a solid, aggregated figure—$30.7 million in 2016 alone—for the flurry of tax abatements granted to companies in four of Atlanta’s nine counties. But it was more than a numbers story. Niesse interviewed school officials to connect the tax giveaways with pressure on the district’s budget, as well as other public services, such as police and libraries. “It would have been a massive undertaking before and almost impossible,” Niesse says. “You could do it, but only on a case-by-case basis.”
spreadsheets  foia  best  data-journalism  investigations 
21 days ago
Why low cardinality indexes negatively impact performance
A lot of advice is available about identifying appropriate indexes to create or identifying bad indexes to drop. It may seem logical that an index on a field like account_enabled, which has a very small set of unique values (yes/no), could substantially reduce a result set. But in light of the geometry of B-tree indexes, an index with a low number of possible values can actually harm performance rather than help it.
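You can watch this trade-off in miniature with SQLite's query planner. A sketch with made-up table and column names (the article's argument applies to B-tree indexes generally, not just SQLite):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, enabled INTEGER, name TEXT)")
con.executemany(
    "INSERT INTO accounts (enabled, name) VALUES (?, ?)",
    [(i % 2, f"user{i}") for i in range(100_000)],  # only two distinct values
)
con.execute("CREATE INDEX idx_enabled ON accounts (enabled)")

# The planner will happily use the index, but each of the ~50,000 matching
# index entries still triggers a main-table lookup, which is often slower
# than a single sequential scan of the whole table.
for row in con.execute("EXPLAIN QUERY PLAN SELECT * FROM accounts WHERE enabled = 1"):
    print(row)
```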
sql  indexes  performance 
22 days ago
Squeezing Performance from SQLite: Indexes? Indexes!
Are your queries taking too long to run? In this episode of the series, I will explain how indexes work in SQLite, and how to design indexes that can take your most important queries to the next level of performance.
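In that spirit, here is a small sketch of what a well-matched composite index does to a query plan; the schema is made up, not taken from the article:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, ts INTEGER, kind TEXT)")

def plan(sql, args=()):
    return list(con.execute("EXPLAIN QUERY PLAN " + sql, args))

q = "SELECT ts, kind FROM events WHERE user_id = ? ORDER BY ts"
print(plan(q, (42,)))  # SCAN events, plus a temp B-tree for the ORDER BY

# A composite index matching the WHERE column, then the ORDER BY column,
# lets SQLite both find the rows and return them already sorted:
con.execute("CREATE INDEX idx_events_user_ts ON events (user_id, ts)")
print(plan(q, (42,)))  # SEARCH events USING INDEX idx_events_user_ts (user_id=?)
```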
sqlite  sql  indexes  performance 
22 days ago
A Guide for Digging Through Trump’s Financial Disclosures - ProPublica
When President Donald Trump’s latest financial disclosure form was released last week, we dropped what we were doing and started digging.

We found a few things, including some newly registered companies and a jump in revenue for Trump Productions, which helped produce shows like “The Apprentice” and the lesser-known dating show, “Donald J. Trump Presents: The Ultimate Merger.”

We’ve decided to show how we did it so you can help us go deeper. Below are tips and tricks for finding noteworthy items buried in the 92-page disclosure.

First, some background. Trump’s financial disclosure form, which he files each year with the U.S. Office of Government Ethics, provides the most detailed account available of the president’s finances, from his sprawling business empire to individual payments made to his personal attorney, Michael Cohen. The forms are the best window we have into his financial holdings. (His tax returns would also be helpful, but he hasn’t released those.)

To see newly created companies, we put Trump’s new disclosure form next to last year’s form. That’s how we found T Retail LLC, an “online retail business; startup” that’s listed in the 2018 disclosure, but not in the 2017 one.
foia  journalism 
24 days ago
Elon Musk on Twitter: "An exception that proves the rule… "
And yet, journos are very willing to pursue stories that burn advertisers and other financial benefactors. The @WSJ published the investigation that destroyed Theranos, even though Rupert Murdoch was Theranos's biggest investor ($125M)
tweet 
29 days ago
First comes the school shooting. Then they update their data. | Poynter
The Post’s database took nearly a year of data collection and analysis and hundreds of hours to build before it was published last month, said Lynda Robinson, who has edited the outlet’s award-winning project on children of violence. It's been updated four times in about as many weeks.

“When one of us spots a school shooting, we update the database as soon as possible,” Robinson said. “Steven [Rich] updates the spreadsheet that powers the database with the relevant numbers, including the school’s enrollment, while John [Woodrow Cox] writes a one-sentence summary of the shooting.”
sdss  spreadsheets  database  collaboration 
4 weeks ago
Calculating the Work Behind Our Work — ProPublica
Verifying 4,969 names. Driving 1,493 miles for interviews. Fact-checking 291 facts for one story … twice. Here are some hidden costs of our reporting.
journalism 
4 weeks ago
Taking a Stroll Between The Pixels « The blog at the bottom of the sea
This post relates to a paper I wrote which talks about (ab)using linear texture interpolation to calculate points on Bezier curves. Extensions generalize it to Bezier surfaces and (multivariate) polynomials. All that can be found here: https://blog.demofox.org/2016/02/22/gpu-texture-sampler-bezier-curve-evaluation/

The original observation was that if you sample along the diagonal of a 2×2 texture, you get, as output, points on a quadratic Bezier curve whose control points are the values of the pixels, like in the image below. When I say you get a quadratic Bezier curve, I mean it literally, and exactly. One way of looking at what’s going on is that the texture interpolation is literally performing the De Casteljau algorithm. (Note: if the “B” values are not equal in the setup below, the 2nd control point will be the average of these two values, which an extension abuses to fit more curves into a smaller number of pixels.)
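The claim is easy to verify numerically. A sketch with scalar "pixels" standing in for texture values:

```python
# Check numerically: bilinear interpolation of a 2x2 "texture"
# [[A, B], [B, C]], sampled along the diagonal (t, t), reproduces the
# quadratic Bezier curve with control points A, B, C.
def lerp(a, b, t):
    return (1 - t) * a + t * b

def bilinear_diagonal(A, B, C, t):
    top = lerp(A, B, t)          # interpolate across the top row
    bottom = lerp(B, C, t)       # interpolate across the bottom row
    return lerp(top, bottom, t)  # then between the rows: De Casteljau

def quadratic_bezier(A, B, C, t):
    return (1 - t) ** 2 * A + 2 * t * (1 - t) * B + t ** 2 * C

A, B, C = 1.0, 5.0, 2.0  # arbitrary control points
for t in [0.0, 0.25, 0.5, 0.75, 1.0]:
    assert abs(bilinear_diagonal(A, B, C, t) - quadratic_bezier(A, B, C, t)) < 1e-12
```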
graphics  algebra 
8 weeks ago
Amazon CEO Jeff Bezos explains his famous one-character emails, known to strike fear in managers' hearts
Bezos on listening to customer complaints:


"The thing I have noticed is when the anecdotes and the data disagree, the anecdotes are usually right. There's something wrong with the way you are measuring it."

https://twitter.com/morganhousel/status/988508530834632704
data-anecdote 
8 weeks ago
database - Relational table naming convention - Stack Overflow
Yes. Beware of the heathens. Plural table names are a sure sign of someone who has not read any of the standard materials and has no knowledge of database theory.

https://news.ycombinator.com/item?id=16904088
Some of the wonderful things about Standards are: they are all integrated with each other; they work together; and they were written by minds greater than ours, so we do not have to debate them. The standard table name refers to each row in the table, which is used in all verbiage, not the total content of the table (we know that the Customer table contains all the Customers).
database  naming-things  sql 
8 weeks ago
Rethinking GPS: Engineering Next-Gen Location at Uber
https://news.ycombinator.com/item?id=16887276

Location and navigation using global positioning systems (GPS) is deeply embedded in our daily lives, and is particularly crucial to Uber’s services. To orchestrate quick, efficient pickups, our GPS technologies need to know the locations of matched riders and drivers, as well as provide navigation guidance from a driver’s current location to where the rider needs to be picked up, and then, to the rider’s chosen destination. For this process to work seamlessly, the location estimates for riders and drivers need to be as precise as possible.
gps  gis 
8 weeks ago
How we identified bots on Twitter | Pew Research Center
You can come up with a list of characteristics like these to try to determine whether an account is a bot or not. Of course, it would be far too time-consuming to try to observe those characteristics for 140,000 different Twitter accounts (roughly the number of accounts included in the study). A more practical approach is to come up with a reasonably large dataset of accounts that are bots and not bots, and then use a machine learning system to “learn” the patterns that characterize bot and human accounts. With those patterns in hand, you can then use them to classify a much larger number of accounts.
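A schematic of that "learn the patterns" step, using a random forest in scikit-learn. The features and labels below are made-up placeholders, not Pew's actual data or pipeline:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-account features: [tweets_per_day, followers_per_friend,
# retweet_share, account_age_days], with hand-applied bot/human labels.
X = [
    [250.0, 0.01, 0.95, 30],  # hyperactive, few followers, mostly retweets
    [3.5, 1.20, 0.20, 2400],  # looks like an ordinary human account
    [180.0, 0.05, 0.88, 60],
    [1.2, 0.90, 0.10, 3100],
]
y = [1, 0, 1, 0]  # 1 = bot, 0 = human

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Classify a previously unseen account using the learned patterns.
print(clf.predict([[120.0, 0.05, 0.80, 90]]))
```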
news  article  bots  twitter  heuristics  methodology 
9 weeks ago