jerryking + datasets   18

The Mystery of the Miserable Employees: How to Win in the Winner-Take-All Economy -
June 15, 2019 | The New York Times | By Neil Irwin.
Neil Irwin is a senior economics correspondent for The Upshot. He is the author of “How to Win in a Winner-Take-All-World,” a guide to navigating a career in the modern economy.......
What Mr. Ostrum and the analytics team did wasn’t a one-time dive into the numbers. It was part of a continuing process, a way of thinking that enabled them to change and adapt along with the business environment. The key is to listen to what data has to say — and develop the openness and interpretive skills to understand what it is telling us.......Neil Irwin was at Microsoft’s headquarters researching a book that aims to answer one simple question: How can a person design a thriving career today? The old advice (show up early, work hard) is no longer enough....In nearly every sector of the economy, people who seek well-paying, professional-track success face the same set of challenges: the rise of a handful of dominant “superstar” firms; a digital reinvention of business models; and a rapidly changing understanding about loyalty in the employer-employee relationship. It’s true in manufacturing and retail, in banking and law, in health care and education — and certainly in tech......superstar companies — and the smaller firms seeking to upend them — are where pragmatic capitalists can best develop their abilities and be well compensated for them over a long and durable career.....the obvious disadvantages of bureaucracy have been outweighed by some not-so-obvious advantages of scale......the ability to collect and analyze vast amounts of data about how people work, and what makes a manager effective (jk: organizing data) .... is essential for even those who aren’t managers of huge organizations, but are just trying to make themselves more valuable players on their own corporate team.......inside Microsoft’s human resources division, a former actuary named Dawn Klinghoffer ....was trying to figure out if the company could use data about its employees — which ones thrived, which ones quit, and the differences between those groups — to operate better......Klinghoffer was frustrated that ....insights came mostly from looking through survey results. 
She was convinced she could take the analytical approach further. After all, Microsoft was one of the biggest makers of email and calendar software — programs that produce a “digital exhaust” of metadata about how employees use their time. In September 2015, she advised Microsoft on the acquisition of a Seattle start-up, VoloMetrix, that could help it identify and act on the patterns in that vapor......One of VoloMetrix's foundational data sets, for example, was private emails sent by top Enron executives before the company’s 2001 collapse — a rich look at how an organization’s elite behave when they don’t think anyone is watching.
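The kind of pattern-finding VoloMetrix did on calendar "digital exhaust" can be sketched in a few lines. A minimal, hypothetical example (the record layout and names below are invented, not Microsoft's actual schema): aggregate raw meeting records into average weekly meeting hours per employee.

```python
from collections import defaultdict

def weekly_meeting_hours(events):
    """Aggregate calendar metadata into average meeting hours
    per employee per week.

    events: iterable of (employee, iso_week, duration_minutes) tuples,
    e.g. pulled from calendar exhaust. Field names are illustrative.
    """
    totals = defaultdict(float)          # (employee, week) -> minutes
    for employee, week, minutes in events:
        totals[(employee, week)] += minutes

    per_employee = defaultdict(list)     # employee -> [hours in each week]
    for (employee, _week), minutes in totals.items():
        per_employee[employee].append(minutes / 60.0)

    return {emp: sum(hrs) / len(hrs) for emp, hrs in per_employee.items()}

events = [
    ("ana", "2019-W23", 90), ("ana", "2019-W23", 30),
    ("ana", "2019-W24", 60), ("ben", "2019-W23", 240),
]
print(weekly_meeting_hours(events))   # {'ana': 1.5, 'ben': 4.0}
```

The real systems obviously operate at far larger scale, but the shape of the analysis (reduce metadata to per-person behavioural summaries, then compare thriving and departing groups) is the same.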
analytics  books  data  datasets  data_driven  exhaust_data  Fitbit  gut_feelings  human_resources  interpretative  Managing_Your_Career  massive_data_sets  meetings  metadata  Microsoft  Moneyball  organizational_analytics  organizing_data  people_analytics  quantitative  quantified_self  superstars  unhappiness  VoloMetrix  winner-take-all  work_life_balance 
june 2019 by jerryking
How 5 Data Dynamos Do Their Jobs
June 12, 2019 | The New York Times | By Lindsey Rogers Cook.
[Times Insider explains who we are and what we do, and delivers behind-the-scenes insights into how our journalism comes together.]
Reporters from across the newsroom describe the many ways in which they increasingly rely on datasets and spreadsheets to create groundbreaking work.

Data journalism is not new. It predates our biggest investigations of the last few decades. It predates computers. Indeed, reporters have used data to hold power to account for centuries, as a data-driven investigation that uncovered overspending by politicians, including then-congressman Abraham Lincoln, attests.

But the vast amount of data available now is new. The federal government’s data repository contains nearly 250,000 public datasets. New York City’s data portal contains more than 2,500. Millions more are collected by companies, tracked by think tanks and academics, and obtained by reporters through Freedom of Information Act requests (though not always without a battle). No matter where they come from, these datasets are largely more organized than ever before and more easily analyzed by our reporters.

(1) Karen Zraick, Express reporter.
NYC's Buildings Department said it was merely responding to a sudden spike in 311 complaints about store signs. But who complains about store signs?....it was hard to get a sense of the scale of the problem just by collecting anecdotes. So I turned to NYC Open Data, a vast trove of information that includes records about 311 complaints. By sorting and calculating the data, we learned that many of the calls were targeting stores in just a few Brooklyn neighborhoods.
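The sorting-and-calculating step she describes is, in spirit, a group-by-and-count over the complaint records. A minimal sketch against a made-up slice of the data (the field names are simplified stand-ins for the real NYC Open Data 311 schema):

```python
from collections import Counter

# Each dict mirrors one 311 complaint row; values are invented.
complaints = [
    {"type": "Illegal Signage", "neighborhood": "Sunset Park"},
    {"type": "Illegal Signage", "neighborhood": "Sunset Park"},
    {"type": "Illegal Signage", "neighborhood": "Bay Ridge"},
    {"type": "Noise",           "neighborhood": "Astoria"},
]

# Count signage complaints per neighborhood.
sign_counts = Counter(
    c["neighborhood"] for c in complaints if c["type"] == "Illegal Signage"
)

# Neighborhoods ranked by signage complaints, most-complained-about first.
for neighborhood, n in sign_counts.most_common():
    print(neighborhood, n)
```

On the real dataset the same pattern surfaces the handful of Brooklyn neighborhoods where the calls clustered.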
(2) John Ismay, At War reporter
He has multiple spreadsheets for almost every article he works on......Spreadsheets helped him organize all the characters involved and the timeline of what happened as the situation went out of control 50 years ago......He also saved all the relevant location data, which he later used in Google Earth to analyze the terrain, allowing him to ask more informed questions.
(3) Eliza Shapiro, education reporter for Metro
After she found out in March that only seven black students won seats at Stuyvesant, New York City’s most elite public high school, she kept coming back to one big question: How did this happen? I had a vague sense that the city’s so-called specialized schools once looked more like the rest of the city school system, which is mostly black and Hispanic.

With my colleague K.K. Rebecca Lai from The Times’s graphics department, I started to dig into a huge spreadsheet that listed the racial breakdown of each of the specialized schools dating to the mid-1970s.
We analyzed changes in the city’s immigration patterns to better understand why some immigrant groups were overrepresented at the schools and others were underrepresented. We mapped out where the city’s accelerated academic programs are, and found that mostly black and Hispanic neighborhoods have lost them. And we tracked the rise of the local test preparation industry, which has exploded in part to meet the demand of parents eager to prepare their children for the specialized schools’ entrance exam.

To put a human face to the data points we gathered, I collected yearbooks from black and Hispanic alumni and spent hours on the phone with them, listening to their recollections of the schools in the 1970s through the 1990s. The final result was a data-driven article that combined Rebecca’s remarkable graphics, yearbook photos, and alumni reflections.

(4) Reed Abelson, Health and Science reporter
the most compelling stories take powerful anecdotes about patients and pair them with eye-opening data.....Being comfortable with data and spreadsheets allows me to ask better questions about researchers’ studies. Spreadsheets also provide a way of organizing sources, articles and research, as well as creating a timeline of events. By putting information in a spreadsheet, you can quickly access it, and share it with other reporters.

(5) Maggie Astor, Politics reporter
As a political reporter dealing with more than 20 presidential candidates, she uses spreadsheets to track polling, fund-raising, policy positions and so much more. Without them, there’s just no way she could stay on top of such a huge field......She and the climate reporter Lisa Friedman used another spreadsheet to track the candidates’ positions on several climate policies.
311  5_W’s  behind-the-scenes  Communicating_&_Connecting  data  datasets  data_journalism  data_scientists  FOIA  groundbreaking  hidden  information_overload  information_sources  journalism  mapping  massive_data_sets  New_York_City  NYT  open_data  organizing_data  reporters  self-organization  systematic_approaches  spreadsheets  storytelling  timelines  tools 
june 2019 by jerryking
Past mistakes carry warnings for the future of work
May 21, 2019 | Financial Times | by SARAH O'CONNOR.

* Data can mislead unless combined with grittier insights on the power structures that underpin it.
* William Kempster, a master mason who worked on St Paul's Cathedral in the 18th century, left wage records that helped expose a flaw in our understanding of the past.

It is often said that we should learn from the mistakes of the past. But we can also learn from the mistakes we make about the past. Seemingly smooth data can mislead unless it is combined with a grittier insight into the structures, contracts and power relationships that underpin the numbers. On that score, economists and politicians who want to make sense of today’s labour market have an advantage over historians: it is happening right now, just outside their offices, in all its complexity and messiness. All they have to do is open the door.
17th_century  18th_century  builders  contextual  data  datasets  developing_countries  economic_history  economists  freelancing  gig_economy  handwritten  historians  human_cloud_platforms  insights  labour_markets  London  messiness  mistakes  politicians  power_relations  power_structures  record-keeping  United_Kingdom  unstructured_data  wages  white-collar 
may 2019 by jerryking
JPMorgan Invests in Startup Tech That Analyzes Encrypted Data - CIO Journal. - WSJ
By Sara Castellanos
Nov 13, 2018
(possible assistance to Robert Lewis)

JPMorgan Chase & Co. has invested in a startup whose technology can analyze an encrypted dataset without revealing its contents, which could be “materially useful” for the company and its clients, said Samik Chandarana, head of data analytics for the Corporate and Investment Bank division.

The banking giant recently led a $10 million Series A funding round in the data security and analytics startup, Inpher Inc., headquartered in New York and Switzerland. JPMorgan could use the ‘secret computing’ technology to analyze a customer’s proprietary data on their behalf, using artificial intelligence algorithms without sacrificing privacy.......One of the technological methods Inpher uses, called fully homomorphic encryption, allows computations to be conducted on encrypted data, said Jordan Brandt, co-founder and CEO of the company. It’s the ability to perform analytics and machine learning directly on ciphertext — plain, readable text that has been encrypted with a specific algorithm, or cipher, so that it becomes unintelligible.

Analyzing encrypted information without revealing any secret information is known as zero-knowledge computing and it means that organizations could share confidential information to gather more useful insights on larger datasets.
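Fully homomorphic schemes support arbitrary computation on ciphertexts. A runnable toy that illustrates the weaker, additive case is textbook Paillier encryption, where multiplying two ciphertexts yields an encryption of the sum of the plaintexts. This sketch uses tiny fixed primes, is purely illustrative, and says nothing about Inpher's actual implementation; nothing this small is secure.

```python
from math import gcd

# Toy Paillier cryptosystem: additively homomorphic encryption.
p, q = 61, 53
n = p * q                                       # public modulus
n2 = n * n
g = n + 1                                       # standard generator choice
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p - 1, q - 1)

def L(u):
    return (u - 1) // n

mu = pow(L(pow(g, lam, n2)), -1, n)             # modular inverse (Python 3.8+)

def encrypt(m, r):
    # r must be coprime with n; fixed values keep the demo deterministic
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

a, b = 17, 25
ca, cb = encrypt(a, 7), encrypt(b, 11)
# Multiplying ciphertexts adds the underlying plaintexts:
# the computation happens without ever decrypting the inputs.
print(decrypt((ca * cb) % n2))   # 42
```

A party holding only `ca` and `cb` can produce an encryption of `a + b` without the secret key, which is the germ of "analyze a dataset without revealing its contents."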
algorithms  artificial_intelligence  corporate_investors  datasets  encryption  JPMorgan_Chase  pooling  privacy  start_ups   synthetic_data  zero-knowledge 
november 2018 by jerryking
How Not to Drown in Numbers - NYTimes.com
MAY 2, 2015| NYT |By ALEX PEYSAKHOVICH and SETH STEPHENS-DAVIDOWITZ.

If you’re trying to build a self-driving car or detect whether a picture has a cat in it, big data is amazing. But here’s a secret: If you’re trying to make important decisions about your health, wealth or happiness, big data is not enough.

The problem is this: The things we can measure are never exactly what we care about. Just trying to get a single, easy-to-measure number higher and higher (or lower and lower) doesn’t actually help us make the right choice. For this reason, the key question isn’t “What did I measure?” but “What did I miss?”...So what can big data do to help us make big decisions? One of us, Alex, is a data scientist at Facebook. The other, Seth, is a former data scientist at Google. There is a special sauce necessary to making big data work: surveys and the judgment of humans — two seemingly old-fashioned approaches that we will call small data....For one thing, many teams ended up going overboard on data. It was easy to measure offense and pitching, so some organizations ended up underestimating the importance of defense, which is harder to measure. In fact, in his book “The Signal and the Noise,” Nate Silver of fivethirtyeight.com estimates that the Oakland A’s were giving up 8 to 10 wins per year in the mid-1990s because of their lousy defense.

And data-driven teams found out the hard way that scouts were actually important...We are optimists about the potential of data to improve human lives. But the world is incredibly complicated. No one data set, no matter how big, is going to tell us exactly what we need. The new mountains of blunt data sets make human creativity, judgment, intuition and expertise more valuable, not less.

==============================================
From Market Research: Safety Not Always in Numbers | Qualtrics ☑
Author: Qualtrics|July 28, 2010

Albert Einstein once said, “Not everything that can be counted counts, and not everything that counts can be counted.” [Warning of the danger of overquantification] Although many market research experts would say that quantitative research is the safest bet when one has limited resources, it can be dangerous to assume that it is always the best option.
human_ingenuity  data  analytics  small_data  massive_data_sets  data_driven  information_overload  dark_data  measurements  creativity  judgment  intuition  Nate_Silver  expertise  datasets  information_gaps  unknowns  underestimation  infoliteracy  overlooked_opportunities  sense-making  easy-to-measure  Albert_Einstein  special_sauce  metrics  overlooked  defensive_tactics  emotional_intelligence  EQ  soft_skills  overquantification  false_confidence 
may 2015 by jerryking
Spinning raw government datasets into gold - The Globe and Mail
IVOR TOSSELL
Special to The Globe and Mail
Published Monday, Feb. 02 2015
massive_data_sets  open_data  data  datasets 
february 2015 by jerryking
It’s a Whole New Data Game for Business - WSJ
Feb. 9, 2015 | WSJ |

opportunistic data collection is leading to entirely new kinds of data that aren’t well suited to the existing statistical and data-mining methodologies. So point number one is that you need to think hard about the questions that you have and about the way that the data were collected and build new statistical tools to answer those questions. Don’t expect the existing software package that you have is going to give you the tools you need....Point number two is having to deal with distributed data....What do you do when the data that you want to analyze are actually in different places?

There’s lots of clever solutions for doing that. But at some point, the volume of data’s going to outstrip the ability to do that. You’re forced to think about how you might, for example, reduce those data sets, so that they’re easier to move.
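One standard way to reduce distributed data is to ship sufficient statistics instead of rows: for a global mean, each site sends only its local (sum, count), and combining those summaries reproduces the exact answer. A minimal sketch with invented site names:

```python
# Each site reduces its own partition to a tiny summary; only the
# summaries cross the network, never the raw values.
partitions = {
    "datacenter_a": [4.0, 8.0, 6.0],
    "datacenter_b": [10.0, 2.0],
    "datacenter_c": [12.0],
}

def local_summary(values):
    return (sum(values), len(values))    # cheap to compute, cheap to move

summaries = [local_summary(v) for v in partitions.values()]
total, count = map(sum, zip(*summaries))
global_mean = total / count
print(global_mean)   # 7.0
```

The same trick generalizes to variances, histograms and many model-fitting steps; it breaks down only for statistics (like exact medians) that have no small sufficient summary, which is where the harder engineering begins.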
data  data_collection  datasets  data_mining  massive_data_sets  distributed_data  haystacks  questions  tools  unstructured_data 
february 2015 by jerryking
Want to kickstart the Canadian economy? Try "indovation", says U of T prof | U of T News
January 26, 2015 | U of T News | Terry Lavender.

Professor Dilip Soman heads up U of T's India Innovation Institute. He explains how necessity can be the mother of innovation. Indovation is a portmanteau of the words “Indian” and “innovation” and it means taking existing constraints – such as a shortage of funds or raw materials – into account when developing a response to actual problems.... “Frugality is at the essence of it,” Soman says. “In India, unless you can drive down costs, your idea is a non-starter.

“For example, mobile banking. That’s a classic ‘indovation’. It came about as a response to a particular problem, and it was developed in India and adopted in the west,” says Soman.

“We’re working on developing a dataset on reverse innovation: the idea that innovations that have developed in the global south can be scaled back to the western world,” Soman says. “We have white papers on several topics: crowd-funding, agriculture, and retail and investment opportunities. The goal is to build up a database of information that both researchers as well as practitioners can use.”
constraints  innovation  India  Rotman  uToronto  trickle-up  frugality  necessity  reverse_innovation  jugaad  Bottom_of_the_Pyramid  datasets  Indians 
february 2015 by jerryking
Let me see
Posted by Seth Godin on July 08, 2008.

Passive contributions of public behaviour information to traditionally-sorted data
data  ideas  information  inspiration  Seth_Godin  social_data  datasets  open_data  social_physics  massive_data_sets  wisdom_of_crowds  thick_data  public_behavior  sorting  value_creation 
january 2015 by jerryking
Sponsor Generated Content: The State of the Data Economy
June 23, 2014

Where the Growth is
So for many companies right now, the core of the data economy is a small but growing segment—the information two billion-plus global Internet users create when they click "like" on a social media page or take action online. Digital customer tracking—the selling of “digital footprints” (the trail of information consumers leave behind each time they surf the Web)—is now a $3 billion segment, according to a May 2014 Outsell report. At the moment, that's tiny compared to the monetary value of traditional market research such as surveys, forecasting and trend analysis. But digital customer tracking "is where the excitement and growth is," says Giusto.

Real-time data that measures actions consumers are actually taking has more value than study results that rely on consumer opinions. Not surprisingly, businesses are willing to pay more for activity-based data.

Striking it Richer
Outsell Inc.'s analyst Chuck Richard notes that the specificity of data has a huge effect on its value. In days past, companies would sell names, phone numbers, and email addresses as sales leads. Now, data buyers have upped the ante. They want richer data—names of consumers whose current "buying intent" has been analyzed through behavioral analytics. Beyond the “who,” companies want the “what” and “when” of purchases, along with “how” best to engage with prospects.
"Some companies are getting a tenfold premium for data that is very focused and detailed," Richard says. "For example, if you had a list of all the heart specialists in one region, that’s worth a lot."

Tapping into New Veins
Moving forward, marketers will increasingly value datasets that they can identify, curate and exploit. New technology could increase the value of data by gleaning insights from unstructured data (video, email and other non-traditional data sources); crowdsourcing and social media could generate new types of shareable data; predictive modeling and machine learning could find new patterns in data, increasing the value of different types of data.

Given all this, the data economy is sure to keep growing, as companies tap into new veins of ever-richer and more-specific data.
data  data_driven  SAS  real-time  digital_footprints  OPMA  datasets  unstructured_data  data_marketplaces  value_creation  specificity  value_chains  intentionality  digital_economy  LBMA  behavioural_data  predictive_modeling  machine_learning  contextual  location_based_services  activity-based  consumer_behavior 
july 2014 by jerryking
The messy reality of open data and politics | Public Leaders Network
8 April 2013 | | Guardian Professional | Tim Davies, Guardian Professional.

In practice, datasets themselves are political objects, and policies to open up datasets are the product of politics. If you look beyond the binary fight over whether government data should be open or not, then you will find a far more subtle set of political questions over the what and the how of opening data.

Datasets are built from the categories and relationships that the database designer (or their political masters) decide are important. In their book, Sorting Things Out: Classification and its Consequences, Geoffrey Bowker and Susan Leigh Star describe how the international classification of disease, the basis for worldwide mortality statistics, has historically under-represented tropical diseases in its code lists. The result is that global health policy has been less able to see, distinguish and target certain conditions....Local authority spending data has never existed as a single dataset before – but a central edict that this should be published, itself a decision with a political edge, has generated new standards for representing local spend, that have to decide what sort of information about spend is important.

Should the data contain company identifiers to let us see which firms get public money? And should spend data be linked to results and categorisation of public services? These decisions can have big impacts on how data can be used, what it can tell us, and what impacts open data will have.
datasets  data  open_data  cities  municipalities  politics  political_campaigns  sorting  messiness 
december 2013 by jerryking
Bill Gates is naive, data is not objective | mathbabe
January 29, 2013 Cathy O'Neil,

Don’t be fooled by the mathematical imprimatur: behind every model and every data set is a political process that chose that data and built that model and defined success for that model.
billgates  naivete  data  Cathy_O’Neil  value_judgements  datasets  biases 
december 2013 by jerryking
How crowd-sourcing will spark a data revolution
March 22, 2012 |Globe and Mail Blog | by frances woolley.

Yet all of these initiatives are geared towards government data sets and professional researchers. Important private records – diaries of early settlers, for example – can find a home in Canada’s National Archives. But the Archives do not have sufficient resources to process and document records of snowdrops or goldfinches. Moreover, the Archives keep records, not data sets – it is fascinating to look at census records from 120 years ago, but they aren’t much use for statistical analysis.

There is a solution: crowd-sourcing. Across the country there are students, amateur and professional historians, policy analysts, bloggers and data nerds. I’m one of them. I’ve taken data collected by a notable Ottawa record keeper, Mr. Harry Thomson, and posted it on Worthwhile Canadian Initiative. Mr. Thomson’s records go back to the 1960s, long before Environment Canada began collecting comparable hydrometric data. An analysis of the data shows a significant decline in peak water levels during the spring flood – with this year being no exception.
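The trend check behind a claim like "a significant decline in peak water levels" can be as simple as an ordinary least-squares slope over the annual peaks. The figures below are invented placeholders, not Mr. Thomson's actual records:

```python
# Ordinary least-squares slope on annual spring peak water levels.
years = [1965, 1975, 1985, 1995, 2005]
peaks = [5.9, 5.6, 5.5, 5.1, 4.9]   # metres (hypothetical values)

n = len(years)
mean_x = sum(years) / n
mean_y = sum(peaks) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(years, peaks))
    / sum((x - mean_x) ** 2 for x in years)
)
print(f"trend: {slope * 10:+.2f} m per decade")   # trend: -0.25 m per decade
```

A negative slope is only the starting point; a real analysis would also ask whether the decline is large relative to year-to-year noise.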

Yet Worthwhile Canadian Initiative is just one blog in the vast expanse of the World Wide Web, and might not even be there in five or ten years’ time. We need a permanent site for all of this data, through which the collective power of the internet can be unleashed – editing, compiling, analyzing, telling stories and, above all, building understanding.
analog  archives  Canadian  cannabis  census  crowdsourcing  data  data_driven  datasets  massive_data_sets  nerds  open_data  record-keeping  Statistics_Canada  unstructured_data 
march 2012 by jerryking
World Bank Is Opening Its Treasure Chest of Data
July 2, 2011 | NYT | By STEPHANIE STROM. The World Bank’s traditional role has been to finance specific projects that foster economic development,...it might come as a surprise that its president, Robert Zoellick, argues that the most valuable currency of the World Bank isn’t its money — it is its information. ...For more than a year, the Bank has been releasing its prized data sets, currently giving public access to more than 7,000 that were previously available only to some 140,000 subscribers — mostly governments and researchers, who paid for access. ...Those data sets contain all sorts of information about the developing world, whether workaday economic statistics — GDP, CPI and the like — or arcana like the number of women breast-feeding their children in rural Peru.

It is a trove unlike anything else in the world, and, it turns out, highly valuable. For whatever its accuracy or biases, this data defines the economic reality of billions of people and is used in making policies and decisions that enormously affect their lives.
World_Bank  information_flows  data  databases  massive_data_sets  transparency  open_source  Robert_Zoellick  crowdsourcing  mashups  datasets  decision_making  policymaking  developing_countries 
july 2011 by jerryking
Mining of Raw Data May Bring New Productivity, a Study Says - NYTimes.com
May 13, 2011 | NYT | By STEVE LOHR.
(fresh produce) Data is a vital raw material of the information economy, much as coal and iron ore were in the Industrial Revolution. But the business world is just beginning to learn how to process it all. The current data surge is coming from sophisticated computer tracking of shipments, sales, suppliers and customers, as well as e-mail, Web traffic and social network comments. ...Mining and analyzing these big new data sets can open the door to a new wave of innovation, accelerating productivity and economic growth. ...The next stage, they say, will exploit Internet-scale data sets to discover new businesses and predict consumer behavior and market shifts.
....The McKinsey Global Institute is publishing “Big Data: The Next Frontier for Innovation, Competition and Productivity.” It makes estimates of the potential benefits from deploying data-harvesting technologies and skills.
massive_data_sets  Steve_Lohr  McKinsey  data  consumer_behavior  data_driven  data_mining  analytics  Freshbooks  digital_economy  fresh_produce  OPMA  Industrial_Revolution  datasets  new_businesses  productivity 
may 2011 by jerryking

