
Comments to How 5 Data Dynamos Do Their Jobs
I’d like someone to go through the tax data and find out what happened to all the accountants before and after Wang Spreadsheet, Lotus 1-2-3, and Excel were released. What happened to their earnings, ...
data_scientists  letters_to_the_editor  organizing_data  storytelling  from notes
9 weeks ago by jerryking
How 5 Data Dynamos Do Their Jobs
June 12, 2019 | The New York Times | By Lindsey Rogers Cook.
[Times Insider explains who we are and what we do, and delivers behind-the-scenes insights into how our journalism comes together.]
Reporters from across the newsroom describe the many ways in which they increasingly rely on datasets and spreadsheets to create groundbreaking work.

Data journalism is not new. It predates our biggest investigations of the last few decades. It predates computers. Indeed, reporters have used data to hold power to account for centuries, as a data-driven investigation that uncovered overspending by politicians, including then-congressman Abraham Lincoln, attests.

But the vast amount of data available now is new. The federal government’s data repository contains nearly 250,000 public datasets. New York City’s data portal contains more than 2,500. Millions more are collected by companies, tracked by think tanks and academics, and obtained by reporters through Freedom of Information Act requests (though not always without a battle). No matter where they come from, these datasets are generally better organized than ever before and more easily analyzed by our reporters.

(1) Karen Zraick, Express reporter.
NYC's Buildings Department said it was merely responding to a sudden spike in 311 complaints about store signs. But who complains about store signs?....It was hard to get a sense of the scale of the problem just by collecting anecdotes. So I turned to NYC Open Data, a vast trove of information that includes records of 311 complaints. By sorting and calculating the data, we learned that many of the calls were targeting stores in just a few Brooklyn neighborhoods.
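The sort-and-count analysis described here can be sketched with pandas; the rows and column names below are hypothetical stand-ins, not the portal's actual 311 schema.

```python
import pandas as pd

# Hypothetical 311 sign-complaint records; in practice these would come
# from a CSV export of NYC Open Data, and the column names would differ.
complaints = pd.DataFrame({
    "borough": ["BROOKLYN", "BROOKLYN", "BROOKLYN", "QUEENS", "BROOKLYN", "MANHATTAN"],
    "neighborhood": ["Borough Park", "Borough Park", "Midwood", "Flushing", "Midwood", "SoHo"],
})

# Count complaints per neighborhood and sort descending -- the
# "sorting and calculating" step that surfaces a geographic cluster.
counts = (complaints
          .groupby(["borough", "neighborhood"])
          .size()
          .sort_values(ascending=False))
print(counts)
```

A few lines like these replace hours of tallying anecdotes: the grouped counts immediately show whether complaints are spread citywide or concentrated in a handful of neighborhoods.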
(2) John Ismay, At War reporter
He has multiple spreadsheets for almost every article he works on......Spreadsheets helped him organize all the characters involved and the timeline of what happened as the situation went out of control 50 years ago......He saves all the relevant location data, which he later used in Google Earth to analyze the terrain, allowing him to ask more informed questions.
(3) Eliza Shapiro, education reporter for Metro
After she found out in March that only seven black students won seats at Stuyvesant, New York City’s most elite public high school, she kept coming back to one big question: How did this happen? I had a vague sense that the city’s so-called specialized schools once looked more like the rest of the city school system, which is mostly black and Hispanic.

With my colleague K.K. Rebecca Lai from The Times’s graphics department, I started to dig into a huge spreadsheet that listed the racial breakdown of each of the specialized schools dating to the mid-1970s.
We analyzed changes in the city’s immigration patterns to better understand why some immigrant groups were overrepresented at the schools and others were underrepresented. We mapped out where the city’s accelerated academic programs are, and found that mostly black and Hispanic neighborhoods have lost them. And we tracked the rise of the local test preparation industry, which has exploded in part to meet the demand of parents eager to prepare their children for the specialized schools’ entrance exam.

To put a human face to the data points we gathered, I collected yearbooks from black and Hispanic alumni and spent hours on the phone with them, listening to their recollections of the schools in the 1970s through the 1990s. The final result was a data-driven article that combined Rebecca’s remarkable graphics, yearbook photos, and alumni reflections.

(4) Reed Abelson, Health and Science reporter
The most compelling stories take powerful anecdotes about patients and pair them with eye-opening data.....Being comfortable with data and spreadsheets allows me to ask better questions about researchers’ studies. Spreadsheets also provide a way of organizing sources, articles and research, as well as creating a timeline of events. By putting information in a spreadsheet, you can quickly access it, and share it with other reporters.

(5) Maggie Astor, Politics reporter
As a political reporter dealing with more than 20 presidential candidates, she uses spreadsheets to track polling, fund-raising, policy positions and much more. Without them, there’s just no way she could stay on top of such a huge field......She and the climate reporter Lisa Friedman used another spreadsheet to track the candidates’ positions on several climate policies.
311  5_W’s  behind-the-scenes  Communicating_&_Connecting  data  datasets  data_journalism  data_scientists  FOIA  groundbreaking  hidden  information_overload  information_sources  journalism  mapping  massive_data_sets  New_York_City  NYT  open_data  organizing_data  reporters  self-organization  systematic_approaches  spreadsheets  storytelling  timelines  tools 
9 weeks ago by jerryking
The Art of Statistics by David Spiegelhalter
May 6, 2019 | Financial Times | Review by Alan Smith.

The Art of Statistics is by Sir David Spiegelhalter, former president of the UK’s Royal Statistical Society and current Winton professor of the public understanding of risk at the University of Cambridge.

The comparison with Rosling is easy to make, not least because Spiegelhalter is humorously critical of his own field which, by his reckoning, has spent too much time arguing with itself over “the mechanical application of a bag of statistical tools, many named after eccentric and argumentative statisticians”.

books  book_reviews  charts  Communicating_&_Connecting  data  data_journalism  data_scientists  Hans_Rosling  listening  massive_data_sets  mathematics  statistics  visualization 
may 2019 by jerryking
Tyson Made Its Fortune Packing Meat. Now It Wants to Sell You Frittatas.
Feb. 13, 2019 | WSJ | By Jacob Bunge

Tyson’s strategy is to transform the 84-year-old meatpacking giant into a modern food company selling branded consumer goods on par with Kraft Heinz Co. or Coca-Cola Co.
.....Tyson wants to be big in more-profitable prepared and packaged foods to distance itself from the traditional meat business’s boom-and-bust cycles. America’s biggest supplier of meat wants to also be known for selling packaged foods........How’s the transformation going? Amid an historic meat glut, the company’s shares are worth $4.9 billion less than they were a year ago—and are still valued like those of a meatpacker pumping out shrink-wrapped packs of pork chops and chicken breasts....Investors say the initiatives aren’t yet enough to counteract the steep challenges facing the poultry and livestock slaughtering and processing operations that have been the company’s core since....1935.....Record red meat and poultry production nationwide is pushing down prices and eroding Tyson’s meat-processing profit margins. Tariffs and trade barriers to U.S. meat have further dented prices and built up backlogs, while transport and labor costs have climbed. .......The packaged-foods business is itself struggling with consumers gravitating toward nimbler upstart brands and demanding natural ingredients and healthier recipes........Tyson's acquisition of Hillshire Brands triggered changes, including the onboarding of executives attuned to consumer trends. Tyson added managers from Fortune 100 companies, including Boeing Co. and HP Inc., who replaced some meat-processing officials who led Tyson for decades. The newcomers brought experience managing brands, understanding consumers, developing new products and building new technology tools, areas Tyson deemed central to its future......A chief sustainability officer, a newly created position, began working to shift Tyson’s image among environmental groups, .....Shifting consumer tastes have created hurdles for other packaged-food giants, such as Campbell Soup Co. and Kellogg Co. .... the meat business remains Tyson’s biggest challenge. 
In 2018 a flood of cheap beef, fueled by enlarged cattle herds, spurred a summer of “burger wars,” meat industry officials said. .......investment in brands and packaged foods hasn’t insulated Tyson’s business from these commodity-market swings. ........The company is also trying to improve its ability to forecast meat demand..........developing artificial intelligence to help Tyson better predict the future.........Scott Spradley, who left HP in 2017 to become Tyson’s CTO, said company data scientists are crunching numbers on major U.S. metropolitan areas. By analyzing historic meat consumption alongside demographic shifts, the number of residents moving in and out, and the frequency of birthdays and baseball games, Mr. Spradley said Tyson is building computer models that will help plan production and sales for its meat business. The effort aims to find patterns in data that Tyson’s human economists and current projections might not see. ......Deep data dives helped steer Tyson toward what executives say will be one of its biggest new product launches: plant-based replacements for traditional meat,
Big_Food  brands  Coca-Cola  CPG  cured_and_smoked  data_scientists  Kraft_Heinz  meat  new_products  plant-based  prepared_meals  reinvention  shifting_tastes  stockpiles  strategy  sustainability  tariffs  Tyson  predictive_modeling 
february 2019 by jerryking
Hedge funds fight back against tech in the war for talent
August 3, 2018 | | Financial Times | by Lindsay Fortado in London.

Like other industries competing for the top computer science talent, hedge funds are projecting an image that appeals to a new generation. The development is forcing a traditionally secretive industry into an unusual position: having to promote itself, and become cool.

The office revamp is all part of that plan, as hedge funds vie with technology companies for recruits who have expertise in machine learning, artificial intelligence and big data analytics, many of whom are garnering salaries of $150,000 or more straight out of university.

“A lot have gone down the Google route to offer more perks,” said Mr Roussanov, who works for the recruitment firm Selby Jennings in New York. “They’re trying to rebrand themselves as tech firms.”...While quantitative investing funds, which trade using computer algorithms, have been at the forefront of hiring these types of candidates, other hedge funds that rely on humans to make trading decisions are increasingly upping their quantitative capabilities in order to analyze reams of data faster.

The casual work atmosphere and flexible hours at tech firms such as Google have long been a strong draw, and hedge funds are making an effort to 'rebrand themselves'. Besides offering more perks, such as revamped workplaces and services like free dry-cleaning, funds are emphasizing the amount of money they are willing to spend on technology and the complexity of the problems in financial markets to entice recruits.

“The pitch is . . . this is a very data-rich environment, and it’s a phenomenally well-resourced environment,” said Matthew Granade, the chief market intelligence officer at Point72, Steve Cohen’s $13bn hedge fund.

For the people Mr Granade calls “data learning, quant types”, the harder the problem, the better. “The benefit for us is that the markets are one of the hardest problems in the world. You think you’ve found a solution and then everyone else catches up. The markets are always adapting. So you are constantly being presented with new challenges, and the problem is constantly getting harder.”
hedge_funds  recruiting  uWaterloo  war_for_talent  millennials  finance  perks  quantitative  hard_questions  new_graduates  data_scientists 
august 2018 by jerryking
Commodity trading enters the age of digitisation
July 9, 2018 | Financial Times | by Emiko Terazono.

Commodity houses are on the hunt for data experts to help them gain an edge after seeing their margins squeezed by rivals......commodity traders are seeking ways of exploiting their information to help them profit from price swings.

“It is really a combination of knowing what to look for and using the right mathematical tools for it,” ........“We want to be able to extract data and put it into algorithms,” .......“We then plan to move on to machine learning in order to improve decision-making in trading and, as a result, our profitability.” The French trading arm is investing in people, processes and systems to centralize its data — and it is not alone.

“Everybody [in the commodity world] is waking up to the fact that the age of digitisation is upon us,” said Damian Stewart at headhunters Human Capital.

In an industry where traders with proprietary knowledge, from outages at west African oilfields to crop conditions in Russia, vied to gain an upper hand over rivals, the democratisation of information over the past two decades has been a challenge......with the ABCDs — Archer Daniels Midland, Bunge, Cargill and Louis Dreyfus Company — all recording single-digit ROE in their latest results. As a consequence, an increasing number of traders are hoping to increase their competitiveness by feeding computer programs with mountains of information they have accumulated from years of trading physical raw materials to try to detect patterns that could form the basis for trading ideas.......Despite this new enthusiasm, the road to electronification may not come easily for some traders. Compared to other financial and industrial sectors, “they are coming from way behind,” said one consultant.

One issue is that some of the larger commodities traders face internal resistance in centralising information on one platform.

With each desk in a trading house in charge of its profit-and-loss account, data are closely guarded even from colleagues, said Antti Belt, head of digital commodity trading at Boston Consulting Group. “The move to ‘share all our data with each other’ is a very, very big cultural shift,” he added.

Another problem is that in some trading houses, staff operate on multiple technology platforms, with different units using separate systems.

Rather than focusing on analytics, some data scientists and engineers are having to focus on harmonising the platforms before bringing on the data from different parts of the company.
ADM  agribusiness  agriculture  algorithms  artificial_intelligence  Bunge  Cargill  commodities  data_scientists  digitalization  machine_learning  traders  food_crops  Louis_Dreyfus  grains  informational_advantages 
july 2018 by jerryking
The quant factories producing the fund managers of tomorrow
Jennifer Thompson in London JUNE 2, 2018

The wealth of nations and individuals is ever more likely to be influenced by computer algorithms as investors look to computer-powered quantitative trading strategies to generate returns. But underpinning those machines and algorithms are real people, namely the world’s sharpest mathematicians and data scientists.

Such talent is not hard to identify, but virtually every industry — and especially Big Tech — is competing with the financial world for it....Competition for talent means the campuses of elite universities have become a favoured hunting ground for many groups, and that the very best students and early career academics can command staggering starting salaries should they join the investment world......The links asset managers foster with universities vary. In the UK, Oxford and Cambridge are home to dedicated institutes established and funded by investment managers. Although these were set up with a genuine desire to foster research in the field, with a nod to philanthropy, they are also proving to be an effective way of spotting future talent.

Connections between hedge funds and investment managers are less formalised on US campuses but are treated with no less importance.

Personal relationships are important,
mathematics  data_scientists  quants  quantitative  hedge_funds  algorithms  war_for_talent  asset_management  PhDs  WorldQuant  Big_Tech 
june 2018 by jerryking
A new boss for McKinsey - Firm direction
Mar 1st 2018

On February 25th the result of a long election process was made public. Kevin Sneader, the Scottish chairman of McKinsey’s Asia unit, will replace Dominic Barton as managing partner—the top job. He inherits a thriving business. The firm remains by far the biggest of the premium consultancies (see table). Over the past decade, annual revenues have doubled to $10bn; so too has the size of the partnership, to more than 2,000......Mr Barton claims that half of what it does today falls within capabilities that did not exist five years ago. It is working to ensure that customers turn to McKinseyites for help with all things digital. It has had to make acquisitions in some areas: recent purchases include QuantumBlack, an advanced-analytics firm in London, and LUNAR, a Silicon-Valley design company. It is increasingly recruiting outside the usual business schools to bring in seasoned data scientists and software developers.....McKinsey has kept plenty of older tech companies as clients, such as Hewlett Packard, but it has a lot more to do to crack new tech giants and unicorns (private startups worth more than $1bn). ....McKinsey’s response is to try to gain a foothold earlier on in tech firms’ life-cycles. It is targeting medium-sized companies, which would not have been able to afford its fees, by offering shorter projects with smaller “startup-sized” teams
appointments  CEOs  data_scientists  management_consulting  McKinsey  mergers_&_acquisitions  SMEs  software_developers 
march 2018 by jerryking
America’s intelligence agencies find creative ways to compete for talent - Spooks for hire
March 1, 2018 | Economist |

AMERICA’S intelligence agencies are struggling to attract and retain talent. Leon Panetta, a former Pentagon and CIA boss, says this is “a developing crisis”......The squeeze is tightest in cyber-security, programming, engineering and data science.....Until the agencies solve this problem, he says, they will fall short in their mission or end up paying more for expertise from contractors. By one estimate, contractors provide a third of the intelligence community’s workforce.....Part of the problem is the demand in the private sector for skills that used to be needed almost exclusively by government agencies, says Robert Cardillo, head of the National Geospatial-Intelligence Agency (NGA). To hire people for geospatial data analysis, he must now compete with firms like Fitbit, a maker of activity-measurement gadgets. .....The NGA now encourages certain staff to work temporarily for private firms while continuing to draw a government salary. After six months or a year, they return, bringing “invaluable” skills to the NGA, Mr Cardillo says. Firms return the favour by quietly lending the NGA experts in app development and database security. .....
war_for_talent  talent  data_scientists  CIA  security_&_intelligence  cyber_security  Leon_Panetta  SecDef  Pentagon  geospatial 
march 2018 by jerryking
Novartis’s new chief sets sights on ‘productivity revolution’
SEPTEMBER 25, 2017 | Financial Times | Sarah Neville and Ralph Atkins.

The incoming chief executive of Novartis, Vas Narasimhan, has vowed to slash drug development costs, eyeing savings of up to 25 per cent on multibillion-dollar clinical trials as part of a “productivity revolution” at the Swiss drugmaker.

The time and cost of taking a medicine from discovery to market has long been seen as the biggest drag on the pharmaceutical industry’s performance, with the process typically taking up to 14 years and costing at least $2.5bn.

In his first interview as CEO-designate, Dr Narasimhan says analysts have estimated between 10 and 25 per cent could be cut from the cost of trials if digital technology were used to carry them out more efficiently. The company has 200 drug development projects under way and is running 500 trials, so “that will have a big effect if we can do it at scale”.......Dr Narasimhan plans to partner with, or acquire, artificial intelligence and data analytics companies, to supplement Novartis’s strong but “scattered” data science capability.....“I really think of our future as a medicines and data science company, centred on innovation and access.”

He must now decide where Novartis has the capability “to really create unique value . . . and where is the adjacency too far?”.....Does he need the cash pile that would be generated by selling off these parts of the business to realise his big data vision? He says: “Right now, on data science, I feel like it’s much more about building a culture and a talent base . . . ...Novartis has “a huge database of prior clinical trials and we know exactly where we have been successful in terms of centres around the world recruiting certain types of patients, and we’re able to now use advanced analytics to help us better predict where to go . . . to find specific types of patients.

“We’re finding that we’re able to significantly reduce the amount of time that it takes to execute a clinical trial and that’s huge . . . You could take huge cost out.”...Dr Narasimhan cites one inspiration as a visit to Disney World with his young children where he saw how efficiently people were moved around the park, constantly monitored by “an army of [Massachusetts Institute of Technology-]trained data scientists”.
He has now harnessed similar technology to overhaul the way Novartis conducts its global drug trials. His clinical operations teams no longer rely on Excel spreadsheets and PowerPoint slides, but instead “bring up a screen that has a predictive algorithm that in real time is recalculating what is the likelihood our trials enrol, what is the quality of our clinical trials”.

“For our industry I think this is pretty far ahead,” he adds.

More broadly, he is realistic about the likely attrition rate. “We will fail at many of these experiments, but if we hit on a couple of big ones that are transformative, I think you can see a step change in productivity.”
algorithms  analytics  artificial_intelligence  attrition_rates  CEOs  data_driven  data_scientists  drug_development  failure  Indian-Americans  multiple_targets  Novartis  pharmaceutical_industry  predictive_analytics  productivity  productivity_payoffs  product_development  real-time  scaling  spreadsheets 
november 2017 by jerryking
The Ivory Tower Can’t Keep Ignoring Tech
NOV. 14, 2017 | The New York Times | By Cathy O’Neil, a data scientist and the author of the book “Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy.” Follow her on Twitter at @mathbabedotorg.

We urgently need an academic institute focused on algorithmic accountability.

First, it should provide a comprehensive ethical training for future engineers and data scientists at the undergraduate and graduate levels, with case studies taken from real-world algorithms that are choosing the winners from the losers. Lecturers from humanities, social sciences and philosophy departments should weigh in.

Second, this academic institute should offer a series of workshops, conferences and clinics focused on the intersection of different industries with the world of A.I. and algorithms. These should include experts in the content areas, lawyers, policymakers, ethicists, journalists and data scientists, and they should be tasked with poking holes in our current regulatory framework — and imagining a more relevant one.

Third, the institute should convene a committee charged with reimagining the standards and ethics of human experimentation in the age of big data, in ways that can be adopted by the tech industry.

There’s a lot at stake when it comes to the growing role of algorithms in our lives. The good news is that a lot could be explained and clarified by professional and uncompromised thinkers who are protected within the walls of academia with freedom of academic inquiry and expression. If only they would scrutinize the big tech firms rather than stand by waiting to be hired.
algorithms  accountability  Cathy_O’Neil  Colleges_&_Universities  data_scientists  ethics  inequality  think_tanks  Big_Tech 
november 2017 by jerryking
Art market ripe for disruption by algorithms
MAY 26, 2017 | Financial Times | by John Dizard.

Art consultants and dealers are convinced that theirs is a high-touch, rather than a high-tech business, and they have arcane skills that are difficult, if not impossible, to replicate..... better-informed collectors [are musing about] how to compress those transaction costs and get that price discovery done more efficiently.....The art world already has transaction databases and competing price indices. The databases tend to be incomplete, since a high proportion of fine art objects are sold privately rather than at public auctions. The price indices also have their issues, given the (arguably) unique nature of the objects being traded. Sotheby’s Mei Moses index attempts to get around that by compiling repeat-sales data, which, given the slow turnover of particular works of art, is challenging.....Other indices, or value estimations, are based on hedonic regression, which is less amusing than it sounds. It is a form of linear regression used, in this case, to determine the weight of different components in the pricing of a work of art, such as the artist’s name, the work’s size, the year of creation and so on. Those weights in turn are used to create time-series data to describe “the art market”. It is better than nothing, but not quite enough to replace the auctioneers and dealers.....the algos are already on the hunt....people are watching the auctions and art fairs and doing empirics....gathering data at a very micro level, looking for patterns, just to gather information on the process.....the art world and its auction markets are increasingly intriguing to applied mathematicians and computer scientists. Recognising, let alone analysing, a work of art is a conceptually and computationally challenging problem. 
But computing power is very cheap now, which makes it easier to try new methods.....Computer scientists have been scanning, or “crawling”, published art catalogues and art reviews to create semantic data for art works based on natural-language descriptions. As one 2015 Polish paper says, “well-structured data may pave the way towards usage of methods from graph theory, topic labelling, or even employment of machine learning”.
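A hedonic regression of the sort described is straightforward to sketch. Everything below is invented toy data; real art-market models use many more attributes and much larger repeat-sales datasets.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy auction results: log price explained by hedonic attributes
# (artist's name, size, year of creation). All numbers are fabricated.
sales = pd.DataFrame({
    "artist":    ["Richter", "Richter", "Basquiat", "Basquiat", "Richter", "Basquiat"],
    "area_m2":   [1.2, 2.0, 1.5, 0.8, 0.5, 2.5],
    "year":      [1988, 1992, 1982, 1983, 1975, 1981],
    "log_price": [14.1, 14.9, 15.2, 14.6, 13.2, 15.8],
})

# One-hot encode the artist name; the fitted coefficients are the
# "weights" of each pricing component, which can then be used to build
# a quality-adjusted time series describing "the art market".
X = pd.get_dummies(sales[["artist", "area_m2", "year"]],
                   columns=["artist"], drop_first=True)
model = LinearRegression().fit(X, sales["log_price"])

weights = dict(zip(X.columns, model.coef_))
print(weights)
```

The same linear-regression machinery works whether the attributes number three or three hundred; the hard part, as the article notes, is that the underlying objects are arguably unique and the sales data incomplete.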

Machine-learning techniques, such as software programs for deep recurrent neural networks, have already been used to analyse and predict other auction processes.
algorithms  disruption  art  art_finance  auctions  collectors  linear_regression  data_scientists  machine_learning  Sotheby’s  high-touch  pricing  quantitative  analytics  arcane_knowledge  art_market 
june 2017 by jerryking
Seven Tips for Hiring Great Data-Analytics People - The Experts - WSJ
By TOM GIMBEL
May 16, 2017

1. Check references. References may sound basic, but they are crucial.
2. Actual examples. Regardless of their previous role, have them share an example of how they’ve analyzed data in the past. Ask for both the written and oral presentation. You want the person who actually did the heavy lifting, versus the person who only interpreted the information.
3. Take-home projects. Give your candidates a case study to take home and analyze.
4. On-the-spot tests. The best way to tell in real time whether or not a candidate is good at analyzing data is to present them with a data set during the interview and have them share how they would go about drawing conclusions.
5.Challenge the status quo. Talk to the candidate about a flawed process, or something you did that went wrong. Do they challenge or push back on why you went about it a certain way, or suggest a different way?
6. Storytelling. If when explaining a project they worked on, candidates claim to have reduced or increased key metrics, ask why they thought it was successful and what downstream impact it had on the business.
7. Insightfulness. Regardless of the project, whether it was an in-person analysis or report from a take-home assignment, have them walk through how they got to each step. What was their thought process, and are they able to expand on how it would impact business?
data_scientists  hiring  howto  tips  reference-checking  references  storytelling  insights 
may 2017 by jerryking
Building an Empire on Event Data – The Event Log
Michelle Wetzler
Chief Data Scientist @keen_io
Mar 31

Facebook, Google, Amazon, and Netflix have built their businesses on event data. They’ve invested hundreds of millions behind data scientists and engineers, all to help them get to a deep understanding and analysis of the actions their users or customers take, to inform decisions all across their businesses.
Other companies hoping to compete in a space where event data is crucial to their success must find a way to mirror the capabilities of the market leaders with far fewer resources. They’re starting to do that with event data platforms like Keen IO.
What does “Event Data” mean?
Event data isn’t like its older counterpart, entity data, which describes objects and is stored in tables. Event data describes actions, and its structure allows many rich attributes to be recorded about the state of something at a particular point in time.
Every time someone loads a webpage, clicks an ad, pauses a song, updates a profile, or even takes a step into a retail location, their actions can be tracked and analyzed. These events span so many channels and so many types of interactions that they paint an extremely detailed picture of what captivates customers.
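A single event record might look like the following; the field names and values are illustrative, not any vendor's actual schema. The point is that each record captures an action plus a rich snapshot of state at one moment, rather than a row in an entity table.

```python
import json
from datetime import datetime, timezone

# One "song_paused" event: an action, a timestamp, and denormalized
# state about the user, song, and device at that instant.
event = {
    "event": "song_paused",
    "timestamp": datetime(2017, 3, 31, 14, 2, 7, tzinfo=timezone.utc).isoformat(),
    "user": {"id": "u_482", "plan": "premium", "country": "US"},
    "song": {"id": "s_90210", "artist_id": "a_77", "position_sec": 143},
    "device": {"type": "mobile", "os": "ios"},
}
print(json.dumps(event, indent=2))
```

Because state is embedded in each record, analysts can later slice events by any attribute (plan, device, country) without joining back to mutable entity tables whose values may have changed since the action occurred.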
data  data_driven  massive_data_sets  data_scientists  event-driven  events  strategy  engineering  Facebook  Google  Amazon  Netflix 
april 2017 by jerryking
With 125 Ph.D.s in 15 Countries, a Quant ‘Alpha Factory’ Hunts for Investing Edge - WSJ
By BRADLEY HOPE
Updated April 6, 2017

The firm is part of the forefront of a new quantitative renaissance in investing, where the ability to make sense of billions of bits of data in real time is more sought after than old-school financial analysis.

“Brilliance is very equally distributed across the world, but opportunity is not,” said Mr. Tulchinsky, a 50-year-old Belarusian. “We provide the opportunity.”

To do this, WorldQuant developed a model where it employs hundreds of scientists, including 125 Ph.D.s, around the world and hundreds more part-time workers to scour the noise of the economy and markets for hidden patterns. This is the heart of the firm. Mr. Tulchinsky calls it the “Alpha Factory.”....Quantitative hedge funds have been around for decades but they are becoming dominant players in the markets for their ability to parse massive data sets and trade rapidly. Amid huge outflows, traditional hedge funds are bringing aboard chief data scientists and trying to mimic quant techniques to keep up, fund executives say.

Some critics of quants believe their strategies are overhyped and are highly susceptible to finding false patterns in the noise of data. David Leinweber, a data scientist, famously found that the data set with the highest correlation with the S&P 500 over a 10-year period in the 1990s was butter production in Bangladesh.
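That false-pattern risk is easy to reproduce in spirit: screen enough unrelated series against a return series and one of them will correlate impressively by chance alone. A self-contained simulation on purely synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
returns = rng.standard_normal(120)             # stand-in for ~10 years of monthly returns
candidates = rng.standard_normal((5000, 120))  # 5,000 unrelated "butter production" series

# Correlate every candidate with the returns and keep the best match.
corrs = np.array([np.corrcoef(returns, c)[0, 1] for c in candidates])
print(f"best spurious correlation: {corrs.max():.2f}")
```

With this many candidates the best in-sample correlation is typically well above 0.3 even though every series is pure noise, which is exactly why quants stress out-of-sample validation.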
quantitative  Wall_Street  PhDs  alpha  investors  slight_edge  massive_data_sets  signals  noise  data_scientists  real-time  algorithms  patterns  sense-making  quants  unevenly_distributed  WorldQuant 
april 2017 by jerryking
Advice for Data Scientists on Where to Work | Stitch Fix Technology – Multithreaded
It's a good time to be a data scientist. If you have the skills, experience, curiosity and passion, there is a vast and receptive market of companies to choose from. Yet there is much to consider when evaluating a prospective firm as a place to apply your talents. Even veterans may not have had the opportunity to experience different organizations, stages of maturity, cultures, technologies, or domains. We are combining our experience here to offer some advice: three things to look for in a company that could make it a great place to work.

Work for a Company that Leverages Data Science for its Strategic Differentiation

Companies employ various means of differentiation in order to gain a competitive advantage in the market. Some differentiate themselves using price, striving to be the low-price leader. Others differentiate by product, providing an offering that is superior in some way. Still others differentiate by their processes - for example providing faster shipping.

A Data Scientist should look for a company that actually uses data science to set itself apart from the competition. Note that data science may be supportive of lower prices, better products, and faster shipping; however, it is not typically the direct enabler of these differentiators. More commonly, the enablers are other things - economies of scale in the case of lower prices, patents or branding in the case of product, and automation technology in the case of faster shipping. Data science can directly enable a strategic differentiator if the company's core competency depends on its data and analytic capabilities. When this happens, the company becomes supportive of data science instead of the other way around. It's willing to invest in acquiring the top talent, building the necessary infrastructure, pioneering the latest algorithmic and computational techniques, and building incredible engineering products to manifest the data science.

"Good enough" is not a phrase that is uttered in the context of a strategic differentiator. Rather, the company and the data scientist have every incentive to push the envelope, to innovate further, and to take more risks. The company's aspirations are squarely in line with the data scientist's. It's an amazing intersection to be at – a place that gets you excited to wake up every morning, a place that stretches you, a place that inspires you (and supports you) to be the best in the world at what you do.

Work for a Company with Great Data

In determining what will be a great company to work for, data-science-as-a-strategic-differentiator is a necessary criterion, but it is not a sufficient one. The company must also have world-class data to work with.

This starts with finding a company that really has data. Spotting the difference between data and aspirations of data can be especially important in evaluating early-stage companies. Ideally you'll find a company that already has enough data to do interesting things. Almost all companies will generate more data as they grow, but if you join a company that already has data your potential for impact and fulfillment will be much higher.

Next, look for data that is both interesting and has explanatory power. One of the most important aspects of your daily life will be the extent to which you find the data you work with compelling. Interesting data should require your creativity to frame problems, test your intuition and push you to develop new algorithms and applications. Explanatory power is just as important - great data enables great applications. There should be enough signal to support data science as a differentiating strength.

Finally, don't fixate on big data. The rising prominence of the data scientist has coincided with the rise of Big Data, but they are not the same thing. Sheer scale does not necessarily make data interesting, nor is it necessarily required. Look for data with high information density rather than high volume, and that supports applications you find interesting or surprising. This enables you to spend most of your mental energy on analysis and framing rather than on efficient data processing.

Work for a Company with Greenfield Opportunities

When evaluating opportunities, find a company that doesn't have it all figured out yet. Nearly all companies that fit the criteria in the sections above will already have some applications in place where the work of data scientists is essential. Look for those companies that have a strong direction and well-established data science teams, but still have an array of problems they are solving for the first time.

Often the most exciting and impactful opportunities for data scientists at a company are not being actively pursued. They probably have not even been conceived of yet. Work somewhere that encourages you to take risks, challenge basic assumptions, and imagine new possibilities.

Observing the relationship between engineering and data science teams is a quick way to determine if an organization adopts this mindset. Is engineering enthusiastic to partner with data science teams to experiment and integrate ideas back into the business? Is there an architecture in place that supports agile integration of new ideas and technologies? In fact, in companies that embody this mindset most effectively, it is likely difficult to locate the boundary between data science and engineering teams.

A greenfield can be intimidating in its lack of structure, but the amount of creativity and freedom available to you as a data scientist is never greater than when you're starting from scratch. The impact of putting something in place where nothing existed previously can be immeasurable. Look for chances to be involved in designing not just the math and science, but also the pipeline, the API, and the tech stack. Not only is creating something new often more challenging and rewarding, but there is no better opportunity for learning and growth than designing something from the ground up.

Incremental improvements have incremental impacts, but embrace the chance to operate on a greenfield. While it is extremely important to constantly iterate and improve on systems that already exist, the Version 1 of something new can fundamentally change the business.

Summary

Of course, there are other considerations: domain, the company's brand, the specific technology in use, the culture, the people, and so forth. All of those are equally important. We call out the three above since they are less frequently talked about, yet fundamental to a data scientist's growth, impact, and happiness. They are also less obvious. We learned these things from experience. At first glance, you would not expect to find these things in a women's apparel company. However, our very different business model places a huge emphasis on data science, enables some of the richest data in the world, and creates space for a whole new suite of innovative software.
career  strategy  via:enochko  economies_of_scale  data_scientists  job_search  Managing_Your_Career  greenfields  data  differentiation  good_enough  information_density  product_pipelines  think_threes 
september 2016 by jerryking
Gearing Up for the Cloud, AT&T Tells Its Workers: Adapt, or Else - The New York Times
FEB. 13, 2016| NYT | By QUENTIN HARDY.

For the company to survive in this environment, Mr. Stephenson needs to retrain the company's 280,000 employees so they can improve their coding skills, or learn them, and make quick business decisions based on a fire hose of data coming into the company.....Learn new skills or find your career choices are very limited.

“There is a need to retool yourself, and you should not expect to stop,”....People who do not spend five to 10 hours a week in online learning, he added, “will obsolete themselves with the technology.” .......By 2020, Mr. Stephenson hopes AT&T will be well into its transformation into a computing company that manages all sorts of digital things: phones, satellite television and huge volumes of data, all sorted through software managed in the cloud.

That can’t happen unless at least some of his work force is retrained to deal with the technology. It’s not a young group: The average tenure at AT&T is 12 years, or 22 years if you don’t count the people working in call centers. And many employees don’t have experience writing open-source software or casually analyzing terabytes of customer data.

.......“Everybody is going to go face to face with a Google, an Amazon, a Netflix,” he said. “You compete based on data, and based on customer insights you get with their permission. If we’re wrong, it won’t play well for anyone here.”
Quentin_Hardy  AT&T  cloud_computing  data  retraining  reinvention  skills  self-education  virtualization  data_scientists  new_products  online_training  e-learning  customer_insights  Google  Amazon  Netflix  data_driven 
february 2016 by jerryking
The Value of Bad Data - The Experts - WSJ
Apr 22, 2015 | WSJ | by Alexandra Samuel--technology researcher and the author of “Work Smarter with Social Media.”
*** Can I apply the idea of negative space towards evolving a dataset?

What do you do when you don’t have access to a large data set?...even without access to big data, you can still use some of the tools of data-driven decision-making to make all the other choices that arise in your day-to-day work.

Adopting and adapting the tools of quantitative analysis is crucial, because we often face decisions that can’t be guided by a large data set. Maybe you’re the founder of a small company, and you don’t yet have enough customers or transactions to provide a statistically significant sample size. Perhaps you’re working on a challenge for which you have no common data set, like evaluating the performance of different employees whose work has been tracked in different ways. Or maybe you’re facing a problem that feels like it can’t be quantified, like assessing the fit between your services and the needs of different potential clients.

None of these scenarios offers you the kind of big data that would make a data scientist happy. But you can still dig into your data scientist’s toolbox, and use a quasi-quantitative approach to get some of the benefits of statistical analysis… even in the absence of statistically valid data.
massive_data_sets  data  data_driven  small_business  data_scientists  books  hustle  statistics  quantitative  small_data  data_quality 
july 2015 by jerryking
The Sensor-Rich, Data-Scooping Future - NYTimes.com
APRIL 26, 2015 | NYT | By QUENTIN HARDY.

Sensor-rich lights, to be found eventually in offices and homes, are for a company that will sell knowledge of behavior as much as physical objects....The Internet will be almost fused with the physical world. The way Google now looks at online clicks to figure out which ad to put in front of you next will become the way companies gain once-hidden insights into the patterns of nature and society.

G.E., Google and others expect that knowing and manipulating these patterns is the heart of a new era of global efficiency, centered on machines that learn and predict what is likely to happen next.

“The core thing Google is doing is machine learning,” Eric Schmidt....The great data science companies of our sensor-packed world will have experts in arcane reaches of statistics, computer science, networking, visualization and database systems, among other fields. Graduates in those areas are already in high demand.

Nor is data analysis just a question of computing skills; data access is also critically important. As a general rule, the larger and richer a data set a company has, the better its predictions become. ....an emerging area of computer analysis known as “deep learning” will blow away older fields.

While both Facebook and Google have snapped up deep-learning specialists, Mr. Howard said, “they have far too much invested in traditional computing paradigms. They are the equivalent of Kodak in photography.” Echoing Mr. Chui’s point about specialization, he said he thought the new methods demanded understanding of specific fields to work well.

It is of course possible that both things are true: Big companies like Google and Amazon will have lots of commodity data analysis, and specialists will find niches. That means for most of us, the answer to the future will be in knowing how to ask the right kinds of questions.
sensors  GE  GE_Capital  Quentin_Hardy  data  data_driven  data_scientists  massive_data_sets  machine_learning  automated_reasoning  predictions  predictive_analytics  predictive_modeling  layer_mastery  core_competencies  Enlitic  deep_learning  niches  patterns  analog  insights  latent  hidden  questions  Google  Amazon  aftermath  physical_world  specialization  consumer_behavior  cyberphysical  arcane_knowledge  artificial_intelligence  test_beds 
april 2015 by jerryking
The lost art of political persuasion - The Globe and Mail
KONRAD YAKABUSKI
The Globe and Mail
Published Saturday, Apr. 25 2015

Talking points are hardly a 21st century political innovation. But they have so crowded out every other form of discourse that politics is now utterly devoid of honesty, unless it’s the result of human error. The candidates are still human, we think, though the techies now running campaigns are no doubt working on ways to remove that bug from their programs.

Intuition, ideas and passion used to matter in politics. Now, data analytics aims to turn all politicians into robots, programmed to deliver a script that has been scientifically tested...The data analysts have algorithms that tell them just what words resonate with just what voters and will coax them to donate, volunteer and vote.

Politics is no longer about the art of persuasion or about having an honest debate about what’s best for your country, province or city. It’s about microtargeting individuals who’ve already demonstrated by their Facebook posts or responses to telephone surveys that they are suggestible. Voters are data points to be manipulated, not citizens to be cultivated....Campaign strategists euphemistically refer to this data collection and microtargeting as “grassroots engagement” or “having one-on-one conversations” with voters....The data analysts on the 2012 Obama campaign came up with “scores” for each voter in its database, or what author Sasha Issenberg called “a new political currency that predicted the behaviour of individual humans.
Konrad_Yakabuski  persuasion  middle_class  politicians  massive_data_sets  political_campaigns  data_scientists  data_driven  data_mining  microtargeting  behavioural_targeting  politics  data  analytics  Campaign_2012 
april 2015 by jerryking
The Evolving Automotive Ecosystem - The CIO Report - WSJ
April 6, 2015| WSJ | By IRVING WLADAWSKY-BERGER.

An issue in many other industries. Will the legacy industry leaders be able to embrace the new digital technologies, processes and culture, or will they inevitably fall behind their faster moving, more culturally adept digital-native competitors? [the great game]

(1) Find new partners and dance: “The structure of the automotive industry will likely change rapidly. Designing and producing new vehicles have become far too complex and expensive for any one company to manage all on its own.”
(2) Become data masters: “Know your customers better than they know themselves. Use that data to curate every aspect of the customer experience, from when they first learn about the car to the dealership experience and throughout the customer life cycle. Having data scientists on staff will likely be the rule, not the exception.”
(3) Update your economic models: “Predicting demand was hard enough in the old days, when you did a major new product launch approximately every five years. Now, with the intensity of competition, the rapid cadence of new launches, and the mashup of consumer and automotive technology, you may need new economic models for predicting demand, capital expenditures, and vehicle profitability.”
(4) Tame complexity: “It’s all about the center stack, the seamless connectivity with nomadic devices, the elegance of the Human Machine Interface.”
(5) Create adaptable organizations: “It will take a combination of new hard and soft skills to build the cars and the companies of the future. For many older, established companies, that means culture change, bringing in new talent, and rethinking every aspect of process and people management.”
Apple  automotive_industry  autonomous_vehicles  ecosystems  Google  know_your_customer  adaptability  CIOs  layer_mastery  competitive_landscape  competitive_strategy  connected_devices  telematics  data  data_driven  data_scientists  customer_experience  curation  structural_change  accelerated_lifecycles  UX  complexity  legacy_players  business_development  modelling  Irving_Wladawsky-Berger  SMAC_stack  cultural_change  digitalization  connected_cars  the_great_game 
april 2015 by jerryking
On the Case at Mount Sinai, It’s Dr. Data - NYTimes.com
MARCH 7, 2015 | NYT |By STEVE LOHR.

“Data-ism: The Revolution Transforming Decision Making, Consumer Behavior, and Almost Everything Else,” by Steve Lohr,
Steve_Lohr  data  data_driven  data_scientists  Wall_Street  Facebook  hospitals  medical  books  Cloudera  consumer_behavior 
march 2015 by jerryking
Banking Start-Ups Adopt New Tools for Lending
JAN. 18, 2015 | - NYTimes.com | By STEVE LOHR.

When bankers of the future decide whether to make a loan, they may look to see if potential customers use only capital letters when filling out forms, or at the amount of time they spend online reading terms and conditions — and not so much at credit history.

These signals about behavior — picked up by sophisticated software that can scan thousands of pieces of data about online and offline lives — are the focus of a handful of start-ups that are creating new models of lending....Earnest uses the new tools to make personal loans. Affirm, another start-up, offers alternatives to credit cards for online purchases. And another, ZestFinance, has focused on the relative niche market of payday loans.
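None of these firms publishes its models, but the flavor of behaviour-based underwriting can be sketched as a toy logistic score. The two features echo signals the article mentions; the weights and thresholds are entirely invented for illustration and are not Earnest's, Affirm's, or ZestFinance's actual methods:

```python
import math

def behavioural_score(all_caps_form: bool, seconds_on_terms: float) -> float:
    """Toy 0-1 repayment-likelihood score from two behavioural signals.

    Invented weights: time spent reading the terms counts in an
    applicant's favour; filling out the form entirely in capital
    letters counts against them.
    """
    z = -0.5 + 0.01 * min(seconds_on_terms, 300.0) - 1.2 * all_caps_form
    return 1.0 / (1.0 + math.exp(-z))  # logistic squash to a probability

careful = behavioural_score(all_caps_form=False, seconds_on_terms=240)
hasty = behavioural_score(all_caps_form=True, seconds_on_terms=5)
print(f"careful applicant: {careful:.2f}, hasty applicant: {hasty:.2f}")
```

A real lender would fit such weights from repayment outcomes across thousands of features rather than setting them by hand, but the scoring mechanics are the same shape.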
Steve_Lohr  tools  banking  banks  massive_data_sets  start_ups  data_scientists  Earnest  Affirm  ZestFinance  Max_Levchin  consumer_finance  credit_scoring  fin-tech  financial_services  consumer_behavior  signals 
january 2015 by jerryking
Getting Started in ‘Big Data’ - The CFO Report - WSJ
February 4, 2014 | WSJ |by JAMES WILLHITE.

executives and recruiters, who compete for talent in the nascent specialty, point to hiring strategies that can get a big-data operation off the ground. They say they look for specific industry experience, poach from data-rich rivals, rely on interview questions that screen out weaker candidates and recommend starting with small projects.

David Ginsberg, chief data scientist at business-software maker SAP AG, said communication skills are critically important in the field, and that a key player on his big-data team is a “guy who can translate Ph.D. to English. Those are the hardest people to find.”

Along with the ability to explain their findings, data scientists need to have a proven record of being able to pluck useful information from data that often lack an obvious structure and may even come from a dubious source. This expertise doesn’t always cut across industry lines. A scientist with a keen knowledge of the entertainment industry, for example, won’t necessarily be able to transfer his skills to the fast-food market.

Some candidates can make the leap. Wolters Kluwer NV, a Netherlands-based information-services provider, has had some success in filling big-data jobs by recruiting from other, data-rich industries, such as financial services. “We have found tremendous success with going to alternative sources and looking at different businesses and saying, ‘What can you bring into our business?’ ” said Kevin Entricken, the company’s chief financial officer.
massive_data_sets  analytics  data_scientists  cross-industry  recruiting  howto  poaching  plain_English  connecting_the_dots  storytelling  SAP  Wolters_Kluwer  expertise  Communicating_&_Connecting  unstructured_data  war_for_talent  talent  PhDs  executive_search  artificial_intelligence  nontraditional 
june 2014 by jerryking
Sponsor Generated Content: 4 Industries Most in Need of Data Scientists
June 16, 2014 | WSJ. Custom Studios for SAS

Agriculture
Relying on sensors in farm machinery, in soil and on planes flown over fields, precision agriculture is an emerging practice in which growing crops is directed by data covering everything from soil conditions to weather patterns to commodity pricing. “Precision agriculture helps you optimize yield and avoid major mistakes,” says Daniel Castro, director of the Center for Data Innovation, a think tank in Washington, D.C. For example, farmers traditionally have planted a crop, then applied fertilizer uniformly across entire fields. Data models allow them to instead customize the spread of fertilizer, seed, water and pesticide across different areas of their farms—even if the land rolls on for 50,000 acres.

Finance
Big data promises to discover better models to gauge risk, which could minimize the likelihood of scenarios such as the subprime mortgage meltdown. Data scientists, though, also are charged with many less obvious tasks in the financial industry, says Bill Rand, director of the Center for Complexity in Business at the University of Maryland. He points to one experiment that analyzed keywords in financial documents to identify competitors in different niches, helping pinpoint investment opportunities.

Government
Government organizations have huge stockpiles of data that can be applied against all sorts of problems, from food safety to terrorism. Joshua Sullivan, a data scientist who led the development of Booz Allen Hamilton’s The Field Guide to Data Science, cites one surprising use of analytics concerning government subsidies. “They created an amazing visualization that helped you see the disconnect between the locations of food distribution sites and the populations they served,” Sullivan says. “That's the type of thing that isn't easy to see in a pile of static reports; you need the imagination of a data scientist to depict the story in the data.”

Pharma
Developing a new drug can take more than a decade and cost billions. Data tools can help take some of the sting out, pinpointing the best drug candidates by scanning across pools of information, such as marketing data and adverse patient reactions. “We can model data and prioritize which experiments we take [forward],” Sullivan says. “Big data can help sort out the most promising drugs even before you do experiments on mice. Just three years ago that would have been impossible. But that's what data scientists do—they tee up the right question to ask.”
drug_development  precision_agriculture  farming  data_scientists  agriculture  massive_data_sets  data  finance  government  pharmaceutical_industry  product_development  non-obvious  storytelling  data_journalism  stockpiles 
june 2014 by jerryking
How to Make a Map Go Viral
MAY 2, 2014 | Atlantic Monthly | By ROBINSON MEYER.

What kind of data do you look for, and how do you find it?

I don't have a particular type of data that I look for beyond my subjective ...
mapping  howto  virality  massive_data_sets  open_data  data  data_scientists  from notes
may 2014 by jerryking
M.I.T.'s Alex Pentland: Measuring Idea Flows to Accelerate Innovation - NYTimes.com - NYTimes.com
April 15, 2014 | NYT | By STEVE LOHR.

Alex Pentland --“Social Physics: How Good Ideas Spread — The Lesson From a New Science.”

Mr. Pentland has been identified with concepts — and terms he has coined — related to the collection and interpretation of all that data, like “honest signals” and “reality mining.” His descriptive phrases are intended to make his point that not all data in the big data world is equal....Reality mining, for example, examines the data about what people are actually doing rather than what they are looking for or saying. Tracking a person’s movements during the day via smartphone GPS signals and credit-card transactions, he argues, are far more significant than a person’s web-browsing habits or social media comments....Central to the concept of social physics is the ability to measure communication and transactions as never before. Then, that knowledge about the flow of ideas can be used to accelerate the pace of innovation.

The best decision-making environment, Mr. Pentland says, is one with high levels of both “engagement” and “exploration.” Engagement is a measure of how often people in a group communicate with each other, sharing social knowledge. Exploration is a measure of seeking out new ideas and new people.

A golden mean is the ideal....[traders] with a balance of diversity of ideas in their trading network — engagement and exploration — had returns that were 30 percent ahead of isolated traders and well ahead of the echo chamber traders, too....The new data and measurement tools, he writes, allow for a “God’s eye view” of human activity. And with that knowledge, he adds, comes the potential to engineer better decisions in a “data-driven society.”
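Pentland's formal measures are more involved, but crude versions of engagement and exploration can be computed from a message log. The team and interactions below are invented, and this is a simplified operationalization of the two definitions, not his actual formulas:

```python
# Toy versions of the two social-physics measures on an invented log:
# engagement = share of a team's interactions that stay inside the team;
# exploration = number of distinct outside contacts the team reaches.
team = {"ana", "bo", "cy"}
interactions = [  # hypothetical (sender, receiver) message pairs
    ("ana", "bo"), ("bo", "cy"), ("cy", "ana"), ("ana", "bo"),
    ("bo", "dee"), ("cy", "eli"),
]

internal = sum(1 for s, r in interactions if s in team and r in team)
engagement = internal / len(interactions)
exploration = len({r for s, r in interactions if s in team and r not in team})

print(f"engagement={engagement:.2f}, exploration={exploration}")
```

On this log the team is mostly talking to itself (high engagement) while reaching only two outsiders (low exploration) - the echo-chamber pattern the traders' study warns against.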
Alex_Pentland  books  cross-pollination  curiosity  data_scientists  data_driven  decision_making  massive_data_sets  MIT  Mydata  sensors  social_physics  Steve_Lohr  idea_generation  heterogeneity  ideas  intellectual_diversity  traders  social_data  signals  echo_chambers 
april 2014 by jerryking
Big Data Is A Big Factor In 2012                     
Mar 30 2012 | Campaigns & Elections | By Brett Bell.

But as the social media industry continues to mature, so too does the level of sophistication in which campaigns and organizations apply social media tools and techniques. Campaigns are moving away from merely having a social media presence to leveraging social activity to inform and fuel campaign machines....For their part, the Obama campaign is focusing significant attention and resources towards data management. In a series of telling job postings this summer, Obama For America put out the call for data mining and predictive modeling analysts, appealing to the startup community, private sector and data managers within their own Party. One particular job description stated that successful candidates would assist in developing statistical and predictive models to assist in fundraising, digital media and other areas of the campaign....the Obama For America Campaign 2012 launched a Facebook application which requested permission to access your location, name, picture, gender, list of friends and other information that would be valuable to the campaign team
Campaign_2012  massive_data_sets  political_campaigns  data_scientists  data_driven  data_mining  microtargeting  behavioural_targeting  data_management 
january 2014 by jerryking
You don’t want your privacy: Disney and the meat space data race — Tech News and Analysis
Jan. 18, 2014 | Tech News and Analysis | By John Foreman, MailChimp.

Meat space is an internet-first way of viewing the world.

The research questions that might be answered with this type of tracking data are endless:

What menu items served at breakfast at the resort hotel restaurants will result in the longest stay at the park?
Do we detect an influx of park-goers into the bathrooms for long stays on the toilet? Perhaps they all ate at the same place, and we can cut off a foodborne illness problem before it gets worse.
Is there a roller coaster that’s correlated with early park departure or a high incidence of bathroom visits? That means less money in the park’s pockets. How might that coaster be altered?
Is there a particular ride and food fingerprint for the type of park visitor that’s likely to buy in-park high-dollar merchandise? If so, can we actively get vendors in front of this attendee’s eye by moving hawkers to them at just the right time?
data  privacy  Disney  RFID  sensors  massive_data_sets  data_driven  data_scientists  theme_parks  personalization  tracking  scheduling  queuing  meat_space  digital_first  questions 
january 2014 by jerryking
How should we analyse our lives? - FT.com
January 17, 2014 | FT |Gillian Tett.

“Social physics helps us understand how ideas flow from person to person . . . and ends up shaping the norms, productivity and creative output of our companies, cities and societies,” writes Pentland. “Just as the goal of traditional physics is to understand how the flow of energy translates into change in motion, social physics seeks to understand how the flow of ideas and information translates into changes in behaviour.”...The only question now is whether these powerful new tools will be mostly used for good (to predict traffic queues or flu epidemics) or for more malevolent ends (to enable companies to flog needless goods, say, or for government control). Sadly, “social physics” and data crunching don’t offer any prediction on this issue, even though it is one of the dominant questions of our age......data are always organised, collected and interpreted by people. Thus if you want to analyse what our interactions mean – let alone make decisions based on this – you will invariably be grappling with cultural and power relations.
massive_data_sets  social_physics  data_scientists  quantified_self  call_centres  books  data  social_data  flu_outbreaks  Gillian_Tett  queuing 
january 2014 by jerryking
Science: Hans Rosling – data rock star - FT.com
January 17, 2014 | FT | By Kate Allen.
visualization  data_scientists  Hans_Rosling 
january 2014 by jerryking
Accessing Open Data via APIs: Never Mind the App, Is There a Market for That?
Mark Boyd, September 4th, 2013

But is the market ready to monetize? In Big Data: A Revolution That Will Transform How We Live, Work, and Think, authors Viktor Mayer-Schönberger and Kenneth Cukier argue that at present, those with “the most value in the big data value chain” are those businesses and entrepreneurs with an innovative mindset attuned to the potential of big and open data. While still in its nascence, “the ideas and the skills seem to hold the greatest worth”, they say. However, they expect:

“…eventually most value will be in the data itself. This is because we’ll be able to do more with the information, and also because the data holders will better appreciate the potential value of the asset they possess. As a result, they’ll probably hold it more tightly than ever, and charge outsiders a high price for access.”
data_scientists  open_data  massive_data_sets  entrepreneurship  start_ups  InfoChimps  Junar  mindsets  commercialization  monetization 
january 2014 by jerryking
Data, Data and More Data
July-August 2013 | Campaigns & Elections | by Colin Delany.

...The picture often portrayed--the Obama campaign as a relentlessly efficient data juggernaut--paints over a lot of workarounds, hacks and improvisations. I'd heard this before, for example at CampaignTech in April, when Obama data manager Ethan Roeder had mentioned that plenty of the campaign's technology was held together with "duct tape and baling wire." He echoed that sentiment in Philadelphia, and he wasn't alone: Obama Chief Scientist Rayid Ghani said that for every mention of "data integration" on the campaign, he had "20 caveats" about how less-than-perfect it actually was in practice...Ghani said that the Obama voter file was actually the smallest data set he'd worked with as a technology professional, in part because people vote so rarely. Elections simply don't come around that often, and compared with commercial marketers (who can draw on thousands of purchases and other transactions to predict buying patterns), political campaigners don't have much historical data to work with.
data  data_driven  political_campaigns  Campaign_2012  Facebook  data_scientists  data_management 
december 2013 by jerryking
Minorities possible unfairly disqualified from opening bank accounts | mathbabe
August 7, 2013 | mathbabe | By Cathy O'Neil.

New York State attorney general Eric T. Schneiderman’s investigation into possibly unfair practices by big banks using opaque and sometimes erroneous databases to disqualify people from opening accounts.

Not much hard information is given in the article, but we know that negative reports stemming from the databases have effectively banished more than a million lower-income Americans from the financial system, and we know that the number of “underbanked” people in this country has grown by 10% since 2009. Underbanked people are those shut out of the normal banking system who have to rely on the underbelly of the system, including check-cashing stores and payday lenders....The second, more interesting point – at least to me – is this: we care about, and defend ourselves against, our constitutional rights being taken away, but we have much less energy to defend ourselves against good things not happening to us.

In other words, it’s not written into the constitution that we all deserve a good checking account, nor a good college education, nor good terms on a mortgage, and so on. Even so, in a large society such as ours, such things are basic ingredients for a comfortable existence. Yet these services are rare if not nonexistent for a huge and swelling part of our society, resulting in a degradation of opportunity for the poor.

The overall effect is heinous, and at some point does seem to rise to the level of a constitutional right to opportunity, but I’m no lawyer.

In other words, instead of only worrying about the truly bad things that might happen to our vulnerable citizens, I personally spend just as much time worrying about the good things that might not happen to our vulnerable citizens, because from my perspective lots of good things not happening add up to bad things happening: they all narrow future options.
visible_minorities  discrimination  data  data_scientists  banks  banking  unbanked  equality  equality_of_opportunity  financial_system  constitutional_rights  payday_lenders  Cathy_O’Neil  optionality  opportunity_gaps  low-income 
december 2013 by jerryking
Open data is not a panacea | mathbabe
December 29, 2012 Cathy O'Neil,
And it’s not just about speed. You can have hugely important, rich, and large data sets sitting in a lump on a publicly available website like wikipedia, and if you don’t have fancy parsing tools and algorithms you’re not going to be able to make use of it.

When important data goes public, the edge goes to the most sophisticated data engineer, not the general public. The Goldman Sachses of the world will always know how to make use of “freely available to everyone” data before the average guy.

Which brings me to my second point about open data. It’s general wisdom that we should hope for the best but prepare for the worst. My feeling is that as we move towards open data we are doing plenty of the hoping part but not enough of the preparing part.

If there’s one thing I learned working in finance, it’s not to be naive about how information will be used. You’ve got to learn to think like an asshole to really see what to worry about. It’s a skill which I don’t regret having.

So, if you’re giving me information on where public schools need help, I’m going to imagine using that information to cut off credit for people who live nearby. If you tell me where environmental complaints are being served, I’m going to draw a map and see where they aren’t being served so I can take my questionable business practices there.
open_data  unintended_consequences  preparation  skepticism  naivete  no_regrets  Goldman_Sachs  tools  algorithms  Cathy_O’Neil  thinking_tragically  slight_edge  sophisticated  unfair_advantages  smart_people  data_scientists  gaming_the_system  dark_side 
december 2013 by jerryking
PhD student makes municipal politics media-friendly
November 8, 2013 | UToday-- University of Calgary |By Heath McCoy

In the past, Fairie has also honed his teaching skills by leading classes on research methods. “That’s usually about making stats accessible for people who hate numbers, so I think I developed some skills there, in making the hard details a bit more fun.”

Now that he’s graduating, Fairie wants to take his passion for statistics to the next level. He plans to start up a data-science consulting firm with colleagues.

“I’d like to work with organizations and government bodies who have big data sets which they need to get value out of,” he says. “I can help them understand the patterns that are going on underneath that data. It’s valuable information.”
Colleges_&_Universities  Calgary  data_scientists  politics  municipalities  elections  massive_data_sets  political_campaigns 
december 2013 by jerryking
Start-Ups Are Mining Hyperlocal Information for Global Insights - NYTimes.com
November 10, 2013 | NYT | By QUENTIN HARDY

By analyzing the photos of prices and the placement of everyday items like piles of tomatoes and bottles of shampoo and matching that to other data, Premise is building a real-time inflation index to sell to companies and Wall Street traders, who are hungry for insightful data.... Collecting data from all sorts of odd places and analyzing it much faster than was possible even a couple of years ago has become one of the hottest areas of the technology industry. The idea is simple: With all that processing power and a little creativity, researchers should be able to find novel patterns and relationships among different kinds of information.

For the last few years, insiders have been calling this sort of analysis Big Data. Now Big Data is evolving, becoming more “hyper” and including all sorts of sources. Start-ups like Premise and ClearStory Data, as well as larger companies like General Electric, are getting into the act....General Electric, for example, which has over 200 sensors in a single jet engine, has worked with Accenture to build a business analyzing aircraft performance the moment the jet lands. G.E. also has software that looks at data collected from 100 places on a turbine every second, and combines it with power demand, weather forecasts and labor costs to plot maintenance schedules.
start_ups  data  data_driven  data_mining  data_scientists  inflation  indices  massive_data_sets  hyperlocal  Premise  Accenture  GE  ClearStory  real-time  insights  Quentin_Hardy  pattern_recognition  photography  sensors  maintenance  industrial_Internet  small_data 
november 2013 by jerryking
To hire without using ads or recruiters - genius or folly? - The Globe and Mail
Aug. 28 2013 | The Globe and Mail | SUSAN SMITH.

That’s the challenge faced by Wojciech Gryc, 27, who started Canopy Labs a year and a half ago in Toronto. The company makes software for businesses that want to track their customers’ preferences using data analytics....The product compiles information from e-mail, e-commerce sites, social media, voice mail and call centres to help predict how likely people are to remain customers, how much they are likely to spend and which marketing messages they are likely to respond to.
hiring  Toronto  start_ups  predictive_analytics  data_scientists  recruiting  DIY  fallacies_follies 
october 2013 by jerryking
Keeping Up With Your Quants
July-August 2013 | HBR | Thomas Davenport.

Article places people into two buckets, as either producers or consumers of analytics. Producers are, of course, good at gathering the available data and making predictions about the future. But most lack sufficient knowledge to identify hypotheses and relevant variables and to know when the ground beneath an organization is shifting. Your job as a data consumer—generating hypotheses and determining whether results and recommendations make sense in a changing business environment—is therefore critically important....Learn a little about analytics.
If you remember the content of your college-level statistics course, you may be fine. If not, bone up on the basics of regression analysis, statistical inference, and experimental design.

Focus on the beginning and the end.
Ask lots of questions along the way.
Establish a culture of inquiry, not advocacy.
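The regression basics Davenport suggests brushing up on can be illustrated from first principles. A minimal sketch of ordinary least squares with hypothetical toy data (not from the article); real work would reach for statsmodels or scikit-learn:

```python
# Simple linear regression (ordinary least squares) from first principles --
# the "college-level statistics" basics the article recommends reviewing.

def ols_fit(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]  # roughly y = 2x + 1 plus noise
slope, intercept = ols_fit(xs, ys)
print(slope, intercept)
```

A data consumer who knows this much can at least ask whether a reported fit's slope is plausible and what variance in the inputs it rests on.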
HBR  Thomas_Davenport  massive_data_sets  data_scientists  data  data_driven  howto  analytics  decision_making  quants  questions  endgame  curiosity 
july 2013 by jerryking
Shakeups in the "C Suite": Hail to the New Chiefs
July-August 2012 | World Future Society Vol. 46, No. 4 |By Geoffrey Colon.

Here are a few more additions to the “C Suite” that we might anticipate as technological and economic trends shape the corporate future.

* Earned Media Officer
* Chief Content Officer
* Open-Source Manager
* Chief Linguist
* Chief Data Scientist
executive_management  CMOs  data_scientists 
july 2013 by jerryking
Sizing Up Big Data, Broadening Beyond the Internet - NYTimes.com
June 19, 2013 | NYT | By STEVE LOHR.

The story is the same in one field after another, in science, politics, crime prevention, public health, sports and industries as varied as energy and advertising. All are being transformed by data-driven discovery and decision-making. The pioneering consumer Internet companies, like Google, Facebook and Amazon, were just the start, experts say. Today, data tools and techniques are used for tasks as varied as predicting neighborhood blocks where crimes are most likely to occur and injecting intelligence into hulking industrial machines, like electrical power generators.

Big Data is the shorthand label for the phenomenon, which embraces technology, decision-making and public policy. Supplying the technology is a fast-growing market, increasing at more than 30 percent a year and likely to reach $24 billion by 2016, according to a forecast by IDC, a research firm. All the major technology companies, and a host of start-ups, are aggressively pursuing the business.

Demand is brisk for people with data skills. The McKinsey Global Institute, the research arm of the consulting firm, projects that the United States needs 140,000 to 190,000 more workers with “deep analytical” expertise and 1.5 million more data-literate managers, whether retrained or hired, by 2020.
massive_data_sets  Steve_Lohr  data_scientists  data_driven  open_data  neighbourhoods  decision_making  public  McKinsey 
june 2013 by jerryking
The new number crunchers
May 31, 2013 | The Financial Times p12| by Emma Jacobs.

One of the world's first data scientists turned his geeky love of maths into a lucrative career

When Jeff Hammerbacher lies in bed at night...
data_scientists  massive_data_sets  Cloudera  pattern_recognition  data  data_driven  from notes
june 2013 by jerryking
Universities Offer Courses in a Hot New Field - Data Science
April 11, 2013 | NYTimes.com | By CLAIRE CAIN MILLER.

Rachel Schutt, a senior research scientist at Johnson Research Labs, taught “Introduction to Data Science” last semester at Columbia (its first course with “data science” in the title). She described the data scientist this way: “a hybrid computer scientist / software engineer / statistician.” And added: “The best tend to be really curious people, thinkers who ask good questions and are O.K. dealing with unstructured situations and trying to find structure in them.”
data_scientists  Colleges_&_Universities  data  Claire_Cain_Miller  massive_data_sets  unstructured  Columbia 
april 2013 by jerryking
Big Data, Big Blunders - WSJ.com
March 8, 2013 | WSJ | By SHIRA OVIDE.
Five mistakes companies make—and how they can avoid them
massive_data_sets  problems  data_scientists  mistakes  howto 
march 2013 by jerryking
Big Data should inspire humility, not hype
Mar. 04 2013| The Globe and Mail |Konrad Yakabuski.

" mathematical models have their limits.

The Great Recession should have made that clear. The forecasters and risk managers who relied on supposedly foolproof algorithms all failed to see the crash coming. The historical economic data they fed into their computers did not go back far enough. Their models were not built to account for rare events. Yet, policy makers bought their rosy forecasts hook, line and sinker.

You might think that Nate Silver, the whiz-kid statistician who correctly predicted the winner of the 2012 U.S. presidential election in all 50 states, would be Big Data’s biggest apologist. Instead, he warns against putting our faith in the predictive power of machines.

“Our predictions may be more prone to failure in the era of Big Data,” The New York Times blogger writes in his recent book, The Signal and the Noise. “As there is an exponential increase in the amount of available information, there is likewise an exponential increase in the number of hypotheses to investigate … [But] most of the data is just noise, as most of the universe is filled with empty space.”

Perhaps the biggest risk we run in the era of Big Data is confusing correlation with causation – or rather, being duped by so-called “data scientists” who tell us one thing leads to another. The old admonition about “lies, damn lies and statistics” is more appropriate than ever."
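The correlation-versus-causation trap Yakabuski warns about is easy to demonstrate. A minimal sketch with hypothetical toy numbers: two seasonal series, neither causing the other, that nonetheless correlate almost perfectly because both are driven by a hidden third factor (summer):

```python
# Correlation is not causation: two unrelated series that both trend upward
# show a high Pearson correlation, though neither drives the other.

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

ice_cream_sales = [10, 14, 19, 25, 32, 40]  # rises over the summer months
drownings = [2, 3, 4, 6, 7, 9]              # also rises over the summer
r = pearson(ice_cream_sales, drownings)
print(r)  # close to 1.0, yet ice cream does not cause drowning
```

A model fed only these two columns would happily "predict" one from the other; only domain knowledge exposes the confounder.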
massive_data_sets  data_driven  McKinsey  skepticism  contrarians  data_scientists  Konrad_Yakabuski  modelling  Nate_Silver  humility  risks  books  correlations  causality  algorithms  infoliteracy  noise  signals  hype 
march 2013 by jerryking
To transform education, Donorschoose hires a data scientist - Fortune Tech
By Jessi Hempel, writer December 21, 2012

Charles Best, a former Bronx history teacher, started Donorschoose to help donors and teachers connect directly online. Now Best plans to use a dozen years' worth of data to advocate for those educators. After all, teachers often turn to Donorschoose to request help getting the tools and supplies they need most. In the grand -- and often political -- struggle to identify what schools need, Best believes Donorschoose can help policy makers who control government spending listen to teachers. ...The idea for harnessing this data began when Donorschoose held a hacking contest in 2011. Among the most intriguing explorations of the data was a project done by Lisa Zhang, a Canadian undergraduate who used Donorschoose data to look at, among other things, the influence of a teacher's gender on the types of projects that were submitted and got funded....
education  data_driven  data_scientists  data  nonprofit  philanthropy  teachers  reform 
january 2013 by jerryking
Big Data Is Great, but Don’t Forget Intuition
December 29, 2012 | NYTimes.com |By STEVE LOHR.

A major part of managing Big Data projects is asking the right questions: How do you define the problem? What data do you need? Where does it come from? What are the assumptions behind the model that the data is fed into? How is the model different from reality?...recognize the limits and shortcomings of the Big Data technology that they are building. Listening to the data is important, they say, but so is experience and intuition. After all, what is intuition at its best but large amounts of data of all kinds filtered through a human brain rather than a math model?
Andrew_McAfee  asking_the_right_questions  bubbles  conferences  critical_thinking  data_scientists  Erik_Brynjolfsson  failure  hedge_funds  human_brains  intuition  massive_data_sets  MIT  models  problems  problem_awareness  problem_definition  problem_framing  questions  skepticism  Steve_Lohr  Wall_Street 
january 2013 by jerryking
Big data, cows and cadastres
Jul 5, 2012 | KMWorld Magazine, July/August 2012 [Vol 21, Issue 7] | by Stephen E. Arnold.

The hero of the story is a bull named Badger-Bluff Fanny Freddie. Dairy cattle sired by him yield more milk. Genetic information processed by sophisticated numerical recipes yields more efficiency. With Badger-Bluff Fanny Freddie, the dairy industry has an opportunity to convert big data into more milk per head. Therefore, the knowledge generated by big data analytics methods translates directly to money.

The article explained: "Dairy breeding is perfect for quantitative analysis. Pedigree records have been assiduously kept; relatively easy artificial insemination has helped centralize genetic information in a small number of key bulls since the 1960s; there are a relatively small and easily measurable number of traits—milk production, fat in the milk, protein in the milk, longevity, udder quality—that breeders want to optimize; each cow works for three or four years, which means that farmers invest thousands of dollars into each animal, so it's worth it to get the best semen money can buy. The economics push breeders to use the genetics."...The IBM approach is to understand the prospect or customer's problem, develop a plan of action and then assemble the solution from the components in IBM's toolbox....The only problem is that the user-friendly system assumes that the marketing manager understands sample size, the strengths and weaknesses of specific statistical methods and the output itself. Eye-catching graphics are not the same as statistically valid data.
The challenges

The problem in those two examples boils down to people. There is a shortage of staff with big data and analytics skills. The problem is not local; it is global. Data and the need to exploit it are rising faster than the talent pool required to use the sophisticated, increasingly user-friendly systems. Kolmogorov worked with a pen and paper. He could have tapped into today's powerful systems because he had the mathematical expertise required to tame big data. Using a mouse is the trivial part of figuring out cow genetics.
dairy  massive_data_sets  data_scientists  IBM  Google  Palantir  Pentaho  Jaspersoft  talent_pools 
december 2012 by jerryking