The Exaggerated Promise of So-Called Unbiased Data Mining | WIRED
"Something extremely unlikely is not unlikely at all if it has already happened."
"The Feynman trap—ransacking data for patterns without any preconceived idea of what one is looking for—is the Achilles heel of studies based on data mining. Finding something unusual or surprising after it has already occurred is neither unusual nor surprising. Patterns are sure to be found, and are likely to be misleading, absurd, or worse."
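
The point that "patterns are sure to be found" can be illustrated with a small simulation. This is a minimal sketch rather than anything from the article: the sample size, feature count, and random seed are arbitrary assumptions, and both the "predictors" and the "outcome" are pure noise by construction.

```python
import numpy as np

# Assumed, arbitrary sizes: 50 observations, 1000 candidate "predictors".
rng = np.random.default_rng(42)
n_samples, n_features = 50, 1000

X = rng.standard_normal((n_samples, n_features))  # noise predictors
y = rng.standard_normal(n_samples)                # noise outcome, unrelated to X

# Pearson correlation of every column of X with y.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = (y - y.mean()) / y.std()
r = Xc.T @ yc / n_samples

# Pick the "best" predictor only after looking at all of them.
best = int(np.argmax(np.abs(r)))
print(f"strongest of {n_features} noise features: r = {r[best]:+.2f}")
print(f"typical |r| across all features: {np.abs(r).mean():.2f}")
```

The strongest correlation looks far more impressive than the typical one, even though no feature has any real relationship to the outcome. Selecting it after the search is what makes it misleading.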

Using pandas with large data
Learn how to use simple techniques to reduce memory usage by almost 90% and work with bigger data using pandas.
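
As a rough sketch of the kind of technique the linked post describes, the following downcasts numeric columns and converts repetitive text columns to the `category` dtype. The helper name `shrink`, the `max_unique_ratio` threshold, and the sample data are assumptions made for illustration and are not taken from the post; actual savings depend entirely on the data.

```python
import numpy as np
import pandas as pd


def shrink(df: pd.DataFrame, max_unique_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy of df with smaller dtypes (hypothetical helper)."""
    out = df.copy()
    for col in out.columns:
        s = out[col]
        if pd.api.types.is_integer_dtype(s):
            # Smallest integer type that can hold the values.
            out[col] = pd.to_numeric(s, downcast="integer")
        elif pd.api.types.is_float_dtype(s):
            # May go down to float32, which loses precision.
            out[col] = pd.to_numeric(s, downcast="float")
        elif pd.api.types.is_object_dtype(s) or pd.api.types.is_string_dtype(s):
            # Category only pays off when values repeat often.
            if s.nunique() / max(len(s), 1) < max_unique_ratio:
                out[col] = s.astype("category")
    return out


# Made-up sample data for the demonstration.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    "count": rng.integers(0, 100, size=n),
    "price": rng.random(n),
    "city": rng.choice(["Oslo", "Lima", "Pune"], size=n),
})

small = shrink(df)
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"before: {before:,} bytes, after: {after:,} bytes")
```

Passing `deep=True` to `memory_usage` matters here, because otherwise the memory held by Python string objects is not counted. Downcasting floats trades precision for space, so it is worth checking that the analysis tolerates it.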
Brain-wide Organization of Neuronal Activity and Convergent Sensorimotor Transformations in Larval Zebrafish - ScienceDirect
Simultaneous recordings of large populations of neurons in behaving animals allow detailed observation of high-dimensional, complex brain activity. However, experimental approaches often focus on singular behavioral paradigms or brain areas. Here, we recorded whole-brain neuronal activity of larval zebrafish presented with a battery of visual stimuli while recording fictive motor output. We identified neurons tuned to each stimulus type and motor output and discovered groups of neurons in the anterior hindbrain that respond to different stimuli eliciting similar behavioral responses. These convergent sensorimotor representations were only weakly correlated to instantaneous motor activity, suggesting that they critically inform, but do not directly generate, behavioral choices. To catalog brain-wide activity beyond explicit sensorimotor processing, we developed an unsupervised clustering technique that organizes neurons into functional groups. These analyses enabled a broad overview of the functional organization of the brain and revealed numerous brain nuclei whose neurons exhibit concerted activity patterns.
Algorithmic Government: Automating Public Services and Supporting Civil Servants in using Data Science Technologies
The data science technologies of artificial intelligence (AI), Internet of Things (IoT), big data and behavioral/predictive analytics, and blockchain are poised to revolutionize government and create a new generation of GovTech start-ups. The impact from the ‘smartification’ of public services and the national infrastructure will be much more significant in comparison to any other sector given government’s function and importance to every institution and individual. Potential GovTech systems include Chatbots and intelligent assistants for public engagement, Robo-advisors to support civil servants, real-time management of the national infrastructure using IoT and blockchain, automated compliance/regulation, public records securely stored in blockchain distributed ledgers, online judicial and dispute resolution systems, and laws/statutes encoded as blockchain smart contracts. Government is potentially the major ‘client’ and also ‘public champion’ for these new data technologies. This review paper uses our simple taxonomy of government services to provide an overview of data science automation being deployed by governments world-wide. The goal of this review paper is to encourage the Computer Science community to engage with government to develop these new systems to transform public services and support the work of civil servants.
Higher patient satisfaction with antidepressants correlates with earlier drug release dates across online user‐generated medical databases
The advent of large online databases in which patients themselves rate drugs allows for a new Big Data–driven approach to compare the efficacy and patient satisfaction with sample sizes exceeding previous studies. Exemplifying this approach with antidepressants, we show that patient satisfaction with a drug anticorrelates with its release date with high significance, across different online user‐driven databases. This finding suggests that a systematic reevaluation of current, often patent‐protected drugs compared to their older predecessors may be helpful, especially given that the efficacy of newer agents relative to older classes of antidepressants such as monoamine oxidase inhibitors (MAOIs) and tricyclic antidepressants (TCAs) is as yet quantitatively unexplored.
The Kinds of Data Scientist
-- They forgot to include the data scientists who call out snake oil salesmanship and the ones who worry about ethics.
