The Environment and Disease: Association or Causation?  
By Sir Austin Bradford Hill CBE DSC FRCP (hon) FRS
(Professor Emeritus of Medical Statistics, University of London)
statistics  causality 
24 days ago
Cache: working with cross-testrun state — pytest documentation
--lf, --last-failed - to only re-run the failures.
--ff, --failed-first - to run the failures first and then the rest of the tests.
testing  python 
25 days ago
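As a rough illustration of how the two flags differ, here is a minimal sketch of the selection logic. It assumes test IDs are plain strings and the previous run's failures are already known as a set. pytest records failures in its cache between runs; the `select_tests` function below is hypothetical and is not pytest's implementation.

```python
from typing import Iterable, List, Set


def select_tests(all_tests: Iterable[str], last_failed: Set[str], mode: str) -> List[str]:
    """Filter or reorder test IDs in the way --lf / --ff are described (hypothetical helper).

    mode="lf": run only the previously failing tests (this sketch falls back to all tests if none failed).
    mode="ff": run the previously failing tests first, then the rest in their original order.
    """
    tests = list(all_tests)
    failed = [t for t in tests if t in last_failed]
    if mode == "lf":
        return failed if failed else tests
    if mode == "ff":
        rest = [t for t in tests if t not in last_failed]
        return failed + rest
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    suite = ["test_a", "test_b", "test_c", "test_d"]
    failures = {"test_c"}
    print(select_tests(suite, failures, "lf"))  # ['test_c']
    print(select_tests(suite, failures, "ff"))  # ['test_c', 'test_a', 'test_b', 'test_d']
```

With `--ff` nothing is skipped; only the order changes. You still get a full run, but feedback on the known failures comes first.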
Deep Learning for NLP Best Practices
This post is a collection of best practices for using neural networks in Natural Language Processing. It will be updated periodically as new insights become available and in order to keep track of our evolving understanding of Deep Learning for NLP.
deeplearning  nlp 
26 days ago
A better way to solve the housing crisis — tax land, not development - LA Times
Housing scarcity delivers unearned wealth to people who own housing, and it imposes unwarranted burdens on people who don’t. To solve our housing crisis fairly and effectively, we should tax that wealth and use it to ease those burdens. It’s easy to wish that someone would help our disadvantaged fellow citizens. It’s harder to acknowledge our own role in their distress — to admit that our capital gains are their housing crisis. But that is the situation: Linkage fees feed the false belief that some of us in some neighborhoods can keep blocking development and growing our nest eggs, while the city helps the poor by taxing someone else to build affordable housing somewhere else.
politics  housing  Tax  losangeles 
28 days ago
What unread books can teach us | Life and style | The Guardian
There are many similarly incisive bits of Eco-wisdom in a 1977 book recently translated into English. Its unprepossessing title is How To Write A Thesis; even worse, it hasn’t been updated for the era of personal computers or the web. Yet really, beneath the surface, it’s about cultivating curiosity and learning how to learn, whether or not you’re doing a PhD. For example, who hasn’t encountered what Eco calls “the alibi of photocopies” – the way that, just by accumulating material, you start to imagine you’ve internalised it? “There are many things I do not know,” he writes, “because I photocopied a text and then relaxed as if I had read it.”
library  books  academia 
4 weeks ago
No Country For Ye Olde Men • Damn Interesting
Britain’s practice of transporting convicts to American colonies was a fearsome punishment, but not for the chronic criminal James Dalton.
history  usa  crime  uk 
4 weeks ago
What is Differential Privacy? – A Few Thoughts on Cryptographic Engineering
Even more usefully, the calculation of “how much” noise to inject can be made without knowing the contents of the database itself (or even its size). That is, the noise calculation can be performed based only on knowledge of the function to be computed, and the acceptable amount of data leakage.

To give an absolutely crazy example of how big the tradeoffs can be, consider this paper by Fredrikson et al. from 2014. The authors began with a public database linking Warfarin dosage outcomes to specific genetic markers. They then used ML techniques to develop a dosing model based on their database — but applied DP at various privacy budgets while training the model. Then they evaluated both the information leakage and the model’s success at treating simulated “patients”.

The results showed that the model’s accuracy depends a lot on the privacy budget on which it was trained. If the budget is set too high, the database leaks a great deal of sensitive patient information — but the resulting model makes dosing decisions that are about as safe as standard clinical practice. On the other hand, when the budget was reduced to a level that achieved meaningful privacy, the “noise-ridden” model had a tendency to kill its “patients”.
differentialprivacy  security  Privacy  statistics 
4 weeks ago
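The Laplace mechanism is one standard way to get ε-differential privacy, and it shows most directly how noise can be calibrated without looking at the data. The post may discuss other mechanisms. Below is a minimal sketch for a counting query, assuming sensitivity 1: adding or removing one person changes a count by at most 1. The noise scale is b = sensitivity / ε, so it depends only on the query and the privacy budget, never on the database contents.

```python
import random
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records: Iterable[T], predicate: Callable[[T], bool],
                  epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Epsilon-DP count of the records satisfying `predicate` (Laplace mechanism).

    A count has sensitivity 1, so the noise scale is 1 / epsilon. The scale is
    fixed before any record is read: it depends only on the query and the budget.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    scale = 1.0 / epsilon  # sensitivity / epsilon
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    people = list(range(1000))
    for eps in (0.01, 0.1, 1.0, 10.0):
        errors = [abs(private_count(people, lambda p: p % 2 == 0, eps, rng) - 500)
                  for _ in range(200)]
        print(f"epsilon={eps}: mean absolute error {sum(errors) / len(errors):.2f}")
```

A Laplace(0, b) sample has mean absolute value b, so the printed errors should come out near 1/ε. A tight budget (small ε) buys privacy at the cost of noisier answers. The Warfarin study ran into the same tradeoff at much higher stakes.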
What I learned as a hired consultant to autodidact physicists | Aeon Ideas
The majority of my callers are the ones who seek advice for an idea they’ve tried to formalise, unsuccessfully, often for a long time. Many of them are retired or near retirement, typically with a background in engineering or a related industry. All of them are men. Many base their theories on images, downloaded or drawn by hand, embedded in long pamphlets. A few use basic equations. Some add videos or applets. Some work with 3D models of Styrofoam, cardboard or wires. The variety of their ideas is bewildering, but these callers have two things in common: they spend an extraordinary amount of time on their theories, and they are frustrated that nobody is interested.

Sociologists have long tried and failed to draw a line between science and pseudoscience. In physics, though, that ‘demarcation problem’ is a non-problem, solved by the pragmatic observation that we can reliably tell an outsider when we see one. During a decade of education, we physicists learn more than the tools of the trade; we also learn the walk and talk of the community, shared through countless seminars and conferences, meetings, lectures and papers. After exchanging a few sentences, we can tell if you’re one of us. You can’t fake our community slang any more than you can fake a local accent in a foreign country.

A typical problem is that, in the absence of equations, they project literal meanings onto words such as ‘grains’ of space-time or particles ‘popping’ in and out of existence. Science writers should be more careful to point out when we are using metaphors. My clients read way too much into pictures, measuring every angle, scrutinising every colour, counting every dash. Illustrators should be more careful to point out what is relevant information and what is artistic freedom. But the most important lesson I’ve learned is that journalists are so successful at making physics seem not so complicated that many readers come away with the impression that they can easily do it themselves. How can we blame them for not knowing what it takes if we never tell them?
physics  ScienceWriting  sociology  crank 
4 weeks ago
The Highest-Leverage Activities Aren’t Always Deep Work
Ultimately, if you want to maximize your effectiveness, don’t think solely in terms of deep work versus shallow work. Focus instead on the highest-leverage tasks that must be accomplished to get your job done. The most effective engineers own their results — that means having the technical skills to execute on deep work, but also having the meta-skills and the willingness to tackle any shallower tasks necessary to translate that work into impact.
4 weeks ago
What’s your ML test score? A rubric for ML production systems
Using machine learning in real-world production systems is complicated by a host of issues not found in small toy examples or even large offline research experiments. Testing and monitoring are key considerations for assessing the production-readiness of an ML system. But how much testing and monitoring is enough? We present an ML Test Score rubric based on a set of actionable tests to help quantify these issues.
machinelearning  sre  devops 
4 weeks ago
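The rubric's scoring is simple enough to state as code. This is a minimal sketch that assumes the scoring rule from the rubric paper, which should be checked against the paper itself:

- half a point for a test run manually with documented results;
- a full point for a test that runs automatically on a repeated basis;
- points summed within each section;
- the overall score taken as the minimum across sections.

The section names and test statuses below are illustrative only.

```python
from enum import Enum
from typing import Dict, List


class Status(Enum):
    NOT_DONE = 0.0
    MANUAL = 0.5     # run by hand, with results documented and shared
    AUTOMATED = 1.0  # run automatically on a repeated basis


def ml_test_score(sections: Dict[str, List[Status]]) -> float:
    """Sum the points within each section; the overall score is the weakest section's total."""
    if not sections:
        raise ValueError("need at least one section")
    totals = {name: sum(s.value for s in tests) for name, tests in sections.items()}
    return min(totals.values())


if __name__ == "__main__":
    example = {  # illustrative statuses, not taken from the paper
        "features and data": [Status.AUTOMATED, Status.MANUAL, Status.AUTOMATED],
        "model development": [Status.MANUAL, Status.MANUAL],
        "ml infrastructure": [Status.AUTOMATED, Status.NOT_DONE],
        "monitoring": [Status.NOT_DONE, Status.MANUAL],
    }
    print(ml_test_score(example))  # 0.5: monitoring is the weakest section
```

Taking the minimum rather than the sum means a team cannot offset, say, missing monitoring with extra data tests.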
12 Fractured Apps – Kelsey Hightower – Medium
Everything in this post is about improving the deployment process for your applications, specifically those running in a Docker container, but these ideas should apply almost anywhere. On the surface it may seem like a good idea to push application bootstrapping tasks to custom wrapper scripts, but I urge you to reconsider. Deal with application bootstrapping tasks as close to the application as possible and avoid pushing this burden onto your users, which in the future could very well be you.
devops  docker 
4 weeks ago
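In the spirit of the post, here is a minimal Python sketch that handles two common bootstrapping chores inside the application rather than in a wrapper script:

- reading configuration from the environment, with defaults;
- retrying a dependency that may not be up yet.

The environment variable names (`APP_DB_HOST`, `APP_DB_PORT`) and the retry policy are hypothetical. The `connect` callable stands in for a real database driver call.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Config:
    db_host: str
    db_port: int


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build the config from environment variables, with defaults (hypothetical variable names)."""
    return Config(
        db_host=env.get("APP_DB_HOST", "127.0.0.1"),
        db_port=int(env.get("APP_DB_PORT", "5432")),
    )


def connect_with_retry(connect: Callable[[], T], attempts: int = 5, base_delay: float = 0.5,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `connect` until it succeeds, doubling the wait after each OSError."""
    last_error: Optional[OSError] = None
    for attempt in range(attempts):
        try:
            return connect()
        except OSError as err:  # e.g. ConnectionRefusedError while the database is still starting
            last_error = err
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"could not connect after {attempts} attempts") from last_error


if __name__ == "__main__":
    cfg = load_config({"APP_DB_PORT": "6543"})
    print(cfg)  # Config(db_host='127.0.0.1', db_port=6543)

    calls = {"n": 0}

    def flaky_connect() -> str:
        # Hypothetical stand-in for a driver call: fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionRefusedError("database not ready")
        return f"connected to {cfg.db_host}:{cfg.db_port}"

    print(connect_with_retry(flaky_connect, sleep=lambda s: None))
```

Because the retry loop lives in the application, it behaves the same under Docker, a process supervisor or a plain shell. Users never have to supply it themselves.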
Learning to Hash
This homepage lists some representative papers about hashing, especially Learning to Hash, for big data applications.
4 weeks ago
!!Con 2017: Finding Friends in High Dimensions... by Aaron Levin - YouTube
Finding Friends in High Dimensions: Locality-Sensitive Hashing For Fun and Friendliness! by Aaron Levin
4 weeks ago
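Locality-sensitive hashing makes "finding friends in high dimensions" cheap: similar vectors land in the same bucket with high probability, so only bucket-mates need to be compared. This is a minimal pure-Python sketch of one classic family, random-hyperplane (sign) hashing for cosine similarity. The talk may use a different family, and the number of bits and the toy vectors are arbitrary choices. Learning-to-hash methods replace the random hyperplanes with ones fitted to the data.

```python
import random
from collections import defaultdict
from typing import DefaultDict, List, Sequence, Tuple

Vector = Sequence[float]


class HyperplaneLSH:
    """Random-hyperplane LSH: bit k records which side of hyperplane k a vector falls on."""

    def __init__(self, dim: int, n_bits: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Normals drawn from an isotropic Gaussian give uniformly random hyperplane orientations.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: DefaultDict[Tuple[int, ...], List[str]] = defaultdict(list)

    def signature(self, v: Vector) -> Tuple[int, ...]:
        return tuple(int(sum(p_i * v_i for p_i, v_i in zip(plane, v)) >= 0.0)
                     for plane in self.planes)

    def add(self, key: str, v: Vector) -> None:
        self.buckets[self.signature(v)].append(key)

    def candidates(self, v: Vector) -> List[str]:
        """Keys in v's bucket: probably close to v in angle, but not guaranteed."""
        return list(self.buckets.get(self.signature(v), []))


if __name__ == "__main__":
    index = HyperplaneLSH(dim=3, n_bits=8, seed=42)
    alice = [1.0, 0.9, 0.0]
    index.add("alice", alice)
    index.add("bob", [-1.0, 0.2, 0.5])
    query = [2 * x for x in alice]  # same direction as alice, twice the length
    print("alice" in index.candidates(query))  # True: sign hashing ignores vector length
```

Two vectors at angle θ agree on any single bit with probability 1 - θ/π, so more bits make buckets more selective. Practical indexes use several such tables to recover near neighbours that a single table misses.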
X-DataInitiative/tick: Module for statistical learning, with a particular emphasis on time-dependent modelling
tick is a machine learning library for Python 3. The focus is on statistical learning for time-dependent systems, such as point processes. tick also features tools for generalized linear models and a generic optimization toolbox. The core of the library is an optimization module providing model computational classes, solvers and proximal operators for regularization. It also comes with inference and simulation tools intended for end users, who can, for example, easily:

Perform linear, logistic or Poisson regression.
Simulate Hawkes processes with standard or exotic kernels.
Infer Hawkes models with various assumptions on the kernels: exponential or sum of exponential kernels, linear combination of basis kernels, sparse interactions, etc.
timeseries  python 
4 weeks ago
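tick simulates Hawkes processes for you. To show what such a simulation involves, here is a minimal pure-Python sketch of Ogata's thinning algorithm for a univariate Hawkes process with an exponential kernel:

λ(t) = μ + α Σ_{t_i < t} e^{-β(t - t_i)}

This is not tick's API. The (α, β) parameterization is an assumption, and tick's kernel classes may scale the kernel differently.

```python
import math
import random
from typing import List


def simulate_hawkes_exp(mu: float, alpha: float, beta: float,
                        horizon: float, seed: int = 0) -> List[float]:
    """Event times on [0, horizon) of a Hawkes process with intensity
    lambda(t) = mu + alpha * sum_{t_i < t} exp(-beta * (t - t_i)),
    simulated with Ogata's thinning. The process is stationary when alpha / beta < 1.
    """
    if mu <= 0 or alpha < 0 or beta <= 0 or horizon <= 0:
        raise ValueError("need mu > 0, alpha >= 0, beta > 0, horizon > 0")
    rng = random.Random(seed)
    t = 0.0
    excitation = 0.0  # sum of exp(-beta * (t - t_i)) over accepted events t_i <= t
    events: List[float] = []
    while True:
        # Between events the intensity only decays, so its current value is an upper bound.
        lam_bar = mu + alpha * excitation
        wait = rng.expovariate(lam_bar)
        t += wait
        if t >= horizon:
            return events
        excitation *= math.exp(-beta * wait)  # decay the self-excitation to the candidate time
        lam_t = mu + alpha * excitation
        if rng.random() * lam_bar < lam_t:  # accept with probability lambda(t) / lam_bar
            events.append(t)
            excitation += 1.0  # the new event raises the future intensity by alpha


if __name__ == "__main__":
    mu, alpha, beta, horizon = 1.0, 0.5, 1.0, 1000.0
    times = simulate_hawkes_exp(mu, alpha, beta, horizon, seed=1)
    # For alpha / beta < 1 the long-run event rate is mu / (1 - alpha / beta), i.e. 2.0 here.
    print(len(times) / horizon)
```

With α = 0 the sketch reduces to a homogeneous Poisson process of rate μ. Larger α/β (the branching ratio) produces the characteristic bursts of self-excited events.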
Stargazers forks
Find maintained forks of your favorite GitHub repos.
5 weeks ago
Math Has No God Particle | FiveThirtyEight
David Vogan, who’s a mathematician at MIT and was involved with the atlas project, described academic mathematics as a garden. There are showy, flowery fields like number theory. Its beautiful problems and elegant results, such as the prime gap or Fermat’s last theorem, are math’s orchids. There are also the tomatoes — the things you can eat out of the garden, the practical yield. These disciplines, like Fourier analysis with its concrete applications to signal processing of audio, radio and light waves, are businesslike. And then there are the disciplines, often unheralded, that keep the rest of the garden growing — the hoes, the sprinklers. Lie groups, their representations and the atlas project are an example.

“Representation theory,” the field of the atlas group’s research, “is the fertilizer or the rose trellis, depending on the day of the week,” Vogan said.
math  research  journalism  scientism  ScienceWriting 
5 weeks ago
Gaussian processes for time-series modelling
Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences
statistics  timeseries 
5 weeks ago