CS 228 notes
These notes form a concise introductory course on probabilistic graphical models. Probabilistic graphical models are a subfield of machine learning that studies how to describe and reason about the world in terms of probabilities. They are based on Stanford CS228, taught by Stefano Ermon, and are written by Volodymyr Kuleshov, with the help of many students and course staff.
10 hours ago
jtleek/datasharing: The Leek group guide to data sharing
To facilitate the most efficient and timely analysis this is the information you should pass to a statistician:

1. The raw data.
2. A tidy data set
3. A code book describing each variable and its values in the tidy data set.
4. An explicit and exact recipe you used to go from 1 -> 2,3
datascience  statistics 
7 days ago
Reproducible Data Analysis in Jupyter | Pythonic Perambulations
Jupyter notebooks provide a useful environment for interactive exploration of data. A common question I get, though, is how you can progress from this nonlinear, interactive, trial-and-error style of exploration to a more linear and reproducible analysis based on organized, packaged, and tested code. This series of videos presents a case study in how I personally approach reproducible data analysis within the Jupyter notebook.
jupyter  python 
8 days ago
The Long, Lucrative Right-wing Grift Is Blowing Up in the World's Face
Rather rapidly, two things happened: First, Republicans realized they’d radicalized their base to a point where nothing they did in power could satisfy their most fervent constituents. Then—in a much more consequential development—a large portion of the Republican Congressional caucus became people who themselves consume garbage conservative media, and nothing else.

That, broadly, explains the dysfunction of the Obama era, post-Tea Party freakout. Congressional Republicans went from people who were able to turn their bullshit-hose on their constituents, in order to rile them up, to people who pointed it directly at themselves, mouths open.
politics  trump  media  usa 
8 days ago
The DADSS Midterm Grading Procedure
score = 1 + ln(p)/ln(4), where p is the probability you assign to the correct answer
statistics  probability  education 
9 days ago
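The DADSS grading rule above is a rescaled logarithmic score, so it is easy to sanity-check numerically. A minimal Python sketch, assuming only the formula as quoted (the function name `dadss_score` is hypothetical): a correct answer given with full confidence (p = 1) earns 1, a uniform guess of p = 1/4 earns 0 (which suggests four answer options, though that is an inference), and the score falls without bound as p approaches 0.

```python
import math


def dadss_score(p: float) -> float:
    """Score for assigning probability p to the correct answer.

    Implements score = 1 + ln(p) / ln(4) exactly as quoted in the bookmark.
    The name dadss_score is hypothetical.
    """
    if not 0.0 < p <= 1.0:
        # ln(0) is undefined: zero probability on the truth would be
        # penalised without bound, so reject it explicitly.
        raise ValueError("p must be in the interval (0, 1]")
    return 1.0 + math.log(p) / math.log(4)


if __name__ == "__main__":
    # Print the score for a few confidence levels.
    for p in (1.0, 0.5, 0.25, 1 / 16):
        print(f"p = {p:<6} score = {dadss_score(p):+.3f}")
```

Because the logarithmic score is a strictly proper scoring rule, and a positive affine rescaling keeps it proper, a student maximises their expected score by reporting their honest probability rather than rounding it up to certainty.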
Equal pay: New York banned employers from asking job candidates about past salaries - The Washington Post
The New York City Council isn't fond of it, either. In a vote Wednesday, it approved legislation that will ban employers from asking job applicants about what they make in their current or past job and could have far-reaching consequences beyond the city as employers try to standardize their practices. It's an idea that's starting to spread: In passing the measure, New York City joins Massachusetts, Puerto Rico and the city of Philadelphia — where the local Chamber of Commerce filed a lawsuit against that measure Thursday — in banning the question from job interviews. More than 20 other city and state legislatures have introduced similar provisions.
professional  sexism 
10 days ago
Why bots aren’t the real AI disruption – Textio Word Nerd
Many of today’s bots are kind of a hipster façade around the same basic command line interfaces consumers abandoned in the 1980s. They require specific syntaxes and understand only a limited vocabulary—but they sure have personality!
While the added convenience of language recognition is a benefit, until bots are capable of performing very complex and novel tasks that richly combine actions and context across the boundaries of apps and sites in unique ways the first time they are asked, we will be limited to trying to remember the 489 commands Siri recognizes. (Yes that link is a man page for Siri. ~sigh~)
bot  ai  machinelearning 
11 days ago
Astronomers explore uses for AI-generated images : Nature News & Comment
"Generative AIs look promising for basic science, too, says Welling, who is helping to develop software for the Square Kilometre Array (SKA), a radio-astronomy observatory to be built in South Africa and Australia. The SKA will produce such vast amounts of data that its images will need to be compressed into low-noise but patchy data. Generative AI models will help to reconstruct and fill in blank parts of those data, producing the images of the sky that astronomers will examine.

A team led by Rachel Mandelbaum, an astrophysicist at Carnegie Mellon University, has been experimenting with both GANs and VAEs to simulate images of galaxies that look deformed because of gravitational lensing — when the gravity of objects in the foreground distorts space-time and warps light rays. Researchers are planning to survey huge numbers of galaxies to map gravitational lensing across the Universe’s history. This could show how the distribution of the Universe’s matter has changed over time, providing clues to the nature of the dark energy that is thought to have driven cosmic expansion. But to do this, astronomers need software that can reliably separate gravitational lensing from other effects. Synthetic images will improve the programs’ accuracy, Mandelbaum says."

gans  neuralnetworks  astro 
11 days ago
The Corporation Does Not Always Have To Win
You are not the corporation. You are the human. It is okay for the corporation to lose a small portion of what it has in terrifying overabundance (money, time, efficiency) in order to preserve what a human has that cannot ever be replaced (dignity, humanity, conscience, life). It is okay for you to prioritize your affinity with your fellow humans over your subservience to the corporation, and to imagine and broker outcomes based on this ordering of things. It is okay for the corporation to lose. It will return to its work of churning the living world into dead sand presently.
business  usa  capitalism 
13 days ago
kdb+ 3.5 released last month | Lambda the Ultimate
While I don't suggest these papers are the blueprint for copying/mimicking the DAAS product, it does help the LtU reader imagine a "different world" of data processing than the often cited Map/Reduce paper and other more mainstream approaches. What is particularly striking is how tiny q.exe (the program that runs kdb+ and provides a CLI for q scripting) is. Language researchers are looking at provably correct C compilers, and it is not a huge leap to think about the world soon seeing provably correct real-time time series databases using kdb+ as an inspiration.

Another curiosity, relevant to us here at LtU, is that kdb+ has its own programming language, q. q is a variant of APL with a special library for statistics. Most "big data" solutions don't have native implementations for weighted average, which is a fairly important and frequently used function in quantitative finance, useful for computing volume weighted average price (VWAP) as well as tilt and weighted spread. q is itself implemented in another language, k. The whole language of each is just a couple lines of (terse) code.
timeseries  kdb 
4 weeks ago
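The kdb+ excerpt singles out the weighted average as the building block for volume weighted average price (VWAP); q exposes it directly as the built-in `wavg` (weights on the left). As a rough illustration in Python rather than q (the function names are hypothetical), VWAP is the sum of price times volume divided by the total volume:

```python
from typing import Sequence


def weighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Return sum(w * v) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def vwap(prices: Sequence[float], volumes: Sequence[float]) -> float:
    """Volume weighted average price: trade prices weighted by traded volume."""
    return weighted_average(prices, volumes)


if __name__ == "__main__":
    # Three hypothetical trades, given as parallel price and volume lists.
    prices = [10.0, 11.0, 12.0]
    volumes = [100, 200, 100]
    print(vwap(prices, volumes))  # (1000 + 2200 + 1200) / 400, prints 11.0
```

Keeping `weighted_average` separate from `vwap` mirrors the excerpt's point: once a weighted average exists as a primitive, VWAP is a one-line special case.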
I’m not organizing Open Data Day DC this year — these three reasons won’t surprise you. – Medium
Open data is a method, not a goal. Open data practitioners have had the luxury till now of not needing to be overly precise that their goals were, typically, civic innovation. We can’t take for granted anymore that our goals are shared or that they are going to be understood by others if we don’t articulate them. We can’t continue to hold “transparency” and “open data” events anymore and expect to be understood as actually being civic innovators when others more powerful than us are using the same terms and techniques to hurt people.
politics  datascience 
6 weeks ago
PPE: the Oxford degree that runs Britain
Oxford PPE, he wrote, “gives no training in scholarship, only refining to a high degree of perfection the ability to write short dilettantish essays on the basis of very little knowledge: ideal training for the social engineer”.
economics  oxford  education  politics  philosophy 
7 weeks ago
Making Git and Jupyter Notebooks play nice - Tim Staley
Summary: jq rocks for speedy JSON mangling. Use it to make powerful git clean filters, e.g. when stripping out unwanted cached-data from Jupyter notebooks. You can find the documentation of git 'clean' and 'smudge' filters buried in the page on git-attributes, or see my example setup below.
jupyter  git 
8 weeks ago
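The Git/Jupyter bookmark describes a git clean filter built with jq. As a swapped-in alternative that uses Python's standard `json` module instead of jq (the script name `nbstrip.py` and the filter name `nbstrip` are hypothetical), a clean filter reads notebook JSON on stdin, clears each code cell's `outputs` and `execution_count`, and writes the result to stdout. Git would be pointed at it with a `.gitattributes` line such as `*.ipynb filter=nbstrip` plus `git config filter.nbstrip.clean "python nbstrip.py -"`. The sketch assumes the nbformat 4 layout and only strips outputs, not other metadata.

```python
import copy
import json
import sys


def strip_notebook(nb: dict) -> dict:
    """Return a copy of a notebook dict with code-cell outputs removed.

    Assumes the nbformat 4 layout: a top-level "cells" list whose code
    cells carry "outputs" and "execution_count" keys.
    """
    cleaned = copy.deepcopy(nb)  # leave the caller's dict untouched
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


def main() -> None:
    """Clean-filter entry point: notebook JSON on stdin, cleaned JSON on stdout."""
    text = sys.stdin.read()
    if not text.strip():
        # Empty file: nothing to clean, pass it through unchanged.
        sys.stdout.write(text)
        return
    cleaned = strip_notebook(json.loads(text))
    # One-space indent and sorted keys to approximate Jupyter's own
    # formatting (an assumption worth checking against your notebooks).
    sys.stdout.write(json.dumps(cleaned, indent=1, sort_keys=True, ensure_ascii=False))
    sys.stdout.write("\n")


if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    # Only read stdin when invoked as `python nbstrip.py -`, the form
    # used in the hypothetical git config described in the text.
    main()
```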

