CakeML
some interesting job openings in Sydney listed here
programming  pls  plt  functional  ocaml-sml  formal-methods  rigor  compilers  types  numerics  accuracy  estimate  research-program  homepage  anglo  jobs  tech  cool 
8 days ago by nhaliday
How good are decisions?
A statement I commonly hear in tech-utopian circles is that some seeming inefficiency can’t actually be inefficient because the market is efficient and inefficiencies will quickly be eliminated. A contentious example of this is the claim that companies can’t be discriminating because the market is too competitive to tolerate discrimination. A less contentious example is that when you see a big company doing something that seems bizarrely inefficient, maybe it’s not inefficient and you just lack the information necessary to understand why the decision was efficient.

Unfortunately, arguments like this are difficult to settle because, even in retrospect, it’s usually not possible to get enough information to determine the precise “value” of a decision. Even in cases where the decision led to an unambiguous success or failure, there are so many factors that led to the result that it’s difficult to figure out precisely why something happened.

One nice thing about sports is that they often have detailed play-by-play data and well-defined win criteria, which lets us tell, on average, what the expected value of a decision is. In this post, we’ll look at the cost of bad decision making in one sport and then briefly discuss why decision quality in sports might be the same as, or better than, decision quality in other fields.

Just to have a concrete example, we’re going to look at baseball, but you could do the same kind of analysis for football, hockey, basketball, etc., and my understanding is that you’d get a roughly similar result in all of those cases.

We’re going to model baseball as a state machine, both because that makes it easy to understand the expected value of particular decisions and because this lets us talk about the value of decisions without having to go over most of the rules of baseball.
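
(Not from the article, but a toy Python sketch of that framing: states are (outs, runners) pairs, each state carries an expected-runs value, and a decision is scored by the expected change in that value. The run-expectancy numbers below are illustrative placeholders, not real league data.)

# Toy run-expectancy table: expected runs scored in the rest of the
# inning from each state. Numbers are made up for illustration.
RE = {
    (0, ()):   0.50,   # no outs, bases empty
    (0, (1,)): 0.90,   # no outs, runner on first
    (0, (2,)): 1.10,   # no outs, runner on second
    (1, ()):   0.27,   # one out, bases empty
}

def steal_second_ev(p_success):
    """Change in expected runs from sending the runner on first
    with no outs, given a success probability."""
    before  = RE[(0, (1,))]
    success = RE[(0, (2,))]   # safe at second
    failure = RE[(1, ())]     # caught stealing: one more out, bases empty
    return p_success * success + (1 - p_success) * failure - before

for p in (0.6, 0.7, 0.8):
    print(f"P(success)={p}: EV change = {steal_second_ev(p):+.3f}")

With these made-up numbers the steal attempt only becomes positive-EV above roughly a 75% success rate; that kind of break-even calculation, run on real play-by-play data, is what the post uses to price decisions.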

exactly the kinda thing Dad likes
techtariat  dan-luu  data  analysis  examples  nitty-gritty  sports  street-fighting  automata-languages  models  optimization  arbitrage  data-science  cost-benefit  tactics 
13 days ago by nhaliday
How to come up with the solutions: techniques - Codeforces
Technique 1: "Total Recall"
Technique 2: "From Specific to General"
Let's say that you've found the solution to the problem (hurray!). Now consider some particular case of the problem: of course, you can apply your algorithm/solution to it. In other words, any solution to the general problem also solves every one of its specific cases. So work in the opposite direction: solve one or several specific cases first, and then try to generalize them into a solution to the main problem (see the sketch after this list).
Technique 3: "Bold Hypothesis"
Technique 4: "To solve a problem, you should think like a problem"
Technique 5: "Think together"
Technique 6: "Pick a Method"
Technique 7: "Print Out and Look"
Technique 8: "Google"
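
(A tiny hypothetical illustration of Technique 2 in Python, using a made-up problem: count the binary strings of length n with no two adjacent 1s. Brute-force the small cases, stare at the outputs, and conjecture the general rule.)

from itertools import product

def brute_force(n):
    """Count binary strings of length n with no two adjacent 1s."""
    return sum("11" not in "".join(s) for s in product("01", repeat=n))

print([brute_force(n) for n in range(1, 9)])
# -> [2, 3, 5, 8, 13, 21, 34, 55]: looks like Fibonacci, so conjecture
#    f(n) = f(n-1) + f(n-2), then prove it and implement the O(n) version.
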
oly  oly-programming  problem-solving  thinking  expert-experience  retention  metabuch  visual-understanding  zooming  local-global  collaboration  tactics  debugging  bare-hands  let-me-see  advice 
14 days ago by nhaliday
Treadmill desk observations - Gwern.net
Notes relating to my use of a treadmill desk and 2 self-experiments showing walking treadmill use interferes with typing and memory performance.

...

While the result seems highly likely to be true for me, I don’t know how well it might generalize to other people. For example, perhaps more fit people can use a treadmill without harm and the negative effect is due to the treadmill usage tiring & distracting me; I try to walk 2 miles a day, but that’s not much compared to some people.

Given this harmful impact, I will avoid doing spaced repetition on my treadmill in the future, and given this & the typing result, will relegate any computer+treadmill usage to non-intellectually-demanding work like watching movies. This turned out not to be a use I cared about, and I hardly ever used my treadmill afterwards, so in October 2016 I sold it for $70. I might investigate standing desks next, for providing some exercise beyond sitting but without the distracting movement of walking on a treadmill.
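
(As a toy sketch, not Gwern's actual data or analysis, this is the generic shape of such a self-experiment comparison: per-day performance scores under each condition and a significance test, here with synthetic numbers and a Welch t-test from scipy.)

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical daily spaced-repetition recall rates (fraction correct).
sitting   = rng.normal(0.90, 0.04, size=30)
treadmill = rng.normal(0.86, 0.04, size=30)

t, p = stats.ttest_ind(sitting, treadmill, equal_var=False)
print(f"mean diff = {sitting.mean() - treadmill.mean():.3f}, p = {p:.4f}")
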
ratty  gwern  data  analysis  quantified-self  health  fitness  get-fit  working-stiff  intervention  cost-benefit  psychology  cog-psych  retention  iq  branches  keyboard  ergo  efficiency  accuracy  null-result  increase-decrease 
14 days ago by nhaliday
Three best practices for building successful data pipelines - O'Reilly Media
Drawing on their experiences and my own, I’ve identified three key areas that are often overlooked in data pipelines, namely making your analysis:
1. Reproducible
2. Consistent
3. Productionizable

...

Science that cannot be reproduced by an external third party is just not science — and this does apply to data science. One of the benefits of working in data science is the ability to apply the existing tools from software engineering. These tools let you isolate all the dependencies of your analyses and make them reproducible.

Dependencies fall into three categories (a pinning sketch follows this list):
1. Analysis code ...
2. Data sources ...
3. Algorithmic randomness ...
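
(A minimal Python sketch of pinning all three categories so a rerun can be verified, assuming a git-versioned project; the paths and manifest layout are invented for the example.)

import hashlib, random, subprocess

def sha256_of(path):
    """Hash an input file so a rerun can verify the data is unchanged."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

manifest = {
    # 1. Analysis code: the exact commit the results came from.
    "code_version": subprocess.check_output(
        ["git", "rev-parse", "HEAD"], text=True).strip(),
    # 2. Data sources: content hash of each input file.
    "data_sha256": sha256_of("data/input.csv"),
    # 3. Algorithmic randomness: a fixed, recorded seed.
    "random_seed": 42,
}
random.seed(manifest["random_seed"])
print(manifest)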

...

Establishing consistency in data
...

There are generally two ways of establishing the consistency of data sources. The first is to check all code and data into a single revision-control repository. The second is to reserve source control for code and build a pipeline that explicitly depends on external data being in a stable, consistent format and location (a sketch of this second approach follows below).

Checking data into version control is generally considered verboten for production software engineers, but it has a place in data analysis. For one thing, it makes your analysis very portable by isolating all dependencies into source control. Here are some conditions under which it makes sense to have both code and data in source control:
Small data sets ...
Regular analytics ...
Fixed source ...
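
(A hedged sketch of the second approach: code lives in source control, and the pipeline fails fast unless the external data matches an agreed location and format. The path and column names are made up.)

import csv, os, sys

DATA_PATH = "/data/events/latest.csv"                 # agreed stable location
EXPECTED_COLUMNS = ["user_id", "event", "timestamp"]  # agreed stable format

def check_data_contract(path):
    """Abort the pipeline early if the external data has moved or drifted."""
    if not os.path.exists(path):
        sys.exit(f"missing input: {path}")
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    if header != EXPECTED_COLUMNS:
        sys.exit(f"schema drift: got {header}, expected {EXPECTED_COLUMNS}")

check_data_contract(DATA_PATH)  # run before any analysis step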

Productionizability: Developing a common ETL
...

1. Common data format ...
2. Isolating library dependencies ...

https://blog.koresoftware.com/blog/etl-principles
Rigorously enforce the idempotency constraint
For efficiency, seek to load data incrementally
Always ensure that you can efficiently process historic data
Partition ingested data at the destination
Rest data between tasks
Pool resources for efficiency
Store all metadata together in one place
Manage login details in one place
Specify configuration details once
Parameterize sub flows and dynamically run tasks where possible
Execute conditionally
Develop your own workflow framework and reuse workflow components
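
(A small sketch combining three of these principles, idempotency, incremental loads, and partitioning at the destination, using Python's standard-library sqlite3 as a stand-in warehouse; the table and column names are invented.)

import sqlite3

def load_partition(conn, day, rows):
    """Idempotent incremental load: replace exactly one day's partition.
    Rerunning the task for the same day yields the same table state."""
    with conn:  # single transaction: delete + insert commit together
        conn.execute("DELETE FROM events WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO events (day, user_id, value) VALUES (?, ?, ?)",
            [(day, u, v) for u, v in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user_id TEXT, value REAL)")
load_partition(conn, "2019-06-01", [("alice", 1.0), ("bob", 2.0)])
load_partition(conn, "2019-06-01", [("alice", 1.0), ("bob", 2.0)])  # rerun: no duplicates
print(conn.execute("SELECT COUNT(*) FROM events").fetchone())  # (2,)

Because the delete and insert commit in one transaction, rerunning a day's task after a failure cannot duplicate rows, which is also what makes backfills over historic data safe.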

more focused on details of specific technologies:
https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7

https://www.cloudera.com/documentation/director/cloud/topics/cloud_de_best_practices.html
techtariat  org:com  best-practices  engineering  code-organizing  machine-learning  data-science  yak-shaving  nitty-gritty  workflow  config  vcs  replication  homo-hetero  multi  org:med  design  system-design  links  shipping  protocol  minimalism  volo-avolo  causation  random  invariance  structure  arrows 
18 days ago by nhaliday
Information Processing: Beijing 2019 Notes
Trump, the trade war, and US-China relations came up frequently in discussion. Chinese opinion tends to focus on the long term. Our driver for a day trip to the Great Wall was an older man from the countryside, who has lived only 3 years in Beijing. I was surprised to hear him expressing a very balanced opinion about the situation. He understood Trump's position remarkably well -- China has done very well trading with the US, and owes much of its technological and scientific development to the West. A recalibration is in order, and it is natural for Trump to negotiate in the interest of US workers.

China's economy is less and less export-dependent, and domestic drivers of growth seem easy to identify. For example, there is still a lot of low-hanging fruit in the form of "catch up growth" -- but now this means not just catching up with the outside developed world, but Tier 2 and Tier 3 cities catching up with Tier 1 cities like Beijing, Shanghai, Shenzhen, etc.

China watchers have noted the rapidly increasing government and private sector debt necessary to drive growth here. Perhaps this portends a future crisis. However, I didn't get any sense of impending doom for the Chinese economy. To be fair there was very little inkling of what would happen to the US economy in 2007-8. Some of the people I met with are highly placed with special knowledge -- they are among the most likely to be aware of problems. Overall I had the impression of normalcy and quiet confidence, but perhaps this would have been different in an export/manufacturing hub like Shenzhen. [ Update: Today after posting this I did hear something about economic concerns... So situation is unclear. ]

Innovation is everywhere here. Perhaps the most obvious is the high level of convenience from the use of e-payment and delivery services. You can pay for everything using your mobile (increasingly, using just your face!), and you can have food and other items (think Amazon on steroids) delivered quickly to your apartment. Even museum admissions can be handled via QR code.

A highly placed technologist told me that in fields like AI or computer science, Chinese researchers and engineers have access to in-depth local discussions of important arXiv papers -- think StackOverflow in Mandarin. Since most researchers here can read English, they have access both to Western advances, and a Chinese language reservoir of knowledge and analysis. He anticipates that eventually the pace and depth of engineering implementation here will be unequaled.

IVF and genetic testing are huge businesses in China. Perhaps I'll comment more on this in the future. New technologies, in genomics as in other areas, tend to be received more positively here than in the US and Europe.

...

Note Added: In the comments AG points to a Quora post by a user called Janus Dongye Qimeng, an AI researcher in Cambridge UK, who seems to be a real China expert. I found these posts to be very interesting.

Infrastructure development in poor regions of China

Size of Chinese internet social network platforms

Can the US derail China 2025? (Core technology stacks in and outside China)

Huawei smartphone technology stack and impact of US entity list interdiction (software and hardware!)

Agriculture at Massive Scale

US-China AI competition


More recommendations: Bruno Maçães is one of my favorite modern geopolitical thinkers. A Straussian of sorts (PhD under Harvey Mansfield at Harvard), he was Secretary of State for European Affairs in Portugal and has thought deeply about the future of Eurasia and of US-China relations. He spent the last year in Beijing, and I was eager to meet with him while here. His recent essay Equilibrium Americanum appeared in the Berlin Policy Journal. Podcast interview -- we hope to have him on Manifold soon :-)
hsu  scitariat  china  asia  thucydides  tech  technology  ai  automation  machine-learning  trends  the-bones  links  reflection  qra  q-n-a  foreign-policy  world  usa  trade  nationalism-globalism  great-powers  economics  research 
25 days ago by nhaliday
Sage: Open Source Mathematics Software: You don't really think that Sage has failed, do you?
> P.S. You don't _really_ think that Sage has failed, do you?

After almost exactly 10 years of working on the Sage project, I absolutely do think it has failed to accomplish the stated goal of the mission statement: "Create a viable free open source alternative to Magma, Maple, Mathematica and Matlab." When it was only a few years into the project, it was really hard to evaluate progress against such a lofty mission statement. However, after 10 years, it's clear to me that not only have we not got there, we are not going to ever get there before I retire. And that's definitely a failure.
mathtariat  reflection  failure  cost-benefit  oss  software  math  CAS  tools  state-of-art  expert-experience  review  comparison  saas  cloud  :/ 
29 days ago by nhaliday