magnitude   239


Tiny Endian
How to scrape the web and not get caught
This article will be just a quick one: a few-lines-of-code recipe for mitigating IP restrictions and WAFs when crawling the web. If you're reading this, you've probably already tried web scraping. It's all easy breezy until one day someone managing the website you're harvesting data from realizes what's happening and blocks your IP. If you run your scrapers in an automated way, you'll start seeing them fail miserably. You'll probably want to solve this problem fast, before any of your precious data slips through your fingers.
Say hello to proxies
While it might be tempting to use one of the paid providers of such services, it isn't that hard to craft a home-baked solution that costs no money. This is thanks to an awesome project, scrapy-rotating-proxies.
Just add it to your project as described in the documentation:
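# settings.py
# ...
DOWNLOADER_MIDDLEWARES = {
    # ...
    'rotating_proxies.middlewares.RotatingProxyMiddleware': 610,
    'rotating_proxies.middlewares.BanDetectionMiddleware': 620,
    # ...
}
# Either list the proxies inline...
ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:8031',
    # ...
]
# ...or load them from a file, one proxy per line (used instead of the list when set).
ROTATING_PROXY_LIST_PATH = 'proxies.txt'
# ...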
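To make the idea concrete, here is a toy version of what a rotating-proxy pool does: hand out a random proxy that isn't currently banned, and bench a banned proxy for an exponentially growing back-off period. This is a simplified sketch, not the actual scrapy-rotating-proxies implementation; the `ProxyPool` class, its method names and the 300-second base delay are assumptions made for illustration.

```python
import random
import time


class ProxyPool:
    """Toy rotating-proxy pool (illustration only, not the library's code)."""

    def __init__(self, proxies, backoff_base=300.0, backoff_cap=3600.0,
                 clock=time.monotonic):
        self.proxies = list(dict.fromkeys(proxies))  # drop duplicates, keep order
        self.backoff_base = backoff_base  # seconds benched after the first ban
        self.backoff_cap = backoff_cap    # never bench a proxy longer than this
        self.clock = clock
        self.failures = {}    # proxy -> consecutive bans
        self.dead_until = {}  # proxy -> clock time when it may be retried

    def alive(self):
        """Proxies that are not currently benched."""
        now = self.clock()
        return [p for p in self.proxies
                if p not in self.dead_until or self.dead_until[p] <= now]

    def get(self):
        """A random usable proxy, or None if every proxy is benched."""
        candidates = self.alive()
        return random.choice(candidates) if candidates else None

    def mark_banned(self, proxy):
        """Bench a proxy; each consecutive ban doubles the wait, up to the cap."""
        bans = self.failures.get(proxy, 0) + 1
        self.failures[proxy] = bans
        delay = min(self.backoff_cap, self.backoff_base * 2 ** (bans - 1))
        self.dead_until[proxy] = self.clock() + delay

    def mark_good(self, proxy):
        """A successful response resets the proxy's ban history."""
        self.failures.pop(proxy, None)
        self.dead_until.pop(proxy, None)


pool = ProxyPool(["proxy1.com:8000", "proxy2.com:8031"])
proxy = pool.get()
```

In a real Scrapy project the middleware does this bookkeeping for you; the injectable `clock` argument only exists so the back-off can be tested without waiting.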
So, where do you get this proxies.txt list from? It's easier than you think. I wasn't able to find a Python project that provides a list of free proxies out of the box, but there is a Node module, proxy-lists, made exactly for that!
Installation is extremely simple, and so is usage:
proxy-lists getProxies --sources-white-list="gatherproxy,sockslist"
This will save a bulky list of proxies in your proxies.txt file.
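Free proxy lists tend to be noisy. If you want to tidy the file before Scrapy reads it, a small Python helper can drop blank or malformed lines and duplicates. This is a hypothetical helper, not part of either tool, and it assumes the file holds one `host:port` entry per line; adjust the pattern if your proxy-lists output format differs.

```python
import re
from pathlib import Path

# Assumed entry format: hostname or IPv4 address, a colon, then a port number.
PROXY_RE = re.compile(r"^[A-Za-z0-9.-]+:(\d{1,5})$")


def clean_proxy_lines(lines):
    """Return unique, well-formed host:port entries in their original order."""
    seen = set()
    cleaned = []
    for raw in lines:
        line = raw.strip()
        match = PROXY_RE.match(line)
        if not match or not 0 < int(match.group(1)) <= 65535:
            continue  # skip blanks, comments and malformed entries
        if line not in seen:
            seen.add(line)
            cleaned.append(line)
    return cleaned


def clean_proxy_file(path="proxies.txt"):
    """Rewrite the proxy file in place; return how many entries were kept."""
    file = Path(path)
    cleaned = clean_proxy_lines(file.read_text(encoding="utf-8").splitlines())
    file.write_text("".join(entry + "\n" for entry in cleaned), encoding="utf-8")
    return len(cleaned)
```

Calling `clean_proxy_file()` between the proxy-lists step and `scrapy crawl` would be one way to wire it in.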
Say hello to Makefiles
Now you're essentially running a mixed-language project (Python for Scrapy and JS for proxy-lists), so you need a way to tie these two tools together. What could be better than the lingua franca of builds and orchestration: the Makefile?
Just create a target:
.PHONY: all
all:
	yarn run proxy-lists getProxies --sources-white-list="$$PROXIES_SOURCE_LIST"
	scrapy crawl mycrawler -o myoutput.csv
	rm -f proxies.txt
And after you're done with that, your build step in Jenkins becomes just:
make all
Things to consider
Of course, there's an overhead to pay for using this: after introducing proxies, my crawl times grew by an order of magnitude, from minutes to hours! But hey, it works and it's free, so if you're not willing to pay for data in cash, you need to pay for it with time. Luckily for you, with this sweet hack it's the build server's time, not yours.
april 2018 by hendry
Theory of Self-Reproducing Automata - John von Neumann
Fourth Lecture: THE ROLE OF HIGH AND OF EXTREMELY HIGH COMPLICATION

Comparisons between computing machines and the nervous systems. Estimates of size for computing machines, present and near future.

Estimates for size for the human central nervous system. Excursus about the “mixed” character of living organisms. Analog and digital elements. Observations about the “mixed” character of all componentry, artificial as well as natural. Interpretation of the position to be taken with respect to these.

Evaluation of the discrepancy in size between artificial and natural automata. Interpretation of this discrepancy in terms of physical factors. Nature of the materials used.

The probability of the presence of other intellectual factors. The role of complication and the theoretical penetration that it requires.

Questions of reliability and errors reconsidered. Probability of individual errors and length of procedure. Typical lengths of procedure for computing machines and for living organisms--that is, for artificial and for natural automata. Upper limits on acceptable probability of error in individual operations. Compensation by checking and self-correcting features.

Differences of principle in the way in which errors are dealt with in artificial and in natural automata. The “single error” principle in artificial automata. Crudeness of our approach in this case, due to the lack of adequate theory. More sophisticated treatment of this problem in natural automata: The role of the autonomy of parts. Connections between this autonomy and evolution.

- 10^10 neurons in the brain, 10^4 vacuum tubes in the largest computer of the time
- machines are faster: 5 ms from neuron potential to neuron potential, 10^-3 ms for vacuum tubes
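The two bullet points combine into a crude Fermi-style comparison. The sketch below only redoes that arithmetic with the figures quoted: the brain has about 10^6 times as many elements, a tube switches about 5×10^3 times faster, so a naive elements-times-speed product still favours the brain by a factor of roughly 200. Treating "element-steps per second" as a measure of capacity is an assumption of this sketch, not a claim from the lecture.

```python
import math

# Figures as quoted in the notes above.
NEURONS = 10**10          # neurons in the human brain
VACUUM_TUBES = 10**4      # tubes in the largest computer of the time
NEURON_STEP_MS = 5.0      # neuron potential to neuron potential
TUBE_STEP_MS = 1e-3       # one vacuum-tube step


def element_steps_per_second(elements, step_ms):
    """Crude throughput: number of elements times switching steps per second."""
    return elements * (1000.0 / step_ms)


size_ratio = NEURONS / VACUUM_TUBES               # brain elements per tube
speed_ratio = NEURON_STEP_MS / TUBE_STEP_MS       # how much faster a tube switches
throughput_ratio = (element_steps_per_second(NEURONS, NEURON_STEP_MS)
                    / element_steps_per_second(VACUUM_TUBES, TUBE_STEP_MS))

print(f"size ratio:       10^{math.log10(size_ratio):.1f}")
print(f"speed ratio:      {speed_ratio:,.0f}")
print(f"throughput ratio: {throughput_ratio:,.0f}")
```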

https://en.wikipedia.org/wiki/John_von_Neumann#Computing
pdf  article  papers  essay  nibble  math  cs  computation  bio  neuro  neuro-nitgrit  scale  magnitude  comparison  acm  von-neumann  giants  thermo  phys-energy  speed  performance  time  density  frequency  hardware  ems  efficiency  dirty-hands  street-fighting  fermi  estimate  retention  physics  interdisciplinary  multi  wiki  links  people  🔬  atoms  automata  duplication  iteration-recursion  turing  complexity  measure  nature  technology  complex-systems  bits  information-theory  circuits  robust  structure  composition-decomposition  evolution  mutation  axioms  analogy  thinking  input-output  hi-order-bits  coding-theory  flexibility  rigidity 
april 2018 by nhaliday
Complexity no Bar to AI - Gwern.net
Critics of AI risk suggest diminishing returns to computing (formalized asymptotically) means AI will be weak; this argument relies on a large number of questionable premises and ignoring additional resources, constant factors, and nonlinear returns to small intelligence advantages, and is highly unlikely. (computer science, transhumanism, AI, R)
created: 1 June 2014; modified: 01 Feb 2018; status: finished; confidence: likely; importance: 10
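The "constant factors" objection can be made concrete with toy numbers (all hypothetical): an O(n log n) algorithm with a constant of 1000 is asymptotically better than an O(n²) one with a constant of 1, yet it only wins once n exceeds roughly 1.4×10^4. The sketch finds that crossover by doubling and then bisection, assuming the two cost curves cross exactly once.

```python
import math


def cost_nlogn(n, c=1000.0):
    """Hypothetical O(n log n) algorithm with a large constant factor."""
    return c * n * math.log2(n)


def cost_quadratic(n, c=1.0):
    """Hypothetical O(n^2) algorithm with a small constant factor."""
    return c * n * n


def crossover(asym_better, asym_worse, start=2):
    """Smallest n >= start where asym_better(n) < asym_worse(n).

    Assumes a single crossing: asym_better costs more below it, less above it.
    """
    hi = start
    while asym_better(hi) >= asym_worse(hi):  # doubling phase
        hi *= 2
    lo = max(start - 1, hi // 2)  # last known n where asym_better is not cheaper
    while hi - lo > 1:            # bisection phase
        mid = (lo + hi) // 2
        if asym_better(mid) < asym_worse(mid):
            hi = mid
        else:
            lo = mid
    return hi


print(crossover(cost_nlogn, cost_quadratic))
```

Below the crossover the "worse" algorithm is the practical choice, which is the gap between asymptotic statements and real performance that the essay leans on.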
ratty  gwern  analysis  faq  ai  risk  speedometer  intelligence  futurism  cs  computation  complexity  tcs  linear-algebra  nonlinearity  convexity-curvature  average-case  adversarial  article  time-complexity  singularity  iteration-recursion  magnitude  multiplicative  lower-bounds  no-go  performance  hardware  humanity  psychology  cog-psych  psychometrics  iq  distribution  moments  complement-substitute  hanson  ems  enhancement  parable  detail-architecture  universalism-particularism  neuro  ai-control  environment  climate-change  threat-modeling  security  theory-practice  hacker  academia  realness  crypto  rigorous-crypto  usa  government 
april 2018 by nhaliday
Chickenhawks – Gene Expression
I know I seem like a warblogger, and I promise I’ll shift to something more esoteric and non-current-eventsy very soon, but check this table out on fatalities by profession. It ranges from 50 per 100,000 for cab-drivers to 100 per 100,000 for fishermen & loggers. Granted, there have surely been work-related fatalities in the American military in the past year, but we’ve had about 30 fatalities so far, and perhaps we’ll go up to 200-300 in the current campaign if we don’t get into house-to-house fighting. How many fatalities occurred during the Afghan campaign? Look at this table of historic casualty rates. I don’t do this to say that being a soldier is something that isn’t a big deal, but for me, the “chickenhawk” insult seems less resonant taking into account the changes that have been wrought by technology in the post-Vietnam era. Casualty rates seem to be approaching the order of magnitude of some of the more dangerous civil professions. That is most certainly a good thing.
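Comparing campaign deaths with the occupational figures requires putting both on the same footing: deaths per 100,000 people exposed, per year. A minimal sketch with illustrative function names; the post gives the occupational rates but not troop numbers or campaign length, so those inputs would have to come from elsewhere.

```python
import math


def rate_per_100k(deaths, exposed, years=1.0):
    """Annualised deaths per 100,000 people exposed."""
    return deaths / (exposed * years) * 100_000


def orders_apart(rate_a, rate_b):
    """How many orders of magnitude rate_a is above rate_b (log10 of the ratio)."""
    return math.log10(rate_a / rate_b)


# Occupational rates quoted in the post, per 100,000 per year.
CAB_DRIVERS = 50
FISHERS_AND_LOGGERS = 100

print(f"{orders_apart(FISHERS_AND_LOGGERS, CAB_DRIVERS):.2f}")  # spread of the quoted range
```

The quoted occupational range spans only about a third of an order of magnitude, so "approaching the order of magnitude" of those jobs means a rate within roughly a factor of ten of them.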
gnxp  scitariat  commentary  war  meta:war  usa  iraq-syria  MENA  military  death  pro-rata  data  comparison  fighting  outcome-risk  uncertainty  martial  time-series  history  early-modern  mostly-modern  pre-ww2  world-war  europe  gallic  revolution  the-south  germanic  israel  scale  magnitude  cold-war 
february 2018 by nhaliday
Why do stars twinkle?
According to many astronomers and educators, twinkle (stellar scintillation) is caused by atmospheric structure that works like ordinary lenses and prisms. Pockets of variable temperature - and hence index of refraction - randomly shift and focus starlight, perceived by eye as changes in brightness. Pockets also disperse colors like prisms, explaining the flashes of color often seen in bright stars. Stars appear to twinkle more than planets because they are points of light, whereas the twinkling points on planetary disks are averaged to a uniform appearance. Below, figure 1 is a simulation in glass of the kind of turbulence structure posited in the lens-and-prism theory of stellar scintillation, shown over the Penrose tile floor to demonstrate the random lensing effects.

However appealing and ubiquitous on the internet, this popular explanation is wrong, and my aim is to debunk the myth. This research is mostly about showing that the lens-and-prism theory just doesn't work, but I also have a stellar list of references that explain the actual cause of scintillation, starting with two classic papers by C.G. Little and S. Chandrasekhar.
nibble  org:junk  space  sky  visuo  illusion  explanans  physics  electromag  trivia  cocktail  critique  contrarianism  explanation  waves  simulation  experiment  hmm  magnitude  atmosphere  roots  idk 
december 2017 by nhaliday
light - Why doesn't the moon twinkle? - Astronomy Stack Exchange
As you mention, when light enters our atmosphere, it goes through several parcels of gas with varying density, temperature, pressure, and humidity. These differences make the refractive index of the parcels different, and since they move around (the scientific term for air moving around is "wind"), the light rays take slightly different paths through the atmosphere.

Stars are point sources
…the Moon is not
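A toy Monte Carlo shows why the point-source argument works, whatever the exact mechanism: if each independent patch of atmosphere scales the light it carries by a random factor, a star's light comes through essentially one patch, while a planet's or the Moon's disk averages many, so the fluctuations shrink roughly as 1/√N. The Gaussian factors, σ = 0.3 and N = 100 are arbitrary assumptions, not measured values.

```python
import random
import statistics


def brightness_flicker(n_patches, sigma=0.3, trials=2000, seed=1):
    """Std. dev. of total brightness (normalised to mean 1) for a source whose
    light arrives through n_patches independently fluctuating patches."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        # each patch carries 1/n of the light, scaled by its own random factor
        total = sum(rng.gauss(1.0, sigma) for _ in range(n_patches)) / n_patches
        totals.append(total)
    return statistics.pstdev(totals)


star = brightness_flicker(1)      # point source: one patch
planet = brightness_flicker(100)  # extended disk: many independent patches
print(f"star flicker {star:.3f}, planet flicker {planet:.3f}")
```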
nibble  q-n-a  overflow  space  physics  trivia  cocktail  navigation  sky  visuo  illusion  measure  random  electromag  signal-noise  flux-stasis  explanation  explanans  magnitude  atmosphere  roots 
december 2017 by nhaliday
Estimation of effect size distribution from genome-wide association studies and implications for future discoveries
We report a set of tools to estimate the number of susceptibility loci and the distribution of their effect sizes for a trait on the basis of discoveries from existing genome-wide association studies (GWASs). We propose statistical power calculations for future GWASs using estimated distributions of effect sizes. Using reported GWAS findings for height, Crohn’s disease and breast, prostate and colorectal (BPC) cancers, we determine that each of these traits is likely to harbor additional loci within the spectrum of low-penetrance common variants. These loci, which can be identified from sufficiently powerful GWASs, together could explain at least 15–20% of the known heritability of these traits. However, for BPC cancers, which have modest familial aggregation, our analysis suggests that risk models based on common variants alone will have modest discriminatory power (63.5% area under curve), even with new discoveries.
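For intuition about the power calculations mentioned, here is a textbook approximation for a single SNP (not the authors' method): a 1-df association test on N people for a variant explaining a fraction q² of trait variance has non-centrality of about Nq²/(1 − q²), and power follows from the normal distribution at the conventional genome-wide threshold α = 5×10^-8. Summing per-locus power over a set of effect sizes gives an expected number of discoveries; the effect sizes you pass in are your own assumption.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def gwas_power(n, q2, alpha=5e-8):
    """Approximate power to detect one SNP explaining fraction q2 (0 <= q2 < 1)
    of trait variance in a sample of n, with a two-sided 1-df test at alpha."""
    ncp = n * q2 / (1.0 - q2)                # chi-square non-centrality
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2)  # about 5.45 for alpha = 5e-8
    shift = ncp ** 0.5
    # chi-square(1, ncp) is Z^2 with Z ~ Normal(sqrt(ncp), 1)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def expected_discoveries(n, effects, alpha=5e-8):
    """Expected number of significant hits, given per-locus variance explained."""
    return sum(gwas_power(n, q2, alpha) for q2 in effects)
```

For example, `gwas_power(100_000, 0.0005)` gives the chance of detecting a locus explaining 0.05% of variance in 100,000 people.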

later paper:
Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants: http://www.pnas.org/content/108/44/18026.full

Recent discoveries of hundreds of common susceptibility SNPs from genome-wide association studies provide a unique opportunity to examine population genetic models for complex traits. In this report, we investigate distributions of various population genetic parameters and their interrelationships using estimates of allele frequencies and effect-size parameters for about 400 susceptibility SNPs across a spectrum of qualitative and quantitative traits. We calibrate our analysis by statistical power for detection of SNPs to account for overrepresentation of variants with larger effect sizes in currently known SNPs that are expected due to statistical power for discovery. Across all qualitative disease traits, minor alleles conferred “risk” more often than “protection.” Across all traits, an inverse relationship existed between “regression effects” and allele frequencies. Both of these trends were remarkably strong for type I diabetes, a trait that is most likely to be influenced by selection, but were modest for other traits such as human height or late-onset diseases such as type II diabetes and cancers. Across all traits, the estimated effect-size distribution suggested the existence of increasingly large numbers of susceptibility SNPs with decreasingly small effects. For most traits, the set of SNPs with intermediate minor allele frequencies (5–20%) contained an unusually small number of susceptibility loci and explained a relatively small fraction of heritability compared with what would be expected from the distribution of SNPs in the general population. These trends could have several implications for future studies of common and uncommon variants.

...

Relationship Between Allele Frequency and Effect Size. We explored the relationship between allele frequency and effect size in different scales. An inverse relationship between the squared regression coefficient and f(1 − f) was observed consistently across different traits (Fig. 3). For a number of these traits, however, the strengths of these relationships become less pronounced after adjustment for ascertainment due to study power. The strength of the trend, as captured by the slope of the fitted line (Table 2), markedly varies between traits, with an almost 10-fold change between the two extremes of distinct types of traits. After adjustment, the most pronounced trend was seen for type I diabetes and Crohn’s disease among qualitative traits and LDL level among quantitative traits. In exploring the relationship between the frequency of the risk allele and the magnitude of the associated risk coefficient (Fig. S4), we observed a quadratic pattern that indicates increasing risk coefficients as the risk-allele frequency diverges away from 0.50 either toward 0 or toward 1. Thus, it appears that regression coefficients for common susceptibility SNPs increase in magnitude monotonically with decreasing minor-allele frequency, irrespective of whether the minor allele confers risk or protection. However, for some traits, such as type I diabetes, risk alleles were predominantly minor alleles, that is, they had frequencies of less than 0.50.
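The f(1 − f) in this passage comes from a standard identity (not something derived in the excerpt): under an additive model with Hardy–Weinberg proportions, a SNP with allele frequency f and per-allele effect β explains 2f(1 − f)β² of trait variance. So if discovery needs roughly a fixed amount of variance explained, the detectable β² scales like 1/f(1 − f) and grows as f moves away from 0.5 in either direction, which is one way ascertainment by study power can produce such a pattern. The 0.1% floor in the loop is an arbitrary illustration.

```python
import math


def variance_explained(f, beta):
    """Additive model, Hardy-Weinberg proportions: variance = 2 f (1 - f) beta^2."""
    return 2.0 * f * (1.0 - f) * beta ** 2


def beta_at_fixed_variance(f, q2):
    """Per-allele effect a SNP of frequency f needs to explain variance q2."""
    return math.sqrt(q2 / (2.0 * f * (1.0 - f)))


# Hypothetical detection floor: suppose only SNPs explaining >= 0.1% are found.
for f in (0.5, 0.2, 0.05, 0.01):
    print(f"f = {f:<4}  minimum detectable |beta| = {beta_at_fixed_variance(f, 0.001):.3f}")
```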
pdf  nibble  study  article  org:nat  🌞  biodet  genetics  population-genetics  GWAS  QTL  distribution  disease  cancer  stat-power  bioinformatics  magnitude  embodied  prediction  scale  scaling-up  variance-components  multi  missing-heritability  effect-size  regression  correlation  data 
november 2017 by nhaliday
Autoignition temperature - Wikipedia
The autoignition temperature or kindling point of a substance is the lowest temperature at which it spontaneously ignites in normal atmosphere without an external source of ignition, such as a flame or spark. This temperature is required to supply the activation energy needed for combustion. The temperature at which a chemical ignites decreases as the pressure or oxygen concentration increases. It is usually applied to a combustible fuel mixture.

The time $t_{\text{ig}}$ it takes for a material to reach its autoignition temperature $T_{\text{ig}}$ when exposed to a heat flux $q''$ is given by the following equation:
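For a thermally thick solid under a constant heat flux, this relation is usually written as

$$ t_{\text{ig}} = \frac{\pi}{4}\, k \rho c \left( \frac{T_{\text{ig}} - T_0}{q''} \right)^2 $$

where $k$ is the thermal conductivity, $\rho$ the density, $c$ the specific heat capacity and $T_0$ the initial temperature of the material. A direct Python transcription follows; the function and argument names are illustrative, SI units are assumed, and the formula is an idealisation (semi-infinite solid, constant net heat flux).

```python
import math


def ignition_time(k, rho, c, t_ig, t_0, q_flux):
    """Time (s) for a thermally thick solid to reach its ignition temperature.

    k      : thermal conductivity, W/(m*K)
    rho    : density, kg/m^3
    c      : specific heat capacity, J/(kg*K)
    t_ig   : ignition temperature (K or deg C; only the difference matters)
    t_0    : initial temperature, same unit as t_ig
    q_flux : incident heat flux q'', W/m^2
    """
    if q_flux <= 0:
        raise ValueError("heat flux must be positive")
    return (math.pi / 4) * k * rho * c * ((t_ig - t_0) / q_flux) ** 2
```

Because $t_{\text{ig}}$ scales with $1/q''^2$, doubling the heat flux cuts the time to ignition by a factor of four.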
nibble  wiki  reference  concept  metrics  identity  physics  thermo  temperature  time  stock-flow  phys-energy  chemistry  article  street-fighting  fire  magnitude  data  list 
november 2017 by nhaliday
Static electricity - Wikipedia
Electrons can be exchanged between materials on contact; materials with weakly bound electrons tend to lose them while materials with sparsely filled outer shells tend to gain them. This is known as the triboelectric effect and results in one material becoming positively charged and the other negatively charged. The polarity and strength of the charge on a material once they are separated depends on their relative positions in the triboelectric series. The triboelectric effect is the main cause of static electricity as observed in everyday life, and in common high-school science demonstrations involving rubbing different materials together (e.g., fur against an acrylic rod). Contact-induced charge separation causes your hair to stand up and causes "static cling" (for example, a balloon rubbed against the hair becomes negatively charged; when near a wall, the charged balloon is attracted to positively charged particles in the wall, and can "cling" to it, appearing to be suspended against gravity).
nibble  wiki  reference  article  physics  electromag  embodied  curiosity  IEEE  dirty-hands  phys-energy  safety  data  magnitude  scale 
november 2017 by nhaliday


