magnitude   239
Tiny Endian
How to scrape the web and not get caught
This article will be just a quick one. It's a recipe of a few lines of code on how to mitigate IP restrictions and WAFs when crawling the web. If you're reading this you have probably already tried web scraping. It's all easy breezy until one day someone managing the website you're harvesting data from realizes what is happening and blocks your IP. If you're running your scrapers in an automated way you'll start seeing them fail miserably. You'll probably want to solve this problem fast, before any precious data slips through your fingers.
Say hello to proxies
While it might be tempting to use one of the paid providers of such services, it isn't that hard to craft a home-baked solution that will cost you no money. This is thanks to an awesome project, scrapy-rotating-proxies.
Just add it to your project as described in the documentation:
# settings.py
# ...
ROTATING_PROXY_LIST = [
    'proxy1.com:8000',
    'proxy2.com:8031',
    # ...
]
ROTATING_PROXY_LIST_PATH = 'proxies.txt'
# ...
So, where do you get this proxies.txt list from? This is easier than you think. I was not able to find a Python project that would provide a list of free proxies out of the box, but there is a proxy-lists node module made exactly for that!
Installation is extremely simple, as well as usage:
proxy-lists getProxies --sources-white-list="gatherproxy,sockslist"
This will save a bulky list of proxies in your proxies.txt file.
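Free proxy lists tend to be noisy, so it may be worth tidying proxies.txt before scrapy reads it. The sketch below is not part of the original recipe: it assumes the file holds one host:port entry per line (optionally with a scheme prefix), and the helper names clean_proxy_lines and clean_proxy_file are made up for illustration.

```python
import re

# Assumed entry format: "host:port", optionally prefixed with a scheme
# such as "http://". Anything else is treated as malformed.
PROXY_RE = re.compile(r"^(?:[a-z0-9]+://)?[A-Za-z0-9.-]+:(\d{1,5})$")


def clean_proxy_lines(lines):
    """Return unique, well-formed proxy entries in first-seen order."""
    seen = set()
    cleaned = []
    for raw in lines:
        entry = raw.strip()
        match = PROXY_RE.match(entry)
        if not match:
            continue  # blank or malformed line
        if not 0 < int(match.group(1)) <= 65535:
            continue  # port outside the valid TCP range
        if entry not in seen:
            seen.add(entry)
            cleaned.append(entry)
    return cleaned


def clean_proxy_file(path="proxies.txt"):
    """Rewrite the proxy file in place and return how many entries were kept."""
    with open(path) as handle:
        cleaned = clean_proxy_lines(handle)
    with open(path, "w") as handle:
        handle.write("\n".join(cleaned) + ("\n" if cleaned else ""))
    return len(cleaned)
```

Calling clean_proxy_file() between the proxy-lists step and the crawl would drop duplicates and obviously broken lines; it does not check whether a proxy is actually alive.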
Say hello to Makefiles
Now you're essentially running a mixed-language project (with Python for scrapy and JS for proxy-lists). You need a way to synchronize these two tools. What would be better than the lingua franca of builds and orchestration, the Makefile?
Just create a target:
all:
	yarn run proxy-lists getProxies --sources-white-list=PROXIES_SOURCE_LIST
	scrapy crawl mycrawler -o myoutput.csv
	rm -r proxies.txt
And after you're done with that, your build step in Jenkins becomes just:
make all
Things to consider
Of course there's an overhead to pay for using this: after introducing proxies my crawl times grew by an order of magnitude, from minutes to hours! But hey, it works and it's free, so if you're not willing to pay for data in cash, you need to pay for it with time. Luckily for you, with this sweet hack it's the build server's time, not yours.
april 2018 by hendry
Theory of Self-Reproducing Automata - John von Neumann
Fourth Lecture: THE ROLE OF HIGH AND OF EXTREMELY HIGH COMPLICATION

Comparisons between computing machines and the nervous systems. Estimates of size for computing machines, present and near future.

Estimates for size for the human central nervous system. Excursus about the “mixed” character of living organisms. Analog and digital elements. Observations about the “mixed” character of all componentry, artificial as well as natural. Interpretation of the position to be taken with respect to these.

Evaluation of the discrepancy in size between artificial and natural automata. Interpretation of this discrepancy in terms of physical factors. Nature of the materials used.

The probability of the presence of other intellectual factors. The role of complication and the theoretical penetration that it requires.

Questions of reliability and errors reconsidered. Probability of individual errors and length of procedure. Typical lengths of procedure for computing machines and for living organisms--that is, for artificial and for natural automata. Upper limits on acceptable probability of error in individual operations. Compensation by checking and self-correcting features.

Differences of principle in the way in which errors are dealt with in artificial and in natural automata. The “single error” principle in artificial automata. Crudeness of our approach in this case, due to the lack of adequate theory. More sophisticated treatment of this problem in natural automata: The role of the autonomy of parts. Connections between this autonomy and evolution.

- 10^10 neurons in brain, 10^4 vacuum tubes in largest computer at time
- machines faster: 5 ms from neuron potential to neuron potential, 10^-3 ms for vacuum tubes
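The two notes above can be turned into a quick back-of-the-envelope comparison. This is only a sketch using the figures quoted in the notes; the "aggregate" number is a crude product of count and speed added here for illustration, not an estimate taken from the lecture.

```python
import math

# Figures as quoted in the notes above.
NEURONS = 1e10            # neurons in the brain
VACUUM_TUBES = 1e4        # tubes in the largest computer of the time
NEURON_CYCLE_MS = 5.0     # neuron potential to neuron potential
TUBE_CYCLE_MS = 1e-3      # vacuum tube switching time


def orders_of_magnitude(ratio):
    """Express a ratio as a power of ten."""
    return math.log10(ratio)


# How many more elements the brain has than the machine.
size_ratio = NEURONS / VACUUM_TUBES

# How much faster a single tube is than a single neuron.
speed_ratio = NEURON_CYCLE_MS / TUBE_CYCLE_MS

# Crude aggregate: element operations per millisecond, brain vs. machine.
aggregate_ratio = (NEURONS / NEURON_CYCLE_MS) / (VACUUM_TUBES / TUBE_CYCLE_MS)

if __name__ == "__main__":
    print(f"size advantage (brain):    10^{orders_of_magnitude(size_ratio):.1f}")
    print(f"speed advantage (machine): 10^{orders_of_magnitude(speed_ratio):.1f}")
    print(f"aggregate advantage (brain): {aggregate_ratio:.0f}x")
```

On these numbers the brain's advantage in element count outweighs the machine's advantage in per-element speed.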

https://en.wikipedia.org/wiki/John_von_Neumann#Computing
pdf  article  papers  essay  nibble  math  cs  computation  bio  neuro  neuro-nitgrit  scale  magnitude  comparison  acm  von-neumann  giants  thermo  phys-energy  speed  performance  time  density  frequency  hardware  ems  efficiency  dirty-hands  street-fighting  fermi  estimate  retention  physics  interdisciplinary  multi  wiki  links  people  🔬  atoms  automata  duplication  iteration-recursion  turing  complexity  measure  nature  technology  complex-systems  bits  information-theory  circuits  robust  structure  composition-decomposition  evolution  mutation  axioms  analogy  thinking  input-output  hi-order-bits  coding-theory  flexibility  rigidity
april 2018 by nhaliday
Complexity no Bar to AI - Gwern.net
Critics of AI risk suggest diminishing returns to computing (formalized asymptotically) means AI will be weak; this argument relies on a large number of questionable premises and ignoring additional resources, constant factors, and nonlinear returns to small intelligence advantages, and is highly unlikely. (computer science, transhumanism, AI, R)
created: 1 June 2014; modified: 01 Feb 2018; status: finished; confidence: likely; importance: 10
ratty  gwern  analysis  faq  ai  risk  speedometer  intelligence  futurism  cs  computation  complexity  tcs  linear-algebra  nonlinearity  convexity-curvature  average-case  adversarial  article  time-complexity  singularity  iteration-recursion  magnitude  multiplicative  lower-bounds  no-go  performance  hardware  humanity  psychology  cog-psych  psychometrics  iq  distribution  moments  complement-substitute  hanson  ems  enhancement  parable  detail-architecture  universalism-particularism  neuro  ai-control  environment  climate-change  threat-modeling  security  theory-practice  hacker  academia  realness  crypto  rigorous-crypto  usa  government
april 2018 by nhaliday
Chickenhawks – Gene Expression
I know I seem like a warblogger, and I promise I’ll shift to something more esoteric and non-current-eventsy very soon, but check this table out on fatalities by profession. It ranges from 50 per 100,000 for cab-drivers to 100 per 100,000 for fishermen & loggers. Granted, there have surely been work related fatalities in the American military in the past year, but we’ve had about 30 fatalities so far, and perhaps we’ll go up to 200-300 in the current campaign if we don’t get into house-to-house fighting. How many fatalities occurred during the Afghan campaign? Look at this table of historic casualty rates. I don’t do this to say that being a soldier is something that isn’t a big deal-but for me, the “chickenhawk” insult seems less resonant taking into account the changes that have been wrought by technology in the post-Vietnam era. Casualty rates seem to be approaching the order of magnitude of some of the more dangerous civilian professions. That is most certainly a good thing.
gnxp  scitariat  commentary  war  meta:war  usa  iraq-syria  MENA  military  death  pro-rata  data  comparison  fighting  outcome-risk  uncertainty  martial  time-series  history  early-modern  mostly-modern  pre-ww2  world-war  europe  gallic  revolution  the-south  germanic  israel  scale  magnitude  cold-war
february 2018 by nhaliday
Why do stars twinkle?
According to many astronomers and educators, twinkle (stellar scintillation) is caused by atmospheric structure that works like ordinary lenses and prisms. Pockets of variable temperature - and hence index of refraction - randomly shift and focus starlight, perceived by eye as changes in brightness. Pockets also disperse colors like prisms, explaining the flashes of color often seen in bright stars. Stars appear to twinkle more than planets because they are points of light, whereas the twinkling points on planetary disks are averaged to a uniform appearance. Below, figure 1 is a simulation in glass of the kind of turbulence structure posited in the lens-and-prism theory of stellar scintillation, shown over the Penrose tile floor to demonstrate the random lensing effects.

However appealing and ubiquitous on the internet, this popular explanation is wrong, and my aim is to debunk the myth. This research is mostly about showing that the lens-and-prism theory just doesn't work, but I also have a stellar list of references that explain the actual cause of scintillation, starting with two classic papers by C.G. Little and S. Chandrasekhar.
nibble  org:junk  space  sky  visuo  illusion  explanans  physics  electromag  trivia  cocktail  critique  contrarianism  explanation  waves  simulation  experiment  hmm  magnitude  atmosphere  roots  idk
december 2017 by nhaliday
light - Why doesn't the moon twinkle? - Astronomy Stack Exchange
As you mention, when light enters our atmosphere, it goes through several parcels of gas with varying density, temperature, pressure, and humidity. These differences make the refractive index of the parcels different, and since they move around (the scientific term for air moving around is "wind"), the light rays take slightly different paths through the atmosphere.

Stars are point sources
…the Moon is not
nibble  q-n-a  overflow  space  physics  trivia  cocktail  navigation  sky  visuo  illusion  measure  random  electromag  signal-noise  flux-stasis  explanation  explanans  magnitude  atmosphere  roots
december 2017 by nhaliday
Estimation of effect size distribution from genome-wide association studies and implications for future discoveries
We report a set of tools to estimate the number of susceptibility loci and the distribution of their effect sizes for a trait on the basis of discoveries from existing genome-wide association studies (GWASs). We propose statistical power calculations for future GWASs using estimated distributions of effect sizes. Using reported GWAS findings for height, Crohn’s disease and breast, prostate and colorectal (BPC) cancers, we determine that each of these traits is likely to harbor additional loci within the spectrum of low-penetrance common variants. These loci, which can be identified from sufficiently powerful GWASs, together could explain at least 15–20% of the known heritability of these traits. However, for BPC cancers, which have modest familial aggregation, our analysis suggests that risk models based on common variants alone will have modest discriminatory power (63.5% area under curve), even with new discoveries.

later paper:
Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants: http://www.pnas.org/content/108/44/18026.full

Recent discoveries of hundreds of common susceptibility SNPs from genome-wide association studies provide a unique opportunity to examine population genetic models for complex traits. In this report, we investigate distributions of various population genetic parameters and their interrelationships using estimates of allele frequencies and effect-size parameters for about 400 susceptibility SNPs across a spectrum of qualitative and quantitative traits. We calibrate our analysis by statistical power for detection of SNPs to account for overrepresentation of variants with larger effect sizes in currently known SNPs that are expected due to statistical power for discovery. Across all qualitative disease traits, minor alleles conferred “risk” more often than “protection.” Across all traits, an inverse relationship existed between “regression effects” and allele frequencies. Both of these trends were remarkably strong for type I diabetes, a trait that is most likely to be influenced by selection, but were modest for other traits such as human height or late-onset diseases such as type II diabetes and cancers. Across all traits, the estimated effect-size distribution suggested the existence of increasingly large numbers of susceptibility SNPs with decreasingly small effects. For most traits, the set of SNPs with intermediate minor allele frequencies (5–20%) contained an unusually small number of susceptibility loci and explained a relatively small fraction of heritability compared with what would be expected from the distribution of SNPs in the general population. These trends could have several implications for future studies of common and uncommon variants.

...

Relationship Between Allele Frequency and Effect Size. We explored the relationship between allele frequency and effect size in different scales. An inverse relationship between the squared regression coefficient and f(1 − f) was observed consistently across different traits (Fig. 3). For a number of these traits, however, the strengths of these relationships become less pronounced after adjustment for ascertainment due to study power. The strength of the trend, as captured by the slope of the fitted line (Table 2), markedly varies between traits, with an almost 10-fold change between the two extremes of distinct types of traits. After adjustment, the most pronounced trend was seen for type I diabetes and Crohn’s disease among qualitative traits and LDL level among quantitative traits. In exploring the relationship between the frequency of the risk allele and the magnitude of the associated risk coefficient (Fig. S4), we observed a quadratic pattern that indicates increasing risk coefficients as the risk-allele frequency diverges away from 0.50 either toward 0 or toward 1. Thus, it appears that regression coefficients for common susceptibility SNPs increase in magnitude monotonically with decreasing minor-allele frequency, irrespective of whether the minor allele confers risk or protection. However, for some traits, such as type I diabetes, risk alleles were predominantly minor alleles, that is, they had frequencies of less than 0.50.
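
One way to build intuition for the inverse relationship between the squared regression coefficient and f(1 − f): under the textbook additive model with Hardy-Weinberg proportions, a single SNP with allele frequency f and per-allele effect beta contributes variance 2 f (1 − f) beta^2. The sketch below is not the paper's analysis; it only shows that if each SNP explained the same variance (an assumption made here for illustration), beta^2 would scale as 1 / (f(1 − f)). The function names are hypothetical.

```python
import math


def variance_explained(freq, beta):
    """Variance contributed by one biallelic SNP under an additive model.

    Assumes Hardy-Weinberg proportions, so genotype variance is 2 f (1 - f).
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2.0 * freq * (1.0 - freq) * beta ** 2


def beta_for_variance(freq, target_variance):
    """Per-allele effect needed for a SNP to explain target_variance."""
    if not 0.0 < freq < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    return math.sqrt(target_variance / (2.0 * freq * (1.0 - freq)))


if __name__ == "__main__":
    # Same variance explained, different allele frequencies.
    for f in (0.50, 0.20, 0.05, 0.01):
        print(f"f={f:.2f}  required |beta|={beta_for_variance(f, 0.001):.4f}")
```

The required effect grows as the frequency moves away from 0.50 in either direction, which matches the quadratic pattern described in the excerpt.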
pdf  nibble  study  article  org:nat  🌞  biodet  genetics  population-genetics  GWAS  QTL  distribution  disease  cancer  stat-power  bioinformatics  magnitude  embodied  prediction  scale  scaling-up  variance-components  multi  missing-heritability  effect-size  regression  correlation  data
november 2017 by nhaliday
Autoignition temperature - Wikipedia
The autoignition temperature or kindling point of a substance is the lowest temperature at which it spontaneously ignites in normal atmosphere without an external source of ignition, such as a flame or spark. This temperature is required to supply the activation energy needed for combustion. The temperature at which a chemical ignites decreases as the pressure or oxygen concentration increases. It is usually applied to a combustible fuel mixture.

The time $t_{\text{ig}}$ it takes for a material to reach its autoignition temperature $T_{\text{ig}}$ when exposed to a heat flux $q''$ is given by the following equation:
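
The excerpt stops before the equation itself. The form given in the Wikipedia article, reproduced here from memory and so worth checking against the source, is t_ig = (pi/4) * k * rho * c * ((T_ig - T_0) / q'')^2, where k is thermal conductivity, rho is density, c is specific heat capacity and T_0 is the initial temperature of the material. A minimal sketch of that formula, with a hypothetical function name and consistent SI units assumed:

```python
import math


def time_to_ignition(k, rho, c, t_ig, t_0, q_flux):
    """Time for a material to reach its autoignition temperature.

    k      thermal conductivity, W/(m*K)
    rho    density, kg/m^3
    c      specific heat capacity, J/(kg*K)
    t_ig   autoignition temperature, K
    t_0    initial temperature, K
    q_flux incident heat flux, W/m^2
    Returns the time in seconds.
    """
    if q_flux <= 0:
        raise ValueError("heat flux must be positive")
    return (math.pi / 4.0) * k * rho * c * ((t_ig - t_0) / q_flux) ** 2
```

Because the flux enters squared in the denominator, doubling the heat flux cuts the time to ignition to a quarter.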
nibble  wiki  reference  concept  metrics  identity  physics  thermo  temperature  time  stock-flow  phys-energy  chemistry  article  street-fighting  fire  magnitude  data  list
november 2017 by nhaliday
Static electricity - Wikipedia
Electrons can be exchanged between materials on contact; materials with weakly bound electrons tend to lose them while materials with sparsely filled outer shells tend to gain them. This is known as the triboelectric effect and results in one material becoming positively charged and the other negatively charged. The polarity and strength of the charge on a material once they are separated depends on their relative positions in the triboelectric series. The triboelectric effect is the main cause of static electricity as observed in everyday life, and in common high-school science demonstrations involving rubbing different materials together (e.g., fur against an acrylic rod). Contact-induced charge separation causes your hair to stand up and causes "static cling" (for example, a balloon rubbed against the hair becomes negatively charged; when near a wall, the charged balloon is attracted to positively charged particles in the wall, and can "cling" to it, appearing to be suspended against gravity).
nibble  wiki  reference  article  physics  electromag  embodied  curiosity  IEEE  dirty-hands  phys-energy  safety  data  magnitude  scale
november 2017 by nhaliday
