nhaliday + metric-space   23

Any particular gene has a specific location (its "locus") on a particular chromosome. For any two genes (or loci) alpha and beta, we can ask "What is the recombination frequency between them?" If the genes are on different chromosomes, the answer is 50% (independent assortment). If the two genes are on the same chromosome, the recombination frequency will be somewhere in the range from 0 to 50%. The "map unit" (1 cM) is the genetic map distance that corresponds to a recombination frequency of 1%. In large chromosomes, the cumulative map distance may be much greater than 50 cM, but the maximum recombination frequency is 50%. Why? In large chromosomes, there is enough length to allow for multiple cross-overs, so we have to ask what result we expect for random multiple cross-overs.

1. How is it that random multiple cross-overs give the same result as independent assortment?

Figure 5.12 shows how the various double cross-over possibilities add up, resulting in gamete genotype percentages that are indistinguishable from independent assortment (50% parental type, 50% non-parental type). This is a very important figure. It provides the explanation for why genes that are far apart on a very large chromosome sort out in crosses just as if they were on separate chromosomes.

2. Is there a way to measure how close together two crossovers can occur involving the same two chromatids? That is, how could we measure whether there is spatial "interference"?

Figure 5.13 shows how a measurement of the gamete frequencies resulting from a "three point cross" can answer this question. If we observed a "lower than expected" occurrence of recombinant genotypes aCb and AcB, it would suggest that there is some hindrance to the two cross-overs occurring this close together. Crosses of this type in Drosophila have shown that, in this organism, double cross-overs do not occur at distances of less than about 10 cM between the two cross-over sites. (Textbook, page 196.)

3. How does all of this lead to the "mapping function", the mathematical (graphical) relation between the observed recombination frequency (percent non-parental gametes) and the cumulative genetic distance in map units?

Figure 5.14 shows the result for the two extremes of "complete interference" and "no interference". The situation for real chromosomes in real organisms is somewhere between these extremes, such as the curve labelled "interference decreasing with distance".
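
A minimal sketch of the two extremes in Figure 5.14, assuming the standard textbook models (the excerpt above does not name them): Haldane's mapping function for "no interference" and a straight line capped at 50% for "complete interference". Kosambi's function is included as one common intermediate model. Function names are illustrative, and distances are in Morgans (1 M = 100 cM).

```python
import math

def rf_no_interference(d_morgans):
    """Haldane's mapping function (assumed model): r = (1 - exp(-2d)) / 2."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))

def rf_complete_interference(d_morgans):
    """At most one cross-over per interval: r = d, capped at 50%."""
    return min(d_morgans, 0.5)

def rf_kosambi(d_morgans):
    """Kosambi's mapping function (one intermediate model): r = tanh(2d) / 2."""
    return 0.5 * math.tanh(2.0 * d_morgans)

if __name__ == "__main__":
    # Print recombination frequency (as a fraction) for a few map distances.
    for cm in (1, 10, 50, 100, 200):
        d = cm / 100.0
        print(cm, rf_no_interference(d), rf_kosambi(d), rf_complete_interference(d))
```

For short distances all three curves agree (recombination frequency is roughly the map distance); for long distances they all saturate at 50%, which is the behaviour the excerpt describes.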
org:junk  org:edu  explanation  faq  nibble  genetics  genomics  bio  ground-up  magnitude  data  flux-stasis  homo-hetero  measure  orders  metric-space  limits  measurement 
october 2017 by nhaliday
Pearson correlation coefficient - Wikipedia
what does this mean?: https://twitter.com/GarettJones/status/863546692724858880
deleted but it was about the Pearson correlation distance: 1-r
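I guess it's a metric (strictly, 1-r itself can violate the triangle inequality; sqrt(2(1-r)), the Euclidean distance between standardized vectors, is the version that is a true metric)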


A less misleading way to think about the correlation R is as follows: given X,Y from a standardized bivariate distribution with correlation R, an increase in X leads to an expected increase in Y: dY = R dX. In other words, students with +1 SD SAT score have, on average, roughly +0.4 SD college GPAs. Similarly, students with +1 SD college GPAs have on average +0.4 SD SAT scores.
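
A quick simulation of the dY = R dX reading, as a sketch only: it assumes a standardized bivariate normal with R = 0.4 (the SAT/GPA figure quoted above) and checks that the least-squares slope is about R in both directions. The helper names are made up for illustration.

```python
import math
import random

def simulate_standardized_pairs(r, n, seed=0):
    """Draw n pairs (x, y) from a standard bivariate normal with correlation r."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        xs.append(x)
        # y has unit variance and correlation r with x
        ys.append(r * x + math.sqrt(1.0 - r * r) * z)
    return xs, ys

def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    return cov / var_x

if __name__ == "__main__":
    xs, ys = simulate_standardized_pairs(0.4, 20000)
    print("slope of Y on X:", ols_slope(xs, ys))
    print("slope of X on Y:", ols_slope(ys, xs))
```

Both slopes come out near 0.4 rather than one being 0.4 and the other 1/0.4, which is the regression-to-the-mean point of the quoted passage.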

this reminds me of the breeder's equation (but it uses r instead of h^2, so it can't actually be the same)

stats  science  hypothesis-testing  correlation  metrics  plots  regression  wiki  reference  nibble  methodology  multi  twitter  social  discussion  best-practices  econotariat  garett-jones  concept  conceptual-vocab  accuracy  causation  acm  matrix-factorization  todo  explanation  yoga  hsu  street-fighting  levers  🌞  2014  scitariat  variance-components  meta:prediction  biodet  s:**  mental-math  reddit  commentary  ssc  poast  gwern  data-science  metric-space  similarity  measure  dependence-independence 
may 2017 by nhaliday
inequalities - Is the Jaccard distance a distance? - MathOverflow
Steinhaus Transform
the referenced survey: http://kenclarkson.org/nn_survey/p.pdf

It's known that this transformation produces a metric from a metric. Now if you take as the base metric D the symmetric difference between two sets, what you end up with is the Jaccard distance (which actually is known by many other names as well).
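
A small sketch of the claim above, assuming the usual form of the Steinhaus transform with a fixed base point a: δ(x, y) = 2D(x, y) / (D(x, a) + D(y, a) + D(x, y)). With D the size of the symmetric difference and a the empty set, the denominator is 2|x ∪ y|, so δ reduces to the Jaccard distance. Function names are illustrative.

```python
def sym_diff_size(x, y):
    """Base metric D: size of the symmetric difference of two sets."""
    return len(x ^ y)

def steinhaus(dist, x, y, a):
    """Steinhaus transform of `dist` with base point `a` (0 when x = y = a)."""
    d_xy = dist(x, y)
    denom = dist(x, a) + dist(y, a) + d_xy
    return 0.0 if denom == 0 else 2.0 * d_xy / denom

def jaccard_distance(x, y):
    """1 - |x ∩ y| / |x ∪ y|, with distance 0 between two empty sets."""
    union = x | y
    return 0.0 if not union else 1.0 - len(x & y) / len(union)

if __name__ == "__main__":
    empty = frozenset()
    x, y = frozenset({1, 2, 3}), frozenset({2, 3, 4})
    print(steinhaus(sym_diff_size, x, y, empty), jaccard_distance(x, y))
```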
q-n-a  overflow  nibble  math  acm  sublinear  metrics  metric-space  proofs  math.CO  tcstariat  arrows  reduction  measure  math.MG  similarity  multi  papers  survey  computational-geometry  cs  algorithms  pdf  positivity  msr  tidbits  intersection  curvature  convexity-curvature  intersection-connectedness  signum 
february 2017 by nhaliday
MinHash - Wikipedia
- goal: compute Jaccard coefficient J(A, B) = |A∩B| / |A∪B| in sublinear space
- idea: pick random injective hash function h, define h_min(S) = argmin_{x in S} h(x), and note that Pr[h_min(A) = h_min(B)] = J(A, B)
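- reduce variance by repeating with k independent hash functions and averaging the matches; a Chernoff bound then gives error O(1/sqrt(k))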
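
The idea in the bullets above can be sketched as follows. This is a toy version under stated assumptions: the "random hash functions" are simulated by salting BLAKE2b with a seed (treated as collision-free), the signature stores the minimum hash value rather than the argmin element (equivalent when h is injective), and all function names are invented for illustration.

```python
import hashlib

def jaccard(a, b):
    """Exact Jaccard coefficient |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def _seeded_hash(seed, item):
    """64-bit hash of `item` under the hash function indexed by `seed`."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")

def minhash_signature(items, num_hashes=256):
    """Minimum hash value under each of `num_hashes` functions (non-empty input)."""
    items = list(items)
    return [min(_seeded_hash(seed, x) for x in items) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions whose minima agree; estimates J(A, B)."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)

if __name__ == "__main__":
    a, b = set(range(0, 300)), set(range(100, 400))
    print("exact:", jaccard(a, b))
    print("estimate:", estimate_jaccard(minhash_signature(a), minhash_signature(b)))
```

Each signature has fixed size (here 256 integers) regardless of how large the sets are, which is where the space saving comes from.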
algorithms  data-structures  sublinear  hashing  wiki  reference  random  tcs  nibble  measure  metric-space  metrics  similarity  PAC  intersection  intersection-connectedness 
february 2017 by nhaliday
Ethnic fractionalization and growth | Dietrich Vollrath
Garett Jones did a podcast with The Economics Detective recently on the costs of ethnic diversity. It is particularly worth listening to given that racial identity has re-emerged as a salient element of politics. A quick summary - and the link above includes a nice write-up of relevant sources - would be that diversity within workplaces does not appear to improve outcomes (however those outcomes are measured).

At the same time, there is a parallel literature, touched on in the podcast, about ethnic diversity (or fractionalization, as it is termed in that literature) and economic growth. But one has to be careful drawing a bright line between the two literatures. It does not follow that the results for workplace diversity imply the results regarding economic growth. And this is because the growth results, to the extent that you believe they are robust, all operate through political systems.

So here let me walk through some of the core empirical relationships that have been found regarding ethnic fractionalization and economic growth, and then talk about why you need to take care with over-interpreting them. This is not a thorough literature review, and I realize there are other papers in the same vein. What I’m after is characterizing the essential results.


- objection about sensitivity of measure to definition of clusters seems dumb to me (point is to fix definitions, then compare different polities. as long as direction and strength of correlation are fairly robust to changes in clustering, this is a stupid critique)
- also, could probably define a less arbitrary notion of fractionalization (w/o fixed clustering or # of clusters) if using points in a metric/vector/euclidean space (eg, genomes)
- eg, A Generalized Index of Ethno-Linguistic Fractionalization: http://www-3.unipv.it/webdept/prin/workpv02.pdf
So like -E_{A, B ~ X} d(A, B). Or maybe -E_{A, B ~ X} f(d(A, B)) for f an increasing function (in particular, f(x) = x^2).

Note that E ||A - B|| = Θ(E ||E[A] - A||), and E ||A - B||^2 = 2Var A,
for A, B ~ X, so this is just quantifying deviation from mean for Euclidean spaces.

In the case that you have a bunch of different clusters w/ centers equidistant (so n+1 in R^n), measures p_i, and internal variances σ_i^2, you get -E ||A - B||^2 = -2∑_i p_i^2σ_i^2 - ∑_{i≠j} p_ip_j(1 + σ_i^2 + σ_j^2) = -(1 - ∑_i p_i^2) - ∑_i 2p_i^2σ_i^2 - ∑_i 2p_i(1-p_i)σ_i^2 = -(1 - ∑_i p_i^2) - 2∑_i p_iσ_i^2
(inter-center distance scaled to 1 wlog).
(in general, if you allow _approximate_ equidistance, you can pack in exp(O(n)) clusters via JL lemma)
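
A numerical sanity check of the cluster computation, as a sketch under assumptions: centers are placed at e_i/√2 so every pair is at distance 1, within-cluster noise is isotropic Gaussian with total variance σ_i^2, and the closed form (1 - ∑_i p_i^2) + 2∑_i p_iσ_i^2 for E ||A - B||^2 is my own simplification of the pairwise sum, included so it can be checked rather than trusted. All names and the example numbers are hypothetical.

```python
import math
import random

def expected_sq_dist_pairwise(p, var):
    """E||A - B||^2 as a sum over ordered cluster pairs (unit inter-center distance)."""
    total = 0.0
    for i, (pi, vi) in enumerate(zip(p, var)):
        for j, (pj, vj) in enumerate(zip(p, var)):
            if i == j:
                total += pi * pi * 2.0 * vi
            else:
                total += pi * pj * (1.0 + vi + vj)
    return total

def expected_sq_dist_closed(p, var):
    """Simplified form: (1 - sum p_i^2) + 2 * sum p_i * var_i."""
    herfindahl = sum(pi * pi for pi in p)
    return (1.0 - herfindahl) + 2.0 * sum(pi * vi for pi, vi in zip(p, var))

def sample_point(p, var, rng):
    """Pick cluster i with probability p_i, then add isotropic Gaussian noise."""
    k = len(p)
    i = rng.choices(range(k), weights=p)[0]
    sd = math.sqrt(var[i] / k)  # per-coordinate sd so the total variance is var[i]
    center = 1.0 / math.sqrt(2.0)  # centers e_i / sqrt(2) are pairwise at distance 1
    return [(center if c == i else 0.0) + rng.gauss(0.0, sd) for c in range(k)]

def monte_carlo_sq_dist(p, var, n_pairs=20000, seed=0):
    """Monte Carlo estimate of E||A - B||^2 for independent A, B from the mixture."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_pairs):
        a = sample_point(p, var, rng)
        b = sample_point(p, var, rng)
        total += sum((x - y) ** 2 for x, y in zip(a, b))
    return total / n_pairs

if __name__ == "__main__":
    p, var = [0.5, 0.3, 0.2], [0.1, 0.2, 0.05]
    print(expected_sq_dist_pairwise(p, var), expected_sq_dist_closed(p, var))
    print(monte_carlo_sq_dist(p, var))
```

The first term, 1 - ∑_i p_i^2, is the usual fixed-cluster fractionalization index; the second adds twice the average within-cluster variance.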
econotariat  economics  growth-econ  diversity  spearhead  study  summary  list  survey  cracker-econ  hive-mind  stylized-facts  🎩  garett-jones  wonkish  populism  easterly  putnam-like  metric-space  similarity  dimensionality  embeddings  examples  metrics  sociology  polarization  big-peeps  econ-metrics  s:*  corruption  cohesion  government  econ-productivity  religion  broad-econ  social-capital  madisonian  chart  article  wealth-of-nations  the-bones  political-econ  public-goodish  microfoundations  alesina  🌞  multi  pdf  concept  conceptual-vocab  definition  hari-seldon 
december 2016 by nhaliday
Math attic
includes a nice visualization of implications between properties of topological spaces
math  visualization  visual-understanding  metabuch  techtariat  graphs  topology  synthesis  math.GN  separation  metric-space  zooming  inference  cheatsheet 
march 2016 by nhaliday

