metric-space   27

Genetics: CHROMOSOMAL MAPS AND MAPPING FUNCTIONS
Any particular gene has a specific location (its "locus") on a particular chromosome. For any two genes (or loci) alpha and beta, we can ask "What is the recombination frequency between them?" If the genes are on different chromosomes, the answer is 50% (independent assortment). If the two genes are on the same chromosome, the recombination frequency will be somewhere in the range from 0 to 50%. The "map unit" (1 cM) is the genetic map distance that corresponds to a recombination frequency of 1%. In large chromosomes, the cumulative map distance may be much greater than 50 cM, but the maximum recombination frequency is still 50%. Why? In large chromosomes, there is enough length to allow for multiple cross-overs, so we have to ask what result we expect for random multiple cross-overs.

1. How is it that random multiple cross-overs give the same result as independent assortment?

Figure 5.12 shows how the various double cross-over possibilities add up, resulting in gamete genotype percentages that are indistinguishable from independent assortment (50% parental type, 50% non-parental type). This is a very important figure. It explains why genes that are far apart on a very large chromosome sort out in crosses just as if they were on separate chromosomes.
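A minimal sketch of the counting argument behind this result, assuming each cross-over picks one chromatid from each homolog uniformly and independently of the others (no chromatid interference). The function name and chromatid labels are illustrative and not taken from the textbook. A chromatid ends up recombinant for the two flanking genes exactly when it takes part in an odd number of cross-overs between them.

```python
from itertools import product

def recombinant_fraction(num_crossovers):
    """Fraction of chromatids recombinant for two flanking genes,
    averaged over every way of placing the cross-overs on a tetrad."""
    homolog_1 = (0, 1)  # two sister chromatids of one homolog (hypothetical labels)
    homolog_2 = (2, 3)  # two sister chromatids of the other homolog
    # Each cross-over joins one chromatid from each homolog: 4 possible pairs.
    pairs = list(product(homolog_1, homolog_2))
    total = 0
    recombinant = 0
    for choice in product(pairs, repeat=num_crossovers):
        for chromatid in range(4):
            # Count the cross-overs this chromatid participates in.
            hits = sum(chromatid in pair for pair in choice)
            total += 1
            recombinant += hits % 2  # odd number of exchanges => recombinant
    return recombinant / total

for k in range(4):
    print(k, recombinant_fraction(k))
```

Under these assumptions the fraction is 0 with no cross-over and 50% for any positive number of cross-overs, which is why the observed recombination frequency saturates at 50%.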

2. Is there a way to measure how close together two crossovers can occur involving the same two chromatids? That is, how could we measure whether there is spatial "interference"?

Figure 5.13 shows how a measurement of the gamete frequencies resulting from a "three point cross" can answer this question. If we got a "lower than expected" occurrence of the recombinant genotypes aCb and AcB, it would suggest that there is some hindrance to the two cross-overs occurring this close together. Crosses of this type in Drosophila have shown that, in this organism, double cross-overs do not occur at distances of less than about 10 cM between the two cross-over sites. (Textbook, page 196.)
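One common way to put a number on "lower than expected" is the coefficient of coincidence: the observed double cross-over frequency divided by the product of the two single-region recombination frequencies, with interference defined as one minus that ratio. The sketch below assumes gene order A-C-B (region I between A and C, region II between C and B). The function name and the gamete counts are made up for illustration and are not data from the textbook.

```python
def interference(total, single_region1, single_region2, double):
    """Interference = 1 - coefficient of coincidence for a three-point cross.

    total          : total number of gametes scored
    single_region1 : gametes with a single cross-over in region I only
    single_region2 : gametes with a single cross-over in region II only
    double         : double recombinants (aCb and AcB in the text's notation)
    Raises ZeroDivisionError if either region shows no recombination.
    """
    # A double recombinant has a cross-over in both regions, so it counts
    # toward the recombination frequency of each region.
    r1 = (single_region1 + double) / total
    r2 = (single_region2 + double) / total
    expected_double = r1 * r2          # expectation if the cross-overs are independent
    observed_double = double / total
    coincidence = observed_double / expected_double
    return 1.0 - coincidence

# Hypothetical counts: fewer doubles than independence predicts.
print(interference(total=1000, single_region1=90, single_region2=80, double=5))
```

A value of 1 means no double cross-overs were seen at all (complete interference), and a value near 0 means the two cross-overs behave as if independent.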

3. How does all of this lead to the "mapping function", the mathematical (graphical) relation between the observed recombination frequency (percent non-parental gametes) and the cumulative genetic distance in map units?

Figure 5.14 shows the result for the two extremes of "complete interference" and "no interference". The situation for real chromosomes in real organisms is somewhere between these extremes, such as the curve labelled "interference decreasing with distance".
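A rough sketch of the two extremes as formulas, with map distance d in Morgans (cM / 100). Complete interference gives r = d up to the 50% cap, and no interference gives Haldane's function r = (1 - e^(-2d)) / 2. Kosambi's function r = tanh(2d) / 2 is included as one widely used intermediate. Treating it as a stand-in for the "interference decreasing with distance" curve is an assumption, since the excerpt does not say which function the figure plots.

```python
import math

def complete_interference(cm):
    """Recombination frequency when no multiple cross-overs occur: r = d, capped at 0.5."""
    return min(cm / 100.0, 0.5)

def haldane(cm):
    """Recombination frequency with no interference (cross-overs fall as a Poisson process)."""
    d = cm / 100.0
    return 0.5 * (1.0 - math.exp(-2.0 * d))

def kosambi(cm):
    """An intermediate mapping function: strong interference at short range, fading with distance."""
    d = cm / 100.0
    return 0.5 * math.tanh(2.0 * d)

def haldane_inverse(r):
    """Map distance in cM implied by recombination frequency r (0 <= r < 0.5), no interference."""
    return -50.0 * math.log(1.0 - 2.0 * r)

for cm in (1, 10, 30, 100, 300):
    print(cm, complete_interference(cm), round(kosambi(cm), 4), round(haldane(cm), 4))
```

All three agree for small distances (1 cM is close to 1% recombination) and all approach 50% for large distances, which matches the cap described above.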
org:junk  org:edu  explanation  faq  nibble  genetics  genomics  bio  ground-up  magnitude  data  flux-stasis  homo-hetero  measure  orders  metric-space  limits  measurement 
october 2017 by nhaliday
inequalities - Is the Jaccard distance a distance? - MathOverflow
Steinhaus Transform
the referenced survey: http://kenclarkson.org/nn_survey/p.pdf

It's known that this transformation produces a metric from a metric. Now if you take as the base metric D the symmetric difference between two sets, what you end up with is the Jaccard distance (which actually is known by many other names as well).
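A small check of this claim, assuming the usual form of the Steinhaus transform, D'(x, y) = 2D(x, y) / (D(x, a) + D(y, a) + D(x, y)) for a fixed base point a, and taking a to be the empty set. The helper names are illustrative. With D(A, B) = |A Δ B| the denominator becomes |A| + |B| + |A Δ B| = 2|A ∪ B|, so D' reduces to |A Δ B| / |A ∪ B|.

```python
from fractions import Fraction

def symmetric_difference_size(a, b):
    """Base metric D(A, B) = |A Δ B| on finite sets."""
    return len(a ^ b)

def steinhaus(d, base_point):
    """Return the Steinhaus transform of metric d with respect to base_point."""
    def transformed(x, y):
        denom = d(x, base_point) + d(y, base_point) + d(x, y)
        if denom == 0:
            # Only happens when x and y both coincide with the base point.
            return Fraction(0)
        return Fraction(2 * d(x, y), denom)
    return transformed

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|, with distance 0 between two empty sets."""
    union = len(a | b)
    if union == 0:
        return Fraction(0)
    return 1 - Fraction(len(a & b), union)

transformed = steinhaus(symmetric_difference_size, frozenset())
a, b = frozenset({1, 2, 3}), frozenset({2, 3, 4})
print(transformed(a, b), jaccard_distance(a, b))
```

Exact fractions are used so that the two quantities can be compared with `==` rather than a floating-point tolerance.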
q-n-a  overflow  nibble  math  acm  sublinear  metrics  metric-space  proofs  math.CO  tcstariat  arrows  reduction  measure  math.MG  similarity  multi  papers  survey  computational-geometry  cs  algorithms  pdf  positivity  msr  tidbits  intersection  curvature  convexity-curvature  intersection-connectedness  signum 
february 2017 by nhaliday
MinHash - Wikipedia
- goal: compute Jaccard coefficient J(A, B) = |A∩B| / |A∪B| in sublinear space
- idea: pick random injective hash function h, define h_min(S) = argmin_{x in S} h(x), and note that Pr[h_min(A) = h_min(B)] = J(A, B)
- reduce variance by averaging over k independent hash functions; a Chernoff bound gives error ε w/ k = O(1/ε^2)
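The notes above can be sketched as follows. This is a minimal illustration, not the Wikipedia article's code: salted BLAKE2b stands in for the "random injective hash function" (an assumption; collisions on 64-bit digests are possible but very unlikely), and the sketch stores the minimum hash value rather than the argmin element, which is equivalent when the hash is injective. All function names are hypothetical.

```python
import hashlib

def _hash(item, seed):
    """64-bit hash of item under the seed-th hash function (salted BLAKE2b)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(repr(item).encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")

def minhash_signature(items, num_hashes=256):
    """One minimum hash value per hash function; size is independent of len(items)."""
    items = list(items)
    if not items:
        raise ValueError("MinHash is undefined for an empty set")
    return [min(_hash(x, seed) for x in items) for seed in range(num_hashes)]

def estimate_jaccard(sig_a, sig_b):
    """Fraction of hash functions on which the two minima agree."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must use the same number of hash functions")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)

def exact_jaccard(a, b):
    """Reference value J(A, B) = |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

A = set(range(0, 300))
B = set(range(100, 400))
sig_a = minhash_signature(A, num_hashes=512)
sig_b = minhash_signature(B, num_hashes=512)
print(exact_jaccard(A, B), estimate_jaccard(sig_a, sig_b))
```

Each agreement indicator is a Bernoulli trial with success probability J(A, B), so the estimate's standard deviation shrinks like 1/sqrt(k) as the number of hash functions k grows.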
algorithms  data-structures  sublinear  hashing  wiki  reference  random  tcs  nibble  measure  metric-space  metrics  similarity  PAC  intersection  intersection-connectedness 
february 2017 by nhaliday
Ethnic fractionalization and growth | Dietrich Vollrath
Garett Jones did a podcast with The Economics Detective recently on the costs of ethnic diversity. It is particularly worth listening to given that racial identity has re-emerged as a salient element of politics. A quick summary - and the link above includes a nice write-up of relevant sources - would be that diversity within workplaces does not appear to improve outcomes (however those outcomes are measured).

At the same time, there is a parallel literature, touched on in the podcast, about ethnic diversity (or fractionalization, as it is termed in that literature) and economic growth. But one has to be careful drawing a bright line between the two literatures. It does not follow that the results for workplace diversity imply the results regarding economic growth. And this is because the growth results, to the extent that you believe they are robust, all operate through political systems.

So here let me walk through some of the core empirical relationships that have been found regarding ethnic fractionalization and economic growth, and then talk about why you need to take care with over-interpreting them. This is not a thorough literature review, and I realize there are other papers in the same vein. What I’m after is characterizing the essential results.

--

- objection about sensitivity of measure to definition of clusters seems dumb to me (point is to fix definitions, then compare different polities. as long as direction and strength of correlation are fairly robust to changes in clustering, this is a stupid critique)
- also, could probably define a less arbitrary notion of fractionalization (w/o fixed clustering or # of clusters) if using points in a metric/vector/euclidean space (eg, genomes)
- eg, A Generalized Index of Ethno-Linguistic Fractionalization: http://www-3.unipv.it/webdept/prin/workpv02.pdf
So like -E_{A, B ~ X} d(A, B). Or maybe -E_{A, B ~ X} f(d(A, B)) for f an increasing function (in particular, f(x) = x^2).

Note that E ||A - B|| = Θ(E ||E[A] - A||), and E ||A - B||^2 = 2Var A,
for A, B ~ X, so this is just quantifying deviation from mean for Euclidean spaces.

In the case that you have a bunch of different clusters w/ centers equidistant (so n+1 in R^n), measures p_i, and internal variances σ_i^2, you get -E ||A - B||^2 = -2∑_i p_i^2σ_i^2 - ∑_{i≠j} p_ip_j(1 + σ_i^2 + σ_j^2) = -(1 - ∑_i p_i^2) - 2∑_i p_iσ_i^2
(inter-center distance scaled to 1 wlog; with all σ_i = 0 this is minus the usual 1 - ∑_i p_i^2 fractionalization index).
(in general, if you allow _approximate_ equidistance, you can pack in exp(O(n)) clusters via JL lemma)
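A numeric sanity check of the cluster computation, under stated assumptions: centers are the scaled basis vectors e_i / sqrt(2) (pairwise distance 1), and each cluster is modeled as two equally weighted points at ±σ_i along one extra shared coordinate, which gives it internal variance σ_i^2. Expanding the pairwise sum then gives the closed form -E ||A - B||^2 = -(1 - ∑_i p_i^2) - 2∑_i p_iσ_i^2. The construction and names are illustrative only.

```python
from itertools import product
from math import isclose, sqrt

def neg_mean_sq_distance(points, weights):
    """-E ||A - B||^2 for A, B drawn independently from a weighted finite point set."""
    pairs = list(zip(points, weights))
    total = 0.0
    for (p, wp), (q, wq) in product(pairs, repeat=2):
        total += wp * wq * sum((a - b) ** 2 for a, b in zip(p, q))
    return -total

def build_clusters(p, sigma):
    """Weighted points for clusters with measures p[i] and internal variances sigma[i]**2."""
    k = len(p)
    points, weights = [], []
    for i in range(k):
        center = [0.0] * (k + 1)
        center[i] = 1.0 / sqrt(2.0)      # centers are pairwise at distance 1
        for sign in (+1.0, -1.0):
            point = list(center)
            point[k] = sign * sigma[i]   # spread along an extra shared axis
            points.append(point)
            weights.append(p[i] / 2.0)
    return points, weights

def closed_form(p, sigma):
    """-(1 - sum p_i^2) - 2 sum p_i sigma_i^2."""
    return -(1.0 - sum(pi ** 2 for pi in p)) - 2.0 * sum(
        pi * si ** 2 for pi, si in zip(p, sigma)
    )

p = [0.5, 0.3, 0.2]
sigma = [0.1, 0.2, 0.3]
points, weights = build_clusters(p, sigma)
print(neg_mean_sq_distance(points, weights), closed_form(p, sigma))
```

The same helper also reproduces E ||A - B||^2 = 2 Var A, since the total variance here is the variance of the centers plus the weighted internal variances.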
econotariat  economics  growth-econ  diversity  spearhead  study  summary  list  survey  cracker-econ  hive-mind  stylized-facts  🎩  garett-jones  wonkish  populism  easterly  putnam-like  metric-space  similarity  dimensionality  embeddings  examples  metrics  sociology  polarization  big-peeps  econ-metrics  s:*  corruption  cohesion  government  econ-productivity  religion  broad-econ  social-capital  madisonian  chart  article  wealth-of-nations  the-bones  political-econ  public-goodish  microfoundations  alesina  🌞  multi  pdf  concept  conceptual-vocab  definition  hari-seldon 
december 2016 by nhaliday
