scaling-up   64


Estimation of effect size distribution from genome-wide association studies and implications for future discoveries
We report a set of tools to estimate the number of susceptibility loci and the distribution of their effect sizes for a trait on the basis of discoveries from existing genome-wide association studies (GWASs). We propose statistical power calculations for future GWASs using estimated distributions of effect sizes. Using reported GWAS findings for height, Crohn’s disease and breast, prostate and colorectal (BPC) cancers, we determine that each of these traits is likely to harbor additional loci within the spectrum of low-penetrance common variants. These loci, which can be identified from sufficiently powerful GWASs, together could explain at least 15–20% of the known heritability of these traits. However, for BPC cancers, which have modest familial aggregation, our analysis suggests that risk models based on common variants alone will have modest discriminatory power (63.5% area under curve), even with new discoveries.
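
The power calculation the abstract mentions can be illustrated with a rough sketch. This is not the paper's method: it assumes a standardized quantitative trait, a purely additive SNP effect, a two-sided normal-approximation test, and the conventional genome-wide threshold of 5e-8. The function name `gwas_power` and its parameters are hypothetical.

```python
from statistics import NormalDist


def gwas_power(beta, maf, n, alpha=5e-8):
    """Approximate power to detect a single SNP (illustrative assumptions only).

    beta  : per-allele effect in trait standard deviations (assumed scale)
    maf   : minor allele frequency
    n     : number of individuals in the study
    alpha : significance threshold (5e-8 is the usual genome-wide convention)
    """
    # Fraction of trait variance explained by the SNP under an additive model.
    var_explained = 2.0 * maf * (1.0 - maf) * beta ** 2
    # Expected z-score of the association test statistic.
    mu = (n * var_explained) ** 0.5
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)
    # Two-sided test: reject when |Z| > z_crit, where Z ~ Normal(mu, 1).
    return std_normal.cdf(mu - z_crit) + std_normal.cdf(-mu - z_crit)


if __name__ == "__main__":
    # Same hypothetical SNP at three sample sizes.
    for n in (10_000, 50_000, 100_000):
        print(n, round(gwas_power(beta=0.05, maf=0.3, n=n), 4))
```

Under these assumptions power depends on the effect only through 2f(1 − f)β², so a small effect at a common allele and a larger effect at a rarer one can be equally hard to detect.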

later paper:
Distribution of allele frequencies and effect sizes and their interrelationships for common genetic susceptibility variants:

Recent discoveries of hundreds of common susceptibility SNPs from genome-wide association studies provide a unique opportunity to examine population genetic models for complex traits. In this report, we investigate distributions of various population genetic parameters and their interrelationships using estimates of allele frequencies and effect-size parameters for about 400 susceptibility SNPs across a spectrum of qualitative and quantitative traits. We calibrate our analysis by statistical power for detection of SNPs to account for overrepresentation of variants with larger effect sizes in currently known SNPs that are expected due to statistical power for discovery. Across all qualitative disease traits, minor alleles conferred “risk” more often than “protection.” Across all traits, an inverse relationship existed between “regression effects” and allele frequencies. Both of these trends were remarkably strong for type I diabetes, a trait that is most likely to be influenced by selection, but were modest for other traits such as human height or late-onset diseases such as type II diabetes and cancers. Across all traits, the estimated effect-size distribution suggested the existence of increasingly large numbers of susceptibility SNPs with decreasingly small effects. For most traits, the set of SNPs with intermediate minor allele frequencies (5–20%) contained an unusually small number of susceptibility loci and explained a relatively small fraction of heritability compared with what would be expected from the distribution of SNPs in the general population. These trends could have several implications for future studies of common and uncommon variants.


Relationship Between Allele Frequency and Effect Size. We explored the relationship between allele frequency and effect size in different scales. An inverse relationship between the squared regression coefficient and f(1 − f) was observed consistently across different traits (Fig. 3). For a number of these traits, however, the strengths of these relationships become less pronounced after adjustment for ascertainment due to study power. The strength of the trend, as captured by the slope of the fitted line (Table 2), markedly varies between traits, with an almost 10-fold change between the two extremes of distinct types of traits. After adjustment, the most pronounced trend was seen for type I diabetes and Crohn’s disease among qualitative traits and LDL level among quantitative traits. In exploring the relationship between the frequency of the risk allele and the magnitude of the associated risk coefficient (Fig. S4), we observed a quadratic pattern that indicates increasing risk coefficients as the risk-allele frequency diverges away from 0.50 either toward 0 or toward 1. Thus, it appears that regression coefficients for common susceptibility SNPs increase in magnitude monotonically with decreasing minor-allele frequency, irrespective of whether the minor allele confers risk or protection. However, for some traits, such as type I diabetes, risk alleles were predominantly minor alleles, that is, they had frequencies of less than 0.50.
pdf  nibble  study  article  org:nat  🌞  biodet  genetics  population-genetics  GWAS  QTL  distribution  disease  cancer  stat-power  bioinformatics  magnitude  embodied  prediction  scale  scaling-up  variance-components  multi  missing-heritability  effect-size  regression  correlation  data 
november 2017 by nhaliday
A combined analysis of genetically correlated traits identifies 107 loci associated with intelligence | bioRxiv
We apply MTAG to three large GWAS: Sniekers et al (2017) on intelligence, Okbay et al. (2016) on Educational attainment, and Hill et al. (2016) on household income. By combining these three samples our functional sample size increased from 78 308 participants to 147 194. We found 107 independent loci associated with intelligence, implicating 233 genes, using both SNP-based and gene-based GWAS. We find evidence that neurogenesis may explain some of the biological differences in intelligence as well as genes expressed in the synapse and those involved in the regulation of the nervous system.


Finally, using an independent sample of 6 844 individuals we were able to predict 7% of intelligence using SNP data alone.
study  bio  preprint  biodet  behavioral-gen  GWAS  genetics  iq  education  compensation  composition-decomposition  🌞  gwern  meta-analysis  genetic-correlation  scaling-up  methodology  correlation  state-of-art  neuro  neuro-nitgrit  dimensionality 
july 2017 by nhaliday
10 million DTC dense marker genotypes by end of 2017? – Gene Expression
Ultimately I do wonder if I was a bit too optimistic that 50% of the US population will be sequenced at 30x by 2025. But the dynamic is quite likely to change rapidly because of a technological shift as the sector goes through a productivity uptick. We’re talking about exponential growth, which humans have weak intuition about….
gnxp  scitariat  commentary  biotech  scaling-up  genetics  genomics  scale  bioinformatics  multi  toys  measurement  duplication  signal-noise  coding-theory 
june 2017 by nhaliday
Genomic analysis of family data reveals additional genetic effects on intelligence and personality | bioRxiv
Using Extended Genealogy to Estimate Components of Heritability for 23 Quantitative and Dichotomous Traits:
Pedigree- and SNP-Associated Genetics and Recent Environment are the Major Contributors to Anthropometric and Cardiometabolic Trait Variation:

Missing Heritability – found?:
There is an interesting new paper out on genetics and IQ. The claim is that they have found the missing heritability – in rare variants, generally different in each family.

Some of the variants, the ones we find with GWAS, are fairly common and fitness-neutral: the variant that slightly increases IQ confers the same fitness (or very close to the same) as the one that slightly decreases IQ – presumably because of other effects it has. If this weren’t the case, it would be impossible for both of the variants to remain common.

The rare variants that affect IQ will generally decrease IQ – and since pleiotropy is the norm, usually they’ll be deleterious in other ways as well. Genetic load.

Happy families are all alike; every unhappy family is unhappy in its own way.:
It now looks as if the majority of the genetic variance in IQ is the product of mutational load, and the same may be true for many psychological traits. To the extent this is the case, a lot of human psychological variation must be non-adaptive. Maybe some personality variation fulfills an evolutionary function, but a lot does not. Being a dumb asshole may be a bug, rather than a feature. More generally, this kind of analysis could show us whether particular low-fitness syndromes, like autism, were ever strategies – I suspect not.

It’s bad news for medicine and psychiatry, though. It would suggest that what we call a given type of mental illness, like schizophrenia, is really a grab-bag of many different syndromes. The ultimate causes are extremely varied: at best, there may be shared intermediate causal factors. Not good news for drug development: individualized medicine is a threat, not a promise.

see also comment at:
So the big implication here is that it's better than I had dared hope - like Yang/Visscher/Hsu have argued, the old GCTA estimate of ~0.3 is indeed a rather loose lower bound on additive genetic variants, and the rest of the missing heritability is just the relatively uncommon additive variants (ie <1% frequency), and so, like Yang demonstrated with height, using much more comprehensive imputation of SNP scores or using whole-genomes will be able to explain almost all of the genetic contribution. In other words, with better imputation panels, we can go back and squeeze out better polygenic scores from old GWASes, new GWASes will be able to reach and break the 0.3 upper bound, and eventually we can feasibly predict 0.5-0.8. Between the expanding sample sizes from biobanks, the still-falling price of whole genomes, the gradual development of better regression methods (informative priors, biological annotation information, networks, genetic correlations), and better imputation, the future of GWAS polygenic scores is bright. Which obviously will be extremely helpful for embryo selection/genome synthesis.

The argument that this supports mutation-selection balance is weaker but plausible. I hope that it's true, because if that's why there is so much genetic variation in intelligence, then that strongly encourages genetic engineering - there is no good reason or Chesterton fence for intelligence variants being non-fixed, it's just that evolution is too slow to purge the constantly-accumulating bad variants. And we can do better.

The surprising implications of familial association in disease risk:
As Greg Cochran has pointed out, this probably isn’t going to work. There are a few genes like BRCA1 (which makes you more likely to get breast and ovarian cancer) that we can detect and might affect treatment, but an awful lot of disease turns out to be just the result of random chance and deleterious mutation. This means that you can’t easily tailor disease treatment to people’s genes, because everybody is fucked up in their own special way. If Johnny is schizophrenic because of 100 random errors in the genes that code for his neurons, and Jack is schizophrenic because of 100 other random errors, there’s very little way to test a drug to work for either of them- they’re the only one in the world, most likely, with that specific pattern of errors. This is, presumably why the incidence of schizophrenia and autism rises in populations when dads get older- more random errors in sperm formation mean more random errors in the baby’s genes, and more things that go wrong down the line.

The looming crisis in human genetics:
Some awkward news ahead
- Geoffrey Miller

Human geneticists have reached a private crisis of conscience, and it will become public knowledge in 2010. The crisis has depressing health implications and alarming political ones. In a nutshell: the new genetics will reveal much less than hoped about how to cure disease, and much more than feared about human evolution and inequality, including genetic differences between classes, ethnicities and races.

study  preprint  bio  biodet  behavioral-gen  GWAS  missing-heritability  QTL  🌞  scaling-up  replication  iq  education  spearhead  sib-study  multi  west-hunter  scitariat  genetic-load  mutation  medicine  meta:medicine  stylized-facts  ratty  unaffiliated  commentary  rhetoric  wonkish  genetics  genomics  race  pop-structure  poast  population-genetics  psychiatry  aphorism  homo-hetero  generalization  scale  state-of-art  ssc  reddit  social  summary  gwern  methodology  personality  britain  anglo  enhancement  roots  s:*  2017  data  visualization  database  let-me-see  bioinformatics  news  org:rec  org:anglo  org:biz  track-record  prediction  identity-politics  pop-diff  recent-selection  westminster  inequality  egalitarianism-hierarchy  high-dimension  applications  dimensionality  ideas  no-go  volo-avolo  magnitude  variance-components  GCTA  tradeoffs  counter-revolution  org:mat  dysgenics  paternal-age  distribution  chart  abortion-contraception-embryo 
june 2017 by nhaliday
Estimating the number of unseen variants in the human genome
To find all common variants (frequency at least 1%) the number of individuals that need to be sequenced is small (∼350) and does not differ much among the different populations; our data show that, subject to sequence accuracy, the 1000 Genomes Project is likely to find most of these common variants and a high proportion of the rarer ones (frequency between 0.1 and 1%). The data reveal a rule of diminishing returns: a small number of individuals (∼150) is sufficient to identify 80% of variants with a frequency of at least 0.1%, while a much larger number (> 3,000 individuals) is necessary to find all of those variants.
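
The diminishing returns described above can be seen in a back-of-the-envelope sketch. It assumes chromosomes are sampled independently, sequencing is error-free, and each variant has a single fixed population frequency. It ignores the distribution of frequencies across variants, so it is not the paper's estimator and will not reproduce its exact figures. The function names are hypothetical.

```python
def detection_probability(freq, n_individuals):
    """Chance that a variant at population frequency `freq` appears at least
    once among the 2 * n_individuals chromosomes of a diploid sample."""
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - freq) ** n_chromosomes


def individuals_needed(freq, target=0.99):
    """Smallest sample size whose detection probability reaches `target`."""
    n = 1
    while detection_probability(freq, n) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Chance of seeing a 1% variant vs. a 0.1% variant at two sample sizes.
    for freq in (0.01, 0.001):
        for n in (150, 350):
            print(freq, n, round(detection_probability(freq, n), 3))
    print(individuals_needed(0.01))
```

In this simplified model a 1% variant is almost certain to show up among a few hundred individuals, while a single 0.1% variant is seen in 150 individuals only about a quarter of the time. The required sample size grows roughly in inverse proportion to the frequency.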

A map of human genome variation from population-scale sequencing:

Scientists using data from the 1000 Genomes Project, which sequenced one thousand individuals from 26 human populations, found that "a typical [individual] genome differs from the reference human genome at 4.1 million to 5.0 million sites … affecting 20 million bases of sequence."[11] Nearly all (>99.9%) of these sites are small differences, either single nucleotide polymorphisms or brief insertion-deletions in the genetic sequence, but structural variations account for a greater number of base-pairs than the SNPs and indels.[11]

Human genetic variation:

Singleton Variants Dominate the Genetic Architecture of Human Gene Expression:
study  sapiens  genetics  genomics  population-genetics  bioinformatics  data  prediction  cost-benefit  scale  scaling-up  org:nat  QTL  methodology  multi  pdf  curvature  convexity-curvature  nonlinearity  measurement  magnitude  🌞  distribution  missing-heritability  pop-structure  genetic-load  mutation  wiki  reference  article  structure  bio  preprint  biodet  variance-components  nibble  chart 
may 2017 by nhaliday
Human genome - Wikipedia
There are an estimated 19,000-20,000 human protein-coding genes.[4] The estimate of the number of human genes has been repeatedly revised down from initial predictions of 100,000 or more as genome sequence quality and gene finding methods have improved, and could continue to drop further.[5][6] Protein-coding sequences account for only a very small fraction of the genome (approximately 1.5%), and the rest is associated with non-coding RNA molecules, regulatory DNA sequences, LINEs, SINEs, introns, and sequences for which as yet no function has been determined.[7]
bio  sapiens  genetics  genomics  bioinformatics  scaling-up  data  scale  wiki  reference  QTL  methodology 
may 2017 by nhaliday
Sequencing a genome for less than the cost of an X-ray? Not quite yet
A $100 genome will cost $100 in the same way that the $1,000 genome costs $1,000. As in, it won’t, at least not soon. “The $1,000 genome” — which sequencer makers began promising about five years ago — “costs us $3,000,” said Richard Gibbs, founder of the Baylor College of Medicine Human Genome Sequencing Center and one of the leaders of the original Human Genome Project in the 1990s.
news  org:sci  scaling-up  data  scale  genetics  genomics  biotech  money  efficiency  bioinformatics  cost-benefit  frontier  speedometer  measurement 
april 2017 by nhaliday
