graphical-models   284

[1801.08364] Model selection and local geometry
We consider problems in model selection caused by the geometry of models close to their points of intersection. In some cases, including common classes of causal or graphical models as well as time series models, distinct models may nevertheless have identical tangent spaces. This has two immediate consequences: first, in order to obtain constant power to reject one model in favour of another we need local alternative hypotheses that decrease to the null at a slower rate than the usual parametric n^{-1/2} (typically we will require n^{-1/4} or slower); in other words, to distinguish between the models we need large effect sizes or very large sample sizes. Second, we show that under even weaker conditions on their tangent cones, models in these classes cannot be made simultaneously convex by a reparameterization.
This shows that Bayesian network models, amongst others, cannot be learned directly with a convex method similar to the graphical lasso. However, we are able to use our results to suggest methods for model selection that learn the tangent space directly, rather than the model itself. In particular, we give a generic algorithm for learning discrete ancestral graph models, which includes Bayesian network models as a special case.
model-selection  graphical-models  geometry  via:rvenkat 
july 2018 by arsyed
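A rough sketch of the sample-size point in the abstract above: if an effect of size delta is only detectable when delta is on the order of n^{-r}, then n has to grow like delta^{-1/r}. The function name `required_n` and the constant `c` are illustrative assumptions, not taken from the paper, and the constant is set to 1 purely for scale.

```python
def required_n(delta, rate, c=1.0):
    """Sample size n at which c * n**(-rate) shrinks to delta.

    Hypothetical helper: `c` stands in for an unknown problem-specific constant.
    """
    return (c / delta) ** (1.0 / rate)


if __name__ == "__main__":
    for delta in (0.2, 0.1, 0.05):
        n_regular = required_n(delta, rate=0.5)    # usual parametric n^{-1/2} rate
        n_singular = required_n(delta, rate=0.25)  # slower n^{-1/4} rate
        print(f"delta={delta}: n ~ {n_regular:.0f} (n^-1/2) vs {n_singular:.0f} (n^-1/4)")
```

Under these assumptions, halving the effect size multiplies the required sample by 4 at the parametric rate but by 16 at the n^{-1/4} rate.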
[1803.01422] DAGs with NO TEARS: Smooth Optimization for Structure Learning
Estimating the structure of directed acyclic graphs (DAGs, also known as Bayesian networks) is a challenging problem since the search space of DAGs is combinatorial and scales superexponentially with the number of nodes. Existing approaches rely on various local heuristics for enforcing the acyclicity constraint and are not well-suited to general purpose optimization packages for their solution. In this paper, we introduce a fundamentally different strategy: We formulate the structure learning problem as a smooth, constrained optimization problem over real matrices that avoids this combinatorial constraint entirely. This is achieved by a novel characterization of acyclicity that is not only smooth but also exact. The resulting nonconvex, constrained program involves smooth functions whose gradients are easy to compute and only involve elementary matrix operations. By using existing black-box optimization routines, our method uses global search to find an optimal DAG and can be implemented in about 50 lines of Python and outperforms existing methods without imposing any structural constraints.
graphical-models  dag  structure-learning  optimization  black-box 
march 2018 by arsyed
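For the NO TEARS entry above, the smooth acyclicity characterization is h(W) = tr(exp(W∘W)) - d, which is zero exactly when the weighted adjacency matrix W describes a DAG. The sketch below is a minimal illustration, not the authors' code: it evaluates h with a truncated power series (the `terms` cutoff is an assumption made to keep the example NumPy-only; a library routine such as `scipy.linalg.expm` would normally be used instead).

```python
import numpy as np


def acyclicity(W, terms=30):
    """Approximate h(W) = tr(exp(W * W)) - d via sum_{k>=1} tr(M^k) / k!.

    M = W * W is the elementwise square, so every term is nonnegative.
    `terms` is an illustrative truncation level, not part of the method.
    """
    W = np.asarray(W, dtype=float)
    d = W.shape[0]
    M = W * W
    power = np.eye(d)  # holds M^k / k!, starting from k = 0
    total = 0.0
    for k in range(1, terms + 1):
        power = power @ M / k
        total += np.trace(power)
    return total


# Edge 0 -> 1 only: a DAG, so h is zero.
dag = np.array([[0.0, 1.5],
                [0.0, 0.0]])
# Edges 0 -> 1 and 1 -> 0: a 2-cycle, so h is strictly positive.
cyclic = np.array([[0.0, 1.0],
                   [1.0, 0.0]])

if __name__ == "__main__":
    print(acyclicity(dag), acyclicity(cyclic))
```

Because h is differentiable in W, the constraint h(W) = 0 can be handed to a generic constrained optimizer, which is the point the abstract makes about avoiding combinatorial search.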
Ordinal Graphical Models: A Tale of Two Approaches
Undirected graphical models or Markov random fields (MRFs) are widely used for modeling multivariate probability distributions. Much of the work on MRFs has focused on continuous variables, and nominal variables (that is, unordered categorical variables). However, data from many real world applications involve ordered categorical variables also known as ordinal variables, e.g., movie ratings on Netflix which can be ordered from 1 to 5 stars. With respect to univariate ordinal distributions, as we detail in the paper, there are two main categories of distributions; while there have been efforts to extend these to multivariate ordinal distributions, the resulting distributions are typically very complex, with either a large number of parameters, or with non-convex likelihoods. While there has been some work on tractable approximations, these do not come with strong statistical guarantees, and moreover are relatively computationally expensive. In this paper, we theoretically investigate two classes of graphical models for ordinal data, corresponding to the two main categories of univariate ordinal distributions. In contrast to previous work, our theoretical developments allow us to provide correspondingly two classes of estimators that are not only computationally efficient but also have strong statistical guarantees.
graphical-models  ordinal 
march 2018 by arsyed
Fitting a Structural Equation Model
seems rather unrigorous: nonlinear optimization, possibility of nonconvergence, doesn't even mention local vs. global optimality...
pdf  slides  lectures  acm  stats  hypothesis-testing  graphs  graphical-models  latent-variables  model-class  optimization  nonlinearity  gotchas  nibble  ML-MAP-E  iteration-recursion  convergence 
november 2017 by nhaliday
Does Learning to Read Improve Intelligence? A Longitudinal Multivariate Analysis in Identical Twins From Age 7 to 16
Stuart Ritchie, Bates, Plomin


The variance explained by each path in the diagrams included here can be calculated by squaring its path weight. To take one example, reading differences at age 12 in the model shown in Figure 3 explain 7% of intelligence differences at age 16 (.26^2). However, since our measures are of differences, they are likely to include substantial amounts of noise: Measurement error may produce spurious differences. To remove this error variance, we can take an estimate of the reliability of the measures (generally high, since our measures are normed, standardized tests), which indicates the variance expected purely by the reliability of the measure, and subtract it from the observed variance between twins in our sample. Correcting for reliability in this way, the effect size estimates are somewhat larger; to take the above example, the reliability-corrected effect size of age 12 reading differences on age 16 intelligence differences is around 13% of the “signal” variance. It should be noted that the age 12 reading differences themselves are influenced by many previous paths from both reading and intelligence, as illustrated in Figure 3.
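The squaring rule in the excerpt above can be checked directly. This is only a sketch of the arithmetic; the function name is hypothetical, and the reliability correction is not reproduced because the excerpt does not give the reliability estimates.

```python
def variance_explained(path_weight):
    """Share of variance explained by a standardized path: the squared weight."""
    return path_weight ** 2


# Path weight quoted in the excerpt for age-12 reading -> age-16 intelligence.
share = variance_explained(0.26)

if __name__ == "__main__":
    # About 0.068, which the excerpt rounds to 7%.
    print(f"{share:.4f}")
```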


The present study provided compelling evidence that improvements in reading ability, themselves caused purely by the nonshared environment, may result in improvements in both verbal and nonverbal cognitive ability, and may thus be a factor increasing cognitive diversity within families (Plomin, 2011). These associations are present at least as early as age 7, and are not—to the extent we were able to test this possibility—driven by differences in reading exposure. Since reading is a potentially remediable ability, these findings have implications for reading instruction: Early remediation of reading problems might not only aid in the growth of literacy, but may also improve more general cognitive abilities that are of critical importance across the life span.

Does Reading Cause Later Intelligence? Accounting for Stability in Models of Change:
Results from a state–trait model suggest that reported effects of reading ability on later intelligence may be artifacts of previously uncontrolled factors, both environmental in origin and stable during this developmental period, influencing both constructs throughout development.
study  albion  scitariat  spearhead  psychology  cog-psych  psychometrics  iq  intelligence  eden  language  psych-architecture  longitudinal  twin-study  developmental  environmental-effects  studying  🌞  retrofit  signal-noise  intervention  causation  graphs  graphical-models  flexibility  britain  neuro-nitgrit  effect-size  variance-components  measurement  multi  sequential  time  composition-decomposition  biodet  behavioral-gen  direct-indirect  systematic-ad-hoc  debate  hmm  pdf  piracy  flux-stasis 
september 2017 by nhaliday
Lecture notes for Stanford probabilistic graphical models course.
statistics  graphical-models  probabilistic-graphical-models 
april 2017 by Bartcardi
PsycARTICLES - Is education associated with improvements in general cognitive ability, or in specific skills?
Results indicated that the association of education with improved cognitive test scores is not mediated by g, but consists of direct effects on specific cognitive skills. These results suggest a decoupling of educational gains from increases in general intellectual capacity.

look at Model C for the coefficients

How much does education improve intelligence? A meta-analysis:
Intelligence test scores and educational duration are positively correlated. This correlation can be interpreted in two ways: students with greater propensity for intelligence go on to complete more education, or a longer education increases intelligence. We meta-analysed three categories of quasi-experimental studies of educational effects on intelligence: those estimating education-intelligence associations after controlling for earlier intelligence, those using compulsory schooling policy changes as instrumental variables, and those using regression-discontinuity designs on school-entry age cutoffs. Across 142 effect sizes from 42 datasets involving over 600,000 participants, we found consistent evidence for beneficial effects of education on cognitive abilities, of approximately 1 to 5 IQ points for an additional year of education. Moderator analyses indicated that the effects persisted across the lifespan, and were present on all broad categories of cognitive ability studied. Education appears to be the most consistent, robust, and durable method yet to be identified for raising intelligence.

three study designs: control for prior IQ, exogenous policy change, and school age cutoff regression discontinuity
It’s surprising that there isn’t much of a fadeout (p11) – half of the effect size is still there by age 70 (?!). That wasn’t what I expected. Maybe they’re being pulled upwards by smaller outlier studies – most of the bigger ones tend towards the lower end.
These gains are hollow, as they acknowledge in the discussion. Examples:
albion  spearhead  scitariat  study  psychology  cog-psych  iq  large-factor  education  intervention  null-result  longitudinal  britain  anglo  psychometrics  psych-architecture  graphs  graphical-models  causation  neuro-nitgrit  effect-size  stylized-facts  direct-indirect  flexibility  input-output  evidence-based  preprint  multi  optimism  meta-analysis  west-hunter  poast  commentary  aging  marginal  europe  nordic  shift  twitter  social  backup  ratty  gwern  links  flynn  environmental-effects  debate  roots 
march 2017 by nhaliday
[1701.00652] Semidefinite tests for latent causal structures
Testing whether a probability distribution is compatible with a given Bayesian network is a fundamental task in the field of causal inference, where Bayesian networks model causal relations. Here we consider the class of causal structures where all correlations between observed quantities are solely due to the influence from latent variables. We show that each model of this type imposes a certain signature on the observable covariance matrix in terms of a particular decomposition into positive semidefinite components. This signature, and thus the underlying hypothetical latent structure, can be tested in a computationally efficient manner via semidefinite programming. This stands in stark contrast with the algebraic geometric tools required if the full observable probability distribution is taken into account. The semidefinite test is compared with tests based on entropic inequalities.
statistics  bayesian  graphical-models  causality  inference  rather-interesting  to-understand  consider:feature-discovery 
february 2017 by Vaguery
