csantos + statistics   667

Entropy | Free Full-Text | On Accuracy of PDF Divergence Estimators and Their Applicability to Representative Data Sampling
This paper investigates the possibility of using various probability density function divergence measures for the purpose of representative data sampling. As it turned out, the first difficulty one needs to deal with is estimation of the divergence itself. In contrast to other publications on this subject, the experimental results provided in this study show that in many cases it is not possible unless samples consisting of thousands of instances are used. Exhaustive experiments on the divergence guided representative data sampling have been performed using 26 publicly available benchmark datasets and 70 PDF divergence estimators, and their results have been analysed and discussed.
kullback-leibler  InformationTheory  divergences  estimation  statistics  density-estimation 
11 weeks ago by csantos
[0803.4101] Measuring and testing dependence by correlation of distances
Distance correlation is a new measure of dependence between random vectors. Distance covariance and distance correlation are analogous to product-moment covariance and correlation, but unlike the classical definition of correlation, distance correlation is zero only if the random vectors are independent. The empirical distance dependence measures are based on certain Euclidean distances between sample elements rather than sample moments, yet have a compact representation analogous to the classical covariance and correlation. Asymptotic properties and applications in testing independence are discussed. Implementation of the test and Monte Carlo results are also presented.
Statistics  correlation  dependence_measures 
11 weeks ago by csantos
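A minimal sketch of the empirical distance correlation described in the abstract above, restricted to scalar samples for brevity (the paper treats random vectors, where the absolute difference would be replaced by a Euclidean norm). The function names are illustrative and not taken from the paper or any library.

```python
import math


def _centered_distances(values):
    """Pairwise |v_i - v_j| distances, double-centered by row, column and grand means."""
    n = len(values)
    d = [[abs(a - b) for b in values] for a in values]
    # The distance matrix is symmetric, so row means equal column means.
    row_means = [sum(row) / n for row in d]
    grand_mean = sum(row_means) / n
    return [
        [d[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length scalar samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    a = _centered_distances(x)
    b = _centered_distances(y)
    # Squared sample distance covariance and distance variances.
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n**2
    dvar_x2 = sum(a[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    dvar_y2 = sum(b[i][j] ** 2 for i in range(n) for j in range(n)) / n**2
    denom = math.sqrt(dvar_x2 * dvar_y2)
    if denom == 0:
        return 0.0  # one of the samples is constant
    return math.sqrt(max(dcov2, 0.0) / denom)


x = [-2, -1, 0, 1, 2]
linear = distance_correlation(x, [2 * v + 1 for v in x])
quadratic = distance_correlation(x, [v * v for v in x])
```

Here `quadratic` comes out strictly positive even though the Pearson correlation between `x` and its squares is zero for this symmetric sample, which is the kind of dependence the measure is meant to detect.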
Fanning the Flames of Hate: Social Media and Hate Crime by Karsten Müller, Carlo Schwarz :: SSRN
This paper investigates the link between social media and hate crime using Facebook data. We study the case of Germany, where the recently emerged right-wing party Alternative für Deutschland (AfD) has developed a major social media presence. We show that right-wing anti-refugee sentiment on Facebook predicts violent crimes against refugees in otherwise similar municipalities with higher social media usage. To further establish causality, we exploit exogenous variation in major internet and Facebook outages, which fully undo the correlation between social media and hate crime. We further find that the effect decreases with distracting news events; increases with user network interactions; and does not hold for posts unrelated to refugees. Our results suggest that social media can act as a propagation mechanism between online hate speech and real-life violent crime.
facebook  doom  SocialSciences  social-media  statistics  causation 
12 weeks ago by csantos
Archive ouverte HAL - Assessing and tuning brain decoders: cross-validation, caveats, and guidelines
Decoding, i.e. prediction from brain images or signals, calls for empirical evaluation of its predictive power. Such evaluation is achieved via cross-validation, a method also used to tune decoders' hyper-parameters. This paper is a review on cross-validation procedures for decoding in neuroimaging. It includes a didactic overview of the relevant theoretical considerations. Practical aspects are highlighted with an extensive empirical study of the common decoders in within- and across-subject predictions, on multiple datasets (anatomical and functional MRI, and MEG) and simulations. Theory and experiments outline that the popular "leave-one-out" strategy leads to unstable and biased estimates, and a repeated random splits method should be preferred. Experiments outline the large error bars of cross-validation in neuroimaging settings: typical confidence intervals of 10%. Nested cross-validation can tune decoders' parameters while avoiding circularity bias. However, we find that it can be more favorable to use sane defaults, in particular for non-sparse decoders.
Neuroscience  medical  imaging  crossvalidation  bias  statistics 
july 2018 by csantos
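As a rough illustration of the two splitting strategies contrasted in the abstract above, the sketch below generates train/test index sets for leave-one-out and for repeated random splits. The function names, the 20% test fraction and the 50 repetitions are assumptions chosen for the example, not values taken from the paper.

```python
import random


def leave_one_out(n_samples):
    """Yield (train, test) index lists with exactly one held-out sample per split."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]


def repeated_random_splits(n_samples, n_splits=50, test_fraction=0.2, seed=0):
    """Yield (train, test) index lists from independent random shuffles.

    n_splits and test_fraction are illustrative defaults (assumptions).
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_splits):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        # The first n_test shuffled indices form the test set, the rest the train set.
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


loo_splits = list(leave_one_out(10))
random_splits = list(repeated_random_splits(10, n_splits=5))
```

Each random split holds out a larger test set than leave-one-out does, and the number of repetitions is chosen independently of the sample size.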
Police killings and their spillover effects on the mental health of black Americans: a population-based, quasi-experimental study - The Lancet
Police kill more than 300 black Americans—at least a quarter of them unarmed—each year in the USA. These events might have spillover effects on the mental health of people not directly affected.
mentalHealth  health  healthcare  statistics  racism  inequality  violence 
june 2018 by csantos
Predictive modeling of U.S. health care spending in late life | Science
Most deaths are unpredictable; hence, focusing on end-of-life spending does not necessarily identify “wasteful” spending.
health  healthcare  medical  informatics  Statistics 
june 2018 by csantos
RLN keras tutorial
This is a quick tutorial of the use of the Keras RLN implementation.
First, let's import and create the train and test set. In this tutorial, we're using the Boston housing price regression dataset, with additional noise features.
Statistics  MachineLearning  NeuralNetworks  DeepLearning  keras  regularization 
june 2018 by csantos
[1609.06840] Exact Sampling from Determinantal Point Processes
Determinantal point processes (DPPs) are an important concept in random matrix theory and combinatorics. They have also recently attracted interest in the study of numerical methods for machine learning, as they offer an elegant "missing link" between independent Monte Carlo sampling and deterministic evaluation on regular grids, applicable to a general set of spaces. This is helpful whenever an algorithm explores to reduce uncertainty, such as in active learning, Bayesian optimization, reinforcement learning, and marginalization in graphical models. To draw samples from a DPP in practice, existing literature focuses on approximate schemes of low cost, or comparably inefficient exact algorithms like rejection sampling. We point out that, for many settings of relevance to machine learning, it is also possible to draw exact samples from DPPs on continuous domains. We start from an intuitive example on the real line, which is then generalized to multivariate real vector spaces. We also compare to previously studied approximations, showing that exact sampling, despite higher cost, can be preferable where precision is needed.
sampling  Statistics  MachineLearning  Probability 
april 2018 by csantos
[1707.04345] Gaussian Graphical Models: An Algebraic and Geometric Perspective
Gaussian graphical models are used throughout the natural sciences, social sciences, and economics to model the statistical relationships between variables of interest in the form of a graph. We here provide a pedagogic introduction to Gaussian graphical models and review recent results on maximum likelihood estimation for such models. Throughout, we highlight the rich algebraic and geometric properties of Gaussian graphical models and explain how these properties relate to convex optimization and ultimately result in insights on the existence of the maximum likelihood estimator (MLE) and algorithms for computing the MLE.
via:arthegall  GraphicalModels  statistics  AlgebraicGeometry 
april 2018 by csantos
Deep Learning, Structure and Innate Priors | Abigail See
Video. Yann LeCun and Christopher Manning discuss the role of priors/structure in machine learning.
yannlecun  ChrisManning  watchlist  papers  prior  statistics  NeuralNetworks  MachineLearning 
february 2018 by csantos
[1711.11561] Measuring the tendency of CNNs to Learn Surface Statistical Regularities
Our main finding is that CNNs exhibit a tendency to latch onto the Fourier image statistics of the training dataset, sometimes exhibiting up to a 28% generalization gap across the various test sets. Moreover, we observe that significantly increasing the depth of a network has a very marginal impact on closing the aforementioned generalization gap. Thus we provide quantitative evidence supporting the hypothesis that deep CNNs tend to learn surface statistical regularities in the dataset rather than higher-level abstract concepts.
machinelearning  deeplearning  deep-learning  machine-learning  by:YoshuaBengio  NeuralNetworks  generalization  statistics 
january 2018 by csantos
[1706.09141] Causal Structure Learning
Graphical models can represent a multivariate distribution in a convenient and accessible form as a graph. Causal models can be viewed as a special class of graphical models that not only represent the distribution of the observed system but also the distributions under external interventions. They hence enable predictions under hypothetical interventions, which is important for decision making. The challenging task of learning causal models from data always relies on some underlying assumptions. We discuss several recently proposed structure learning algorithms and their assumptions, and compare their empirical performance under various scenarios.
papers  surveys  graphicalmodels  Causality  Statistics  MachineLearning  via:arsyed 
november 2017 by csantos
Statistics IB
Spiegelhalter's course at Cambridge.
Statistics  by:DavidSpiegelhalter 
september 2017 by csantos
What physics can tell us about inference? [video]
There is a deep analogy between statistical inference and statistical physics; I will give a friendly introduction to both of these fields. I will then discuss phase transitions in two problems of interest to a broad range of data sciences: community detection in social and biological networks, and clustering of sparse high-dimensional data. In both cases, if our data becomes too sparse or too noisy, it suddenly becomes impossible to find the underlying pattern, or even tell if there is one. Physics helps us both locate these phase transitions and design optimal algorithms that succeed all the way up to this point. Along the way, I will visit ideas from computational complexity, random graphs, random matrices, and spin glass theory.
Statistics  physics  MachineLearning  watchlist  by:ChristopherMoore 
december 2016 by csantos
The Great Minds Journal Club discusses Westfall & Yarkoni (2016) – [citation needed]
“The basic problem the authors highlight is pretty simple,” said Samantha. “It’s easy to illustrate with an example. Say you want to know if eating more bacon is associated with a higher incidence of colorectal cancer–like that paper that came out a while ago suggested. In theory, you could just ask people how often they eat bacon and how often they get cancer, and then correlate the two. But suppose you find a positive correlation–what can you conclude?”
epidemiology  measurement  dialogue  statistics  causality 
june 2016 by csantos
libFM
Factorization machines (FM) are a generic approach that can mimic most factorization models through feature engineering. This way, factorization machines combine the generality of feature engineering with the superiority of factorization models in estimating interactions between categorical variables of large domain. libFM is a software implementation of factorization machines that features stochastic gradient descent (SGD) and alternating least squares (ALS) optimization as well as Bayesian inference using Markov Chain Monte Carlo (MCMC).

Source code
factorization  MachineLearning  statistics  interactions 
june 2016 by csantos
Collaborative Statistics
Collaborative Statistics was written by Barbara Illowsky and Susan Dean, faculty members at De Anza College in Cupertino, California. The textbook was developed over several years and has been used in regular and honors-level classroom settings and in distance learning classes. This textbook is intended for introductory statistics courses being taken by students at two– and four–year colleges who are majoring in fields other than math or engineering. Intermediate algebra is the only prerequisite. The book focuses on applications of statistical knowledge rather than the theory behind it.
Statistics  book  to:ipe 
june 2014 by csantos

