overfitting   118


Ranlot/single-parameter-fit: Real numbers, data science and chaos: How to fit any dataset with a single parameter
overfitting  machine-learning 
may 2019 by pmigdal
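The repo's punchline is that a single real number has unbounded information capacity: pack every sample into the binary expansion of one parameter, and "fitting" becomes pure memorization. A minimal sketch of that idea (not the repo's actual scheme, which uses a logistic/dyadic-map construction; the block size and function names here are illustrative):

```python
# Sketch: "fit" any dataset with one parameter by storing the quantized
# samples as consecutive 8-bit blocks in the binary expansion of a single
# rational number. This is memorization, not learning -- the point of the repo.
from fractions import Fraction

BITS = 8  # bits of precision per sample (assumption for this sketch)

def encode(samples):
    """Pack samples in [0, 1) into one rational 'parameter' alpha."""
    alpha = Fraction(0)
    for i, s in enumerate(samples):
        q = min(int(s * 2**BITS), 2**BITS - 1)  # quantize to BITS bits
        alpha += Fraction(q, 2**(BITS * (i + 1)))
    return alpha

def decode(alpha, i):
    """Recover the i-th sample by shifting out its 8-bit block."""
    shifted = alpha * 2**(BITS * (i + 1))
    q = int(shifted) % 2**BITS
    return q / 2**BITS

data = [0.25, 0.5, 0.875]
alpha = encode(data)
recovered = [decode(alpha, i) for i in range(len(data))]
```

Since the model's "capacity" grows with the precision of the parameter, this is a tidy counterexample to judging model complexity by parameter count alone.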
Do ImageNet Classifiers Generalize to ImageNet?
We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% – 15% on CIFAR-10 and 11% – 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models’ inability to generalize to slightly “harder” images than those found in the original test sets.
overfitting  imagenet  AI  mechanical-turk  cifar-10 
february 2019 by tarakc02
Machine Learning: Ridge Regression in Detail – Towards Data Science
Ridge regression mitigates overfitting by penalizing the magnitude of the model's coefficients.
DataScience  Overfitting 
december 2018 by neuralmarket
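The shrinkage the article describes is easy to see in the one-feature closed form, where the ridge solution is w = Σxy / (Σx² + λ): a larger penalty λ inflates the denominator and pulls the coefficient toward zero. A minimal sketch (no intercept, single feature; data values are made up):

```python
# Sketch of ridge shrinkage: the L2 penalty lambda shrinks the fitted
# coefficient toward zero, trading a little bias for lower variance.
# One-feature, no-intercept closed form: w = sum(x*y) / (sum(x^2) + lam)

def ridge_coef(xs, ys, lam):
    """Closed-form ridge solution for a single feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 3.8]

w_ols = ridge_coef(xs, ys, 0.0)      # ordinary least squares (lam = 0)
w_ridge = ridge_coef(xs, ys, 10.0)   # penalized: strictly smaller |w|
```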
[D] Have we overfit to ImageNet? : MachineLearning
Just came across yet another architecture search paper: https://arxiv.org/abs/1712.00559 Everyone is treating .1% improvements as...
overfitting  imagenet  deep-learning 
august 2018 by pmigdal
[1806.00451] Do CIFAR-10 Classifiers Generalize to CIFAR-10?
Machine learning is currently dominated by largely experimental work focused on improvements in a few key tasks. However, the impressive accuracy numbers of the best performing models are questionable because the same test sets have been used to select these models for multiple years now. To understand the danger of overfitting, we measure the accuracy of CIFAR-10 classifiers by creating a new test set of truly unseen images. Although we ensure that the new test set is as close to the original data distribution as possible, we find a large drop in accuracy (4% to 10%) for a broad range of deep learning models. Yet more recent models with higher original accuracy show a smaller drop and better overall performance, indicating that this drop is likely not due to overfitting based on adaptivity. Instead, we view our results as evidence that current accuracy numbers are brittle and susceptible to even minute natural variations in the data distribution.
neural-net  deep-learning  computer-vision  generalization  cifar  overfitting 
june 2018 by arsyed
The Sharpe Ratio Efficient Frontier
We evaluate the probability that an estimated Sharpe ratio exceeds a given threshold in presence of non-Normal returns. We show that this new uncertainty-adjusted investment skill metric (called Probabilistic Sharpe ratio, or PSR) has a number of important applications: First, it allows us to establish the track record length needed for rejecting the hypothesis that a measured Sharpe ratio is below a certain threshold with a given confidence level. Second, it models the trade-off between track record length and undesirable statistical features (e.g., negative skewness with positive excess kurtosis). Third, it explains why track records with those undesirable traits would benefit from reporting performance with the highest sampling frequency such that the IID assumption is not violated. Fourth, it permits the computation of what we call the Sharpe ratio Efficient Frontier (SEF), which lets us optimize a portfolio under non-Normal, leveraged returns while incorporating the uncertainty derived from track record length. Results can be validated using the Python code in the Appendix.
edu  bailey  backtest  overfitting 
march 2018 by ludaavics
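As a reading aid, the PSR in the abstract can be sketched directly: it is the Normal-CDF of the estimated Sharpe ratio, centered on the benchmark SR* and rescaled by a standard error that accounts for track-record length, skewness, and kurtosis. This follows the formula in Bailey and López de Prado's paper as I recall it; treat the exact denominator as an assumption and check it against the paper's Appendix:

```python
# Sketch of the Probabilistic Sharpe Ratio (PSR):
#   PSR(SR*) = Phi( (SR_hat - SR*) * sqrt(n - 1)
#                   / sqrt(1 - skew*SR_hat + (kurt - 1)/4 * SR_hat^2) )
# i.e. the probability that the true Sharpe ratio exceeds the benchmark SR*.
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def psr(sr_hat, sr_star, n, skew, kurt):
    """P(true SR > sr_star) given an estimated Sharpe sr_hat over n returns."""
    denom = math.sqrt(1.0 - skew * sr_hat + (kurt - 1.0) / 4.0 * sr_hat**2)
    z = (sr_hat - sr_star) * math.sqrt(n - 1) / denom
    return normal_cdf(z)

# Same estimated Sharpe, but negative skew and fat tails lower confidence:
p_normal = psr(0.1, 0.0, 252, skew=0.0, kurt=3.0)
p_skewed = psr(0.1, 0.0, 252, skew=-1.0, kurt=6.0)
```

This makes the paper's second point concrete: the undesirable traits (negative skewness, excess kurtosis) widen the standard error, so the same measured Sharpe ratio buys less statistical confidence.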
Pseudo-Mathematics and Financial Charlatanism: The Effects of Backtest Overfitting on Out-of-Sample Performance by David H. Bailey, Jonathan Borwein, Marcos Lopez de Prado, Qiji Jim Zhu :: SSRN
We prove that high simulated performance is easily achievable after backtesting a relatively small number of alternative strategy configurations, a practice we denote “backtest overfitting”. The higher the number of configurations tried, the greater is the probability that the backtest is overfit. Because most financial analysts and academics rarely report the number of configurations tried for a given backtest, investors cannot evaluate the degree of overfitting in most investment proposals.

The implication is that investors can be easily misled into allocating capital to strategies that appear to be mathematically sound and empirically supported by an outstanding backtest. Under memory effects, backtest overfitting leads to negative expected returns out-of-sample, rather than zero performance. This may be one of several reasons why so many quantitative funds appear to fail.
edu  backtest  overfitting  bailey 
march 2018 by ludaavics
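The paper's core claim is easy to reproduce in simulation: give every candidate strategy zero true edge, and the best in-sample Sharpe ratio still climbs as more configurations are tried, purely by selection. A hedged illustration (all numbers made up; not the authors' code):

```python
# Sketch of backtest overfitting: 100 random strategy "configurations",
# each a year of pure-noise daily returns. None has any real edge, yet
# the best backtested Sharpe improves with the number of configurations
# tried -- by construction, max over a prefix is non-decreasing.
import random
import statistics

random.seed(0)

def sharpe(returns):
    """In-sample (non-annualized) Sharpe ratio of a return series."""
    return statistics.mean(returns) / statistics.pstdev(returns)

trials = [sharpe([random.gauss(0.0, 0.01) for _ in range(252)])
          for _ in range(100)]

# Best backtested Sharpe after trying 1, 10, and 100 configurations.
best_after = {n: max(trials[:n]) for n in (1, 10, 100)}
```

Since analysts rarely report how many configurations were tried, the out-of-sample investor only ever sees `best_after[100]` — which is exactly the selection bias the paper quantifies.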


