dropout   294


https://twitter.com/i/web/status/1072176624693596160
RT: "COMING SOON: Our UPDATED report summarizes long-term trends in & completion rates by race/ethn…"
highschool  dropout  from twitter
10 weeks ago by LibrariesVal
[1806.09783] Gradient Acceleration in Activation Functions
Dropout has been one of the standard approaches to training deep neural networks, and it is known to regularize large models and avoid overfitting. The effect of dropout has been explained as avoiding co-adaptation. In this paper, however, we propose a new explanation of why dropout works and use it to derive a technique for designing better activation functions. First, we show that dropout is an optimization technique that pushes the input toward the saturation area of a nonlinear activation function by accelerating the flow of gradient information through the saturation area during backpropagation. Based on this explanation, we propose a new technique for activation functions, gradient acceleration in activation functions (GAAF), that accelerates gradients so that they flow even in the saturation area. The input to the activation function can then climb into the saturation area, which makes the network more robust because the model converges on a flat region. Experimental results support our explanation of dropout and confirm that the proposed GAAF technique improves performance with the expected properties.
neural-net  analysis  dropout  activation 
july 2018 by arsyed
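For reference, the standard "inverted dropout" mechanism the abstract builds on can be sketched in a few lines. This is a generic illustration (the `dropout` function name, `p_drop` parameter, and rescaling convention are mine), not the paper's GAAF code:

```python
import numpy as np

def dropout(x, p_drop=0.5, train=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale survivors by 1/(1 - p_drop), so the expected activation
    matches the mask-free forward pass."""
    if not train or p_drop == 0.0:
        return x
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(x.shape) >= p_drop
    return x * mask / (1.0 - p_drop)

x = np.ones((4, 3))
y_train = dropout(x, p_drop=0.5)   # surviving units are doubled
y_test = dropout(x, train=False)   # identity at test time
```

Because of the 1/(1 - p_drop) rescaling, averaging many masked forward passes recovers the unmasked activations, which is why no correction is needed at test time.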
[1806.01337] Backdrop: Stochastic Backpropagation
We introduce backdrop, a flexible and simple-to-implement method, intuitively described as dropout acting only along the backpropagation pipeline. Backdrop is implemented via one or more masking layers which are inserted at specific points along the network. Each backdrop masking layer acts as the identity in the forward pass, but randomly masks parts of the backward gradient propagation. Intuitively, inserting a backdrop layer after any convolutional layer leads to stochastic gradients corresponding to features of that scale. Therefore, backdrop is well suited for problems in which the data have a multi-scale, hierarchical structure. Backdrop can also be applied to problems with non-decomposable loss functions where standard SGD methods are not well suited. We perform a number of experiments and demonstrate that backdrop leads to significant improvements in generalization.
dropout  backprop  multiscale  hierarchical 
june 2018 by arsyed
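A minimal sketch of the backdrop idea, identity in the forward pass and a random mask on the backward pass, in hand-rolled NumPy. The `BackdropMask` class and its API are illustrative assumptions; the paper inserts such masking layers into a real autodiff pipeline:

```python
import numpy as np

class BackdropMask:
    """Identity in the forward pass; randomly zeroes parts of the
    gradient in the backward pass."""
    def __init__(self, p_drop=0.5, rng=None):
        self.p_drop = p_drop
        self.rng = rng if rng is not None else np.random.default_rng(0)
        self.mask = None

    def forward(self, x):
        self.mask = self.rng.random(x.shape) >= self.p_drop
        return x                        # forward activations untouched

    def backward(self, grad_out):
        return grad_out * self.mask     # gradient flows only where mask=1

layer = BackdropMask(p_drop=0.5)
x = np.ones((2, 4))
out = layer.forward(x)                  # identical to x
g = layer.backward(np.ones_like(x))     # entries zeroed at random
```

Inserting such a layer after a convolutional layer leaves predictions unchanged but makes the gradient stochastic at that feature scale, which is the multi-scale behavior the abstract describes.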
[1307.1493] Dropout Training as Adaptive Regularization
Dropout and other feature-noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
dropout  regularization  linear-model 
april 2018 by arsyed
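The dropout-as-L2 claim can be checked numerically in the simplest case: for linear least squares with inverted dropout on the inputs, marginalizing over the mask yields the plain squared loss plus an L2 penalty scaled per feature by that column's uncentered second moment. A small Monte Carlo check on synthetic data (all names and data here are my own sketch, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))   # synthetic design matrix
y = rng.normal(size=50)        # synthetic targets
w = rng.normal(size=5)         # an arbitrary fixed weight vector
p = 0.8                        # keep probability

# Closed form: squared loss plus an L2 penalty with each w_j^2
# weighted by (1 - p)/p times the column's sum of squares.
resid = y - X @ w
penalty = (1 - p) / p * np.sum((X ** 2).sum(axis=0) * w ** 2)
closed = resid @ resid + penalty

# Monte Carlo: average the squared loss under random inverted-dropout
# masks applied to the inputs.
losses = []
for _ in range(5000):
    m = (rng.random(X.shape) < p) / p   # mask rescaled by 1/p
    r = y - (X * m) @ w
    losses.append(r @ r)
mc = float(np.mean(losses))
# mc and closed agree up to Monte Carlo noise
```

The feature-wise scaling of the penalty is the simplest instance of the paper's point: dropout regularizes each weight in proportion to how much signal its feature carries, rather than uniformly as in plain ridge regression.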
University of Arizona tracks student ID cards to detect who might drop out • The Verge
Shannon Liao:
<p>The University of Arizona is tracking freshman students’ ID card swipes to anticipate which students are more likely to drop out. University researchers hope to use the data to lower dropout rates. (Dropping out refers both to those who leave higher education entirely and to those who transfer to other colleges.)

The card data tells researchers how frequently a student has entered a residence hall, library, and the student recreation center, which includes a salon, convenience store, mail room, and movie theater. The cards are also used for buying vending machine snacks and more, putting the total number of locations near 700. There’s a sensor embedded in the CatCard student IDs, which are given to every student attending the university.

“By getting their digital traces, you can explore their patterns of movement, behavior and interactions, and that tells you a great deal about them,” Sudha Ram, a professor of management information systems who directs the initiative, <a href="https://uanews.arizona.edu/story/ua-looks-digital-traces-help-students">said in a press release</a>.

Researchers have gathered freshman data over a three-year time frame so far, and they found that their predictions for who is more likely to drop out are 73% accurate.</p>

Big data brother is everywhere.
university  dropout  bigdata 
march 2018 by charlesarthur

