[1806.09783] Gradient Acceleration in Activation Functions
Dropout has been one of the standard approaches to training deep neural networks, and it is known to regularize large models to avoid overfitting. The effect of dropout has been explained by avoiding co-adaptation. In this paper, however, we propose a new explanation of why dropout works and propose a new technique to design better activation functions. First, we show that dropout is an optimization technique that pushes the input towards the saturation area of a nonlinear activation function by accelerating the flow of gradient information, even in the saturation area, during backpropagation. Based on this explanation, we propose a new technique for activation functions, gradient acceleration in activation function (GAAF), that accelerates gradients to flow even in the saturation area. The input to the activation function can then climb onto the saturation area, which makes the network more robust because the model converges on a flat region. Experimental results support our explanation of dropout and confirm that the proposed GAAF technique improves performance with the expected properties.
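The abstract gives no formula, so the following is only a rough sketch of the general idea, not the paper's definition. It first shows that the sigmoid gradient almost vanishes in the saturation area, then adds a hypothetical small-amplitude sawtooth term (the names `sawtooth`, `accelerated_sigmoid`, and the parameter `k` are assumptions made for illustration) whose value stays within 0.5/k while its slope is 1 almost everywhere, so the output barely changes but gradient still flows.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid; close to zero when |x| is large (saturation).
    s = sigmoid(x)
    return s * (1.0 - s)


def sawtooth(x, k):
    # Hypothetical bounded term (an assumption, not taken from the abstract):
    # |value| <= 0.5 / k, slope 1 everywhere except at the jump points.
    z = x * k
    return (z - math.floor(z) - 0.5) / k


def accelerated_sigmoid(x, k=1000.0):
    # Forward value differs from sigmoid(x) by at most 0.5 / k.
    return sigmoid(x) + sawtooth(x, k)


def accelerated_grad(x):
    # Gradient away from the sawtooth's jump points: the sawtooth adds slope 1.
    return sigmoid_grad(x) + 1.0


if __name__ == "__main__":
    x = 10.0  # deep in the saturation area of the sigmoid
    print("plain gradient:      ", sigmoid_grad(x))
    print("accelerated gradient:", accelerated_grad(x))
    print("output change:       ", abs(accelerated_sigmoid(x) - sigmoid(x)))
```

This sketch applies the extra term everywhere; restricting it to the saturation area would need an additional weighting function, which is left out here.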
[1806.01337] Backdrop: Stochastic Backpropagation
We introduce backdrop, a flexible and simple-to-implement method, intuitively described as dropout acting only along the backpropagation pipeline. Backdrop is implemented via one or more masking layers which are inserted at specific points along the network. Each backdrop masking layer acts as the identity in the forward pass, but randomly masks parts of the backward gradient propagation. Intuitively, inserting a backdrop layer after any convolutional layer leads to stochastic gradients corresponding to features of that scale. Therefore, backdrop is well suited for problems in which the data have a multi-scale, hierarchical structure. Backdrop can also be applied to problems with non-decomposable loss functions where standard SGD methods are not well suited. We perform a number of experiments and demonstrate that backdrop leads to significant improvements in generalization.
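The masking layer described above (identity in the forward pass, random masking in the backward pass) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `BackdropMask`, the per-element masking, and the rescaling of surviving gradients by 1/(1 - drop_prob) are all choices made here for the example.

```python
import random


class BackdropMask:
    """Identity on the forward pass; randomly zeroes gradients on the backward pass."""

    def __init__(self, drop_prob, seed=None):
        if not 0.0 <= drop_prob < 1.0:
            raise ValueError("drop_prob must be in [0, 1)")
        self.drop_prob = drop_prob
        self.rng = random.Random(seed)

    def forward(self, x):
        # Activations pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # Each gradient entry is kept with probability (1 - drop_prob).
        # Rescaling the survivors is an assumption borrowed from the usual
        # "inverted dropout" convention; it keeps the gradient unbiased.
        keep = 1.0 - self.drop_prob
        return [g / keep if self.rng.random() < keep else 0.0 for g in grad_output]


if __name__ == "__main__":
    layer = BackdropMask(drop_prob=0.5, seed=0)
    print(layer.forward([1.0, 2.0, 3.0]))   # unchanged activations
    print(layer.backward([1.0, 1.0, 1.0]))  # some entries zeroed, others scaled
```

In an autograd framework the same behaviour would be expressed as a custom operation with an identity forward and a masked backward.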
[1307.1493] Dropout Training as Adaptive Regularization
Dropout and other feature noising schemes control overfitting by artificially corrupting the training data. For generalized linear models, dropout performs a form of adaptive regularization. Using this viewpoint, we show that the dropout regularizer is first-order equivalent to an L2 regularizer applied after scaling the features by an estimate of the inverse diagonal Fisher information matrix. We also establish a connection to AdaGrad, an online learning algorithm, and find that a close relative of AdaGrad operates by repeatedly solving linear dropout-regularized problems. By casting dropout as regularization, we develop a natural semi-supervised algorithm that uses unlabeled data to create a better adaptive regularizer. We apply this idea to document classification tasks, and show that it consistently boosts the performance of dropout training, improving on state-of-the-art results on the IMDB reviews dataset.
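For intuition, the regularization view can be checked exactly in the simplest GLM, linear regression with squared loss, which is a special case chosen here for illustration rather than the general setting of the paper. Assuming each feature is dropped independently with probability `delta` and the survivors are rescaled by 1/(1 - delta), the expected dropout loss equals the plain loss plus an L2 penalty in which each coefficient is weighted by its squared feature value. The function names below are hypothetical.

```python
from itertools import product


def plain_loss(x, y, beta):
    pred = sum(xj * bj for xj, bj in zip(x, beta))
    return 0.5 * (y - pred) ** 2


def expected_dropout_loss(x, y, beta, delta):
    # Exact expectation of the squared loss, enumerating all 2^d dropout masks.
    keep = 1.0 - delta
    total = 0.0
    for mask in product((0, 1), repeat=len(x)):
        prob = 1.0
        pred = 0.0
        for m, xj, bj in zip(mask, x, beta):
            prob *= keep if m else delta
            pred += m * xj * bj / keep  # kept features are rescaled
        total += prob * 0.5 * (y - pred) ** 2
    return total


def dropout_penalty(x, beta, delta):
    # Feature-scaled L2 penalty: 0.5 * delta / (1 - delta) * sum_j x_j^2 beta_j^2.
    return 0.5 * delta / (1.0 - delta) * sum((xj * bj) ** 2 for xj, bj in zip(x, beta))


if __name__ == "__main__":
    x, y, beta, delta = [1.0, -2.0, 0.5], 0.7, [0.3, 0.1, -1.2], 0.25
    lhs = expected_dropout_loss(x, y, beta, delta)
    rhs = plain_loss(x, y, beta) + dropout_penalty(x, beta, delta)
    print(lhs, rhs)  # the two values agree up to floating-point error
```

Summed over training examples, the per-feature weights become the diagonal of X^T X, which is how the feature scaling in the abstract shows up in this special case. For other GLMs, such as logistic regression, the abstract states the equivalence only to first order.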
University of Arizona tracks student ID cards to detect who might drop out • The Verge
Shannon Liao:
<p>The University of Arizona is tracking freshman students’ ID card swipes to anticipate which students are more likely to drop out. University researchers hope to use the data to lower dropout rates. (Dropping out refers to those who have left higher education entirely and those who transfer to other colleges.)

The card data tells researchers how frequently a student has entered a residence hall, library, and the student recreation center, which includes a salon, convenience store, mail room, and movie theater. The cards are also used for buying vending machine snacks and more, putting the total number of locations near 700. There’s a sensor embedded in the CatCard student IDs, which are given to every student attending the university.

“By getting their digital traces, you can explore their patterns of movement, behavior and interactions, and that tells you a great deal about them,” Sudha Ram, a professor of management information systems who directs the initiative, <a href="https://uanews.arizona.edu/story/ua-looks-digital-traces-help-students">said in a press release</a>.

Researchers have gathered freshman data over a three-year time frame so far, and they found that their predictions for who is more likely to drop out are 73% accurate.</p>

Big data brother is everywhere.
