Generalized additive models with flexible response functions | SpringerLink
"Common generalized linear models depend on several assumptions: (i) the specified linear predictor, (ii) the chosen response distribution that determines the likelihood and (iii) the response function that maps the linear predictor to the conditional expectation of the response. Generalized additive models (GAM) provide a convenient way to overcome the restriction to purely linear predictors. Therefore, the covariates may be included as flexible nonlinear or spatial functions to avoid potential bias arising from misspecification. Single index models, on the other hand, utilize flexible specifications of the response function and therefore avoid the deteriorating impact of a misspecified response function. However, such single index models are usually restricted to a linear predictor and aim to compensate for potential nonlinear structures only via the estimated response function. We will show that this is insufficient in many cases and present a solution by combining a flexible approach for response function estimation using monotonic P-splines with additive predictors as in GAMs. Our approach is based on maximum likelihood estimation and also allows us to provide confidence intervals of the estimated effects. To compare our approach with existing ones, we conduct extensive simulation studies and apply our approach to two empirical examples, namely the mortality rate in São Paulo due to respiratory diseases based on the Poisson distribution and credit scoring of a German bank with binary responses."
[1608.00696] Can we trust the bootstrap in high-dimension?
"We consider the performance of the bootstrap in high-dimensions for the setting of linear regression, where p<n but p/n is not close to zero. We consider ordinary least-squares as well as robust regression methods and adopt a minimalist performance requirement: can the bootstrap give us good confidence intervals for a single coordinate of β? (where β is the true regression vector).
"We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. Both of the most commonly used methods of bootstrapping for regression -- residual bootstrap and pairs bootstrap -- give very poor inference on β as the ratio p/n grows. We find that the residual bootstrap tends to give anti-conservative estimates (inflated Type I error), while the pairs bootstrap gives very conservative estimates (severe loss of power) as the ratio p/n grows. We also show that the jackknife resampling technique for estimating the variance of β̂ severely overestimates the variance in high dimensions.
"We contribute alternative bootstrap procedures based on our theoretical results that mitigate these problems. However, the corrections depend on assumptions regarding the underlying data-generation model, suggesting that in high-dimensions it may be difficult to have universal, robust bootstrapping techniques."
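For readers unfamiliar with the two resampling schemes the abstract names, here is a minimal sketch of the textbook pairs bootstrap and residual bootstrap for ordinary least squares, with a percentile interval for a single coordinate of β. It is not the paper's code and does not implement the corrected procedures the authors propose. The function names, the simulated data, and the choices n = 200, p = 60 and 200 bootstrap replicates are all assumptions made for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficient vector for y ~ X (no intercept added)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def pairs_bootstrap(X, y, n_boot, rng):
    """Resample (x_i, y_i) rows jointly and refit OLS on each resample."""
    n, p = X.shape
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # row indices drawn with replacement
        draws[b] = ols(X[idx], y[idx])
    return draws


def residual_bootstrap(X, y, n_boot, rng):
    """Keep X fixed, resample centered residuals, and refit OLS."""
    n, p = X.shape
    fitted = X @ ols(X, y)
    resid = y - fitted
    resid = resid - resid.mean()  # center the residuals before resampling
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=n, replace=True)
        draws[b] = ols(X, y_star)
    return draws


def percentile_ci(draws, coord, level=0.95):
    """Percentile bootstrap interval for one coordinate of beta."""
    alpha = 1.0 - level
    lo, hi = np.quantile(draws[:, coord], [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


# Hypothetical simulated data with p/n = 0.3, i.e. not close to zero.
rng = np.random.default_rng(0)
n, p = 200, 60
X = rng.standard_normal((n, p))
beta = np.zeros(p)
y = X @ beta + rng.standard_normal(n)

pairs_draws = pairs_bootstrap(X, y, 200, rng)
resid_draws = residual_bootstrap(X, y, 200, rng)
pairs_ci = percentile_ci(pairs_draws, 0)
resid_ci = percentile_ci(resid_draws, 0)
print("pairs bootstrap CI for beta_0:   ", pairs_ci)
print("residual bootstrap CI for beta_0:", resid_ci)
```

Repeating this over many simulated datasets and recording how often each interval covers the true coordinate is one way to examine the anti-conservative and conservative behaviour the abstract describes.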
Building A Logistic Regression in Python, Step by Step
Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a…
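As a rough illustration of what "predict the probability" means here, the sketch below fits a logistic regression by plain gradient descent on the average negative log-likelihood and returns predicted probabilities. It is not the linked article's code; the function names, the simulated data, the learning rate and the iteration count are assumptions chosen for the example.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps a linear predictor to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def add_intercept(X):
    """Prepend a column of ones so the first weight acts as the intercept."""
    return np.column_stack([np.ones(len(X)), X])


def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Gradient descent on the average negative log-likelihood."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = sigmoid(Xb @ w)
        w -= lr * Xb.T @ (p - y) / len(y)  # gradient of the average loss
    return w


def predict_proba(X, w):
    """Predicted probability that the response equals 1."""
    return sigmoid(add_intercept(X) @ w)


# Hypothetical simulated data: two features and a binary response.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 2))
true_w = np.array([-0.5, 2.0, -1.0])  # intercept followed by two slopes
y = (rng.random(500) < sigmoid(true_w[0] + X @ true_w[1:])).astype(float)

w = fit_logistic(X, y)
probs = predict_proba(X, w)
accuracy = float(np.mean((probs > 0.5) == (y == 1)))
print("estimated weights:", w)
print("training accuracy:", accuracy)
```

Thresholding the probabilities at 0.5 turns them into class labels; a different threshold may suit problems where the two kinds of error have unequal cost.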
