Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints | BMC Medical Research Methodology | Full Text

"… We compared three relatively modern modelling techniques: support vector machines (SVM), neural nets (NN), and random forests (RF) and two classical techniques: logistic regression (LR) and classification and regression trees (CART). We created three large artificial databases with 20 fold, 10 fold and 6 fold replication of subjects, where we generated dichotomous outcomes according to different underlying models. We applied each modelling technique to increasingly larger development parts (100 repetitions). The area under the ROC-curve (AUC) indicated the performance of each model in the development part and in an independent validation part. Data hungriness was defined by plateauing of AUC and small optimism (difference between the mean apparent AUC and the mean validated AUC <0.01).


We found that a stable AUC was reached by LR at approximately 20 to 50 events per variable, followed by CART, SVM, NN and RF models. Optimism decreased with increasing sample sizes and the same ranking of techniques. The RF, SVM and NN models showed instability and a high optimism even with >200 events per variable."
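
The optimism criterion in the abstract can be sketched in a few lines. This is only an illustration under my own assumptions, not the paper's code: the function names (`auc`, `optimism`, `small_optimism`) and the toy scores are made up, and only the 0.01 threshold comes from the abstract. It covers the "small optimism" half of the definition, not the plateauing of AUC.

```python
from statistics import mean


def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formula."""
    pairs = sorted(zip(scores, labels))
    ranks = [0.0] * len(pairs)
    i = 0
    while i < len(pairs):
        # Tied scores share the average of the ranks they span.
        j = i
        while j + 1 < len(pairs) and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = len(pairs) - n_pos
    rank_sum = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def optimism(apparent_aucs, validated_aucs):
    """Mean apparent AUC minus mean validated AUC, over repetitions."""
    return mean(apparent_aucs) - mean(validated_aucs)


def small_optimism(apparent_aucs, validated_aucs, threshold=0.01):
    """True when optimism falls below the threshold used in the abstract."""
    return optimism(apparent_aucs, validated_aucs) < threshold


# Toy, made-up predictions for a single repetition (labels: 1 = event).
dev_auc = auc([0.2, 0.3, 0.6, 0.9], [0, 0, 1, 1])    # development part
val_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])   # validation part

print(dev_auc, val_auc, optimism([dev_auc], [val_auc]))
print(small_optimism([dev_auc], [val_auc]))
```

In the study, the two lists would hold one AUC per repetition (100 of them) at a given development sample size.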

@via http://www.fharrell.com/post/stat-ml/
stats:models  stats:machine-learning 
8 hours ago
RmS < Main < Vanderbilt Biostatistics Wiki
> Regression Modeling Strategies
> With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis

Frank Harrell.
r  stats:regression 
8 hours ago
Navigating Statistical Modeling and Machine Learning | Statistical Thinking
"1. Do you want to isolate the effect of special variables or have an interpretable model? If yes, turn left toward SM [stat. modeling]; if no, keep driving …
2. Is your sample size less than huge? If yes, park in the space designated “SM”; if no, …
3. Is your signal:noise low? If yes, take the ramp toward “SM”; if no, …
4. Is there interest in estimating the uncertainty in forecasts? If yes, merge into SM lane; if no, …
5. Is non-additivity/complexity expected to be strong? If yes, gun the pedal toward ML; if no, … you can continue the journey with SM."

@seealso http://www.fharrell.com/post/stat-ml/
stats:models  stats:machine-learning 
12 hours ago
EUROPP – The death of ‘business as usual’ in the EU
"Ironically, it is precisely these member states whose publics are more Eurosceptic that are also more successful in weakening, or watering down, these CSRs during the amendment phase in the Council. And it is also these member states which are least likely to actually act on the Commission’s recommendations. While this raises questions about how the Commission responds to such activities in the Council and vice versa, it also sheds light on just how deeply the EU has been penetrated by the forces of politicisation: even in relatively technocratic procedures like the European Semester, politicisation affects the behaviour of individual actors and the interaction between them."
eu:politics  eu:economy 
17 hours ago
How to Survive a Laptop Fire
"Everything on my local hard-disk is disposable." -- I get the idea, but one consequence is that everything you own is "double-owned" -- some corporation also owns your stuff online. If that corporation goes dark, you lose your stuff; if that corporation backdoors your stuff, you share it forcibly.

Is that problematic? Yes, in both scenarios, because it damages your sense of ownership and your sense of privacy, two things that we actually need to function properly. Without ownership, we become addicted to the networks that replace possession -- and that's definitely not as desirable as a sense of private property, which is, to some extent, a component of privacy. (Not everything about private property is right, but some of it is.)

The loss of privacy is largely a side-effect of the point above, and the combination of both is terrible -- the mental equation is just toxic, unlike sharing some things voluntarily (see creation of public goods like F/LOSS, commons etc.).
mac:backup  web:social-networks 
21 hours ago
Facebook management moves around but people don’t leave - Recode
"It’s sometimes hard to grasp Facebook’s massive scale. The site has more than 2.2 billion users worldwide. In fact, there are nearly as many people who actively use Facebook every month as there are followers of Christianity. Facebook’s 2017 revenue — around $40 billion — was more than the GDP of about 100 different countries. It’s not an exaggeration to think that second- and third-tier Facebook executives have a chance to impact more lives than most of the world’s elected politicians."
Does "status threat" explain the 2016 presidential vote? - Statistical Modeling, Causal Inference, and Social Science
Elementary points on how to read regression coefficients.

Also, this:

"The z-score, or p-value, or statistical significance of a coefficient is a noisy random variable. It’s just wrong to equate statistical significance with reality. It’s wrong in that the posited story can be wrong and you can still get statistical significance (through a combination of model misspecification and noise), and in that the posited story can be right and you can still not get statistical significance (again, model misspecification and noise). I know this is how lots of people do social science, but it’s not how statistics works."
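
A quick way to see the "noisy random variable" point is to hold the true effect fixed and watch the z-score move across replications. The sketch below is my own illustration, not anything from the post: the effect size (0.3), sample size (50), number of replications and the function name `simulate_z_scores` are all arbitrary assumptions.

```python
import math
import random
import statistics


def simulate_z_scores(true_effect=0.3, n=50, reps=2000, seed=1):
    """z-score of a sample mean, recomputed on `reps` fresh samples."""
    rng = random.Random(seed)
    z_scores = []
    for _ in range(reps):
        # Same data-generating process every time: the story never changes.
        sample = [rng.gauss(true_effect, 1.0) for _ in range(n)]
        std_error = statistics.stdev(sample) / math.sqrt(n)
        z_scores.append(statistics.mean(sample) / std_error)
    return z_scores


z = simulate_z_scores()
share_significant = sum(abs(v) > 1.96 for v in z) / len(z)
spread = statistics.stdev(z)

print(f"share of replications with |z| > 1.96: {share_significant:.2f}")
print(f"standard deviation of the z-scores:    {spread:.2f}")
```

The effect is real in every replication, yet whether a given run clears the 1.96 bar depends on the draw.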
polisci:elections  usa:politics  stats:regression 
FreshRSS, a free, self-hostable aggregator
Recommends PHP7, though PHP5 should work.
Banlieues : deux députés relancent le débat sur les statistiques ethniques
"Tasked with examining the State's action in carrying out its sovereign functions in the '9-3', the authors confide that they were confronted, 'from the very first hearings', with 'a lack of knowledge of the population', starting with the actual number of inhabitants of this département in the Paris region."
fr:ethnic-politics  soc:quantification  fr:immigration 
azat-co/fullstack-javascript: Source code for the Fullstack JavaScript book
"Full Stack JavaScript: Learn Backbone.js, Node.js and MongoDB, 2nd and 3rd Editions"
2 days ago
[1805.06005] Reconstructing mesoscale network structures
"When facing complex mesoscale network structures, it is generally believed that (null) models encoding the modular organization of nodes must be employed. The present paper focuses on two block structures that characterize the mesoscale organization of many real-world networks, i.e. the bow-tie and the core-periphery ones. Our analysis shows that constraining the network degree sequence is often enough to reproduce such structures, as confirmed by model selection criteria as AIC or BIC. As a byproduct, our paper enriches the toolbox for the analysis of bipartite networks - still far from being complete. The aforementioned structures, in fact, partition the networks into asymmetric blocks characterized by binary, directed connections, thus calling for the extension of a recently-proposed method to randomize undirected, bipartite networks to the directed case."
networks:bipartite  networks:null-models 
4 days ago
Yiqing Xu | Software - interflex: Producing Flexible Marginal Effect Estimates with Multiplicative Interaction Models
"This package performs diagnostics to assess assumptions of multiplicative interaction models (namely, linear interaction effect and common support) and impliments [sic] flexible estimation strategies that allow for nonlinear interaction effects and safeguard against excessive extrapolation."

@reference http://yiqingxu.org/papers/english/2018_HMX_interaction/main.pdf
r  stats:stata  stats:regression  academia:publishing 
4 days ago
Economic development and democracy: An electoral connection - KNUTSEN - European Journal of Political Research - Wiley Online Library
"… only election‐centred indicators are robustly associated with economic development. […] Further analysis shows that development affects electoral democracy by reducing electoral fraud, election violence and vote buying."
polisci:democracy  polisci:elections 
6 days ago
Why combatants fight: the Irish Republican Army and the Bosnian Serb Army compared | SpringerLink
"The study demonstrates that nationalism played a relatively marginal role in combatants’ motivation to fight. Instead our research indicates that individualist motivations, small group solidarity, and local networks dominate."
violence:war  polisci:ideology 
6 days ago
Attack When the World Is Not Watching? US News and the Israeli-Palestinian Conflict | Journal of Political Economy: Ahead of Print
"We find that Israeli attacks are more likely to occur when US news on the following day is dominated by important predictable events. Strategic timing applies to attacks that bear risk of civilian casualties and are not too costly to postpone. Content analysis suggests that Israel’s strategy aims at minimizing next-day coverage, which is especially charged with negative emotional content. Palestinian attacks do not appear to be timed to US news."
world:israel-palestine  world:israel 
6 days ago
Reproducibility of research: Issues and proposed remedies | PNAS
"The Colloquium was organized by David Allison, Richard Shiffrin, Victoria Stodden, and Stephen Fienberg [†]."
academia:research  via:cshalizi 
8 days ago
The Matthew effect in science funding | PNAS
"… Our results show that winners just above the funding threshold accumulate more than twice as much funding during the subsequent eight years as nonwinners with near-identical review scores that fall just below the threshold. This effect is partly caused by nonwinners ceasing to compete for other funding opportunities, revealing a “participation” mechanism driving the Matthew effect." -- I have seen that mechanism at work in France: young researchers spend huge amounts of energy and time on a first project, get rejected, become depressed, are treated as black sheep by the top brass, and never submit again.
academia:funding  via:cshalizi 
8 days ago
Mass Mobilization in Autocracies Database
"The Mass Mobilization in Autocracies Database (MMAD) contains sub-national data on mass mobilization events in autocracies worldwide."
data:social-science  polisci 
8 days ago
Comment les « leaks » ont changé la façon de travailler de la justice
"Bercy, which triggered a total of 411 tax audits in this case, generally prefers to handle tax fraud cases through reassessments with financial penalties attached, without referring them to the judicial authorities. That choice is theirs to make, and it is nicknamed the 'verrou de Bercy' (the 'Bercy lock'). But it could not come into play in this instance, since the information was already in the public domain."
fr:economy  taxation  corruption 
9 days ago
Nous avons rencontré Josep Borrell – Le grand continent
"Social democracy has lost touch with the working classes, which used to guarantee that it accounted for at least 40% of the electorate. Previously, social democrats stood at around 40% in every election in Europe. Now they get no more than 20% at most, and much less in France and the Netherlands. Sir Julian Pristley, who was Secretary of the European Parliament, wrote an article entitled 'Getting Used to 25 percent' in 2016. It was a warning for social democrats."

-- With much more stuff (Catalonia, Eastern European countries, …).
eu:politics  polisci:elections  polisci:ideology  world:spain 
11 days ago
omercadopopular/cgoes: Research by Carlos Góes
"This repository aggregates research by Carlos Góes, Economic Advisor at the Office of the President of Brazil."
11 days ago
pirg/AIISL: An Interactive Introduction to Supervised Learning
Includes a good joke on "Decision Trees", and a spoiler: "we've been doing machine learning for quite some time".

Also, about neural networks: Convolutional NN for spatial correlations, Recurrent NN for time dependence.
python  stats:machine-learning  stats:random-forests  stats:neural-networks 
11 days ago
