feature-selection 14
[1112.6045] Comparing intermittency and network measurements of words and their dependency on authorship
january 2012 by Vaguery
Many features from texts and languages can now be inferred from statistical analyses using concepts from complex networks and dynamical systems. In this paper we quantify how topological properties of word co-occurrence networks and intermittency (or burstiness) in word distribution depend on the style of authors. Our database contains 40 books from 8 authors who lived in the 19th and 20th centuries, for which the following network measurements were obtained: clustering coefficient, average shortest path lengths, and betweenness. We found that the two factors with stronger dependency on the authors were the skewness in the distribution of word intermittency and the average shortest paths. Other factors such as the betweenness and the Zipf's law exponent show only weak dependency on authorship. Also assessed was the contribution from each measurement to authorship recognition using three machine learning methods. The best performance was a ca. 65 % accuracy upon combining complex network and intermittency features with the nearest neighbor algorithm. From a detailed analysis of the interdependence of the various metrics it is concluded that the methods used here are complementary for providing short- and long-scale perspectives of texts, which are useful for applications such as identification of topical words and information retrieval.
natural-language-processing
document-clustering
clustering
feature-selection
algorithms
nudge-targets
january 2012 by Vaguery
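A minimal sketch of two of the measurements named in the abstract above, for readers who want something concrete to poke at. It assumes intermittency is taken as the coefficient of variation of the gaps between successive occurrences of a word, and that the co-occurrence network is undirected and links adjacent tokens; the paper's exact definitions and preprocessing may differ. The function names are illustrative, not from the paper.

```python
from collections import defaultdict
from statistics import mean, pstdev


def intermittency(tokens, word):
    """Coefficient of variation of the gaps between occurrences of `word`.

    Assumed definition: std(gaps) / mean(gaps). Values near 0 mean the word
    recurs regularly; larger values mean it arrives in bursts.
    Returns None when there are fewer than two gaps to measure.
    """
    positions = [i for i, t in enumerate(tokens) if t == word]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if len(gaps) < 2:
        return None
    return pstdev(gaps) / mean(gaps)


def cooccurrence_network(tokens):
    """Undirected graph linking each pair of adjacent, distinct tokens."""
    graph = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = list(graph.get(node, ()))
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph.get(neighbours[i], ())
    )
    return 2.0 * links / (k * (k - 1))


regular = "a b a b a b".split()
bursty = ["x", "x", "x", "y", "y", "y", "y", "y", "y", "x"]
triangle = cooccurrence_network("a b c a".split())

print(intermittency(regular, "a"))
print(intermittency(bursty, "x"))
print(clustering_coefficient(triangle, "a"))
```

Per-word values like these would then be summarised per book (for example by their skewness) before being fed to a classifier.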
MLboost: Machine Learning boost library in Python
november 2010 by arsyed
"MLboost main goal is to speedup any Machine Learning projects by simplifying data preprocessing, features selection and data visualisation."
python
libs
machine-learning
boosting
visualization
feature-selection
november 2010 by arsyed
When should I use lasso vs ridge? - Statistical Analysis
november 2010 by arsyed
"Keep in mind that ridge regression can't zero out coefficients; thus, you either end up including all the coefficients in the model, or none of them. In contrast, the LASSO does both parameter shrinkage and variable selection automatically. If some of your covariates are highly correlated, you may want to look at the Elastic Net [3] instead of the LASSO.
I'd personally recommend using the Non-negative Garrote (NNG) [1] as it's consistent in terms of estimation and variable selection [2]. Unlike LASSO and ridge regression, NNG requires an initial estimate that is then shrunk towards the origin. In the original paper, Breiman recommends the least squares solution for the initial estimate (you may however want to start the search from a ridge regression solution and use something like GCV to select the penalty parameter)."
statistics
regression
penalized
lasso
ridge
garrotte
feature-selection
november 2010 by arsyed
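The shrinkage-versus-selection contrast in the quoted answer can be seen in the textbook special case of an orthonormal design, where each penalised estimate is a closed-form function of the least squares coefficient. The sketch below assumes that special case and one particular scaling of the penalties (half the squared error plus `lam` times the penalty); the function names are illustrative, and real problems need an iterative solver.

```python
def ridge_shrink(beta_ols, lam):
    """Ridge under an orthonormal design: proportional shrinkage, never exactly zero."""
    return beta_ols / (1.0 + lam)


def lasso_shrink(beta_ols, lam):
    """LASSO under an orthonormal design: soft thresholding, small coefficients become zero."""
    magnitude = max(abs(beta_ols) - lam, 0.0)
    return magnitude if beta_ols >= 0 else -magnitude


def garrote_shrink(beta_ols, lam):
    """Non-negative garrote: rescale the initial estimate by a factor in [0, 1].

    Large coefficients are shrunk only slightly; small ones are dropped.
    """
    if beta_ols == 0:
        return 0.0
    factor = max(1.0 - lam / beta_ols ** 2, 0.0)
    return factor * beta_ols


lam = 1.0
for beta in (0.5, 2.0, 3.0):
    print(beta, ridge_shrink(beta, lam), lasso_shrink(beta, lam), garrote_shrink(beta, lam))
```

With `lam = 1.0`, a least squares coefficient of 0.5 is halved by ridge but set to zero by both the LASSO and the garrote, which is the variable-selection behaviour the answer describes.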
Feature Selection with the Boruta Package (Kursa, Rudnicki)
november 2010 by arsyed
"This article describes an R package Boruta, implementing a novel feature selection algorithm for finding all relevant variables. The algorithm is designed as a wrapper around a Random Forest classification algorithm. It iteratively removes the features which are proved by a statistical test to be less relevant than random probes. The Boruta package provides a convenient interface to the algorithm. The short description of the algorithm and examples of its application are presented."
R
pkg
feature-selection
random-forest
november 2010 by arsyed
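The "random probes" idea in the abstract above can be illustrated without the package. This is not Boruta itself: the sketch swaps the Random Forest importance for absolute Pearson correlation, and replaces the statistical test with a simple count of rounds in which a feature beats the best shuffled ("shadow") copy. The data and all names are made up for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def shadow_hits(columns, target, n_rounds=50, seed=0):
    """Count, per feature, the rounds in which it outscores every shadow probe.

    Each round shuffles a copy of every column, which destroys any link to
    the target, and uses the best shadow score as the bar to clear.
    """
    rng = random.Random(seed)
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        best_shadow = 0.0
        for values in columns.values():
            shadow = list(values)
            rng.shuffle(shadow)
            best_shadow = max(best_shadow, abs(pearson(shadow, target)))
        for name, values in columns.items():
            if abs(pearson(values, target)) > best_shadow:
                hits[name] += 1
    return hits


# Synthetic data: one column drives the target, the other is unrelated noise.
data_rng = random.Random(1)
signal = [data_rng.gauss(0, 1) for _ in range(200)]
noise = [data_rng.gauss(0, 1) for _ in range(200)]
target = [s + 0.1 * data_rng.gauss(0, 1) for s in signal]

hits = shadow_hits({"signal": signal, "noise": noise}, target)
print(hits)
```

A feature that clears the bar in nearly every round is a candidate to keep, while one that wins only about as often as chance would allow is a candidate to drop; the real package makes that decision with a statistical test rather than by eye.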
learning to select features using their properties
november 2008 by chl
"[...] we assume that each feature is represented by a set of properties, referred to as meta-features. This approach enables prediction of the quality of features without measuring their value on the training instances."
feature-selection
ml
meta-features
november 2008 by chl
"Eliminating the Birthday Paradox for Universal Features" (John Langford, Machine Learning (Theory))
april 2008 by arthegall
I don't quite understand, but I only skimmed the post. At the same time, I think he's missing the point about Bloomier filters in the comments? Maybe?
machinelearning
bloom-filters
feature-selection
online-algorithms
april 2008 by arthegall
[math/0506081] The Dantzig selector: Statistical estimation when $p$ is much larger than $n$
march 2008 by arthegall
Tao and Candes on arXiv. Includes links to responses.
research-article
linear-regression
feature-selection
statistics
inference
artxiv
march 2008 by arthegall
Information gain in decision trees - Wikipedia, the free encyclopedia
april 2007 by jmason
another interesting feature-selection algo to investigate for SpamAssassin rule development
spamassassin
feature-selection
information-gain
entropy
statistics
classification
rule-dev
rule-qa
april 2007 by jmason
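For reference, information gain is the drop in label entropy after splitting on a feature. Below is a minimal sketch of scoring a single yes/no rule against spam/ham labels; the toy data and function names are hypothetical, and this is not how SpamAssassin itself evaluates rules.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(rule_fired, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the rule."""
    total = len(labels)
    remainder = 0.0
    for value in set(rule_fired):
        subset = [y for fired, y in zip(rule_fired, labels) if fired == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


labels = ["spam", "spam", "ham", "ham"]
perfect_rule = [True, True, False, False]   # fires on exactly the spam
useless_rule = [True, False, True, False]   # fires on half of each class

print(information_gain(perfect_rule, labels))
print(information_gain(useless_rule, labels))
```

Ranking candidate rules by this score gives a simple filter-style feature selection: rules with gain near zero tell you almost nothing about the label.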
Feature selection - Wikipedia, the free encyclopedia
april 2007 by jmason
might be useful to investigate some alternative feature-selection algorithms for SpamAssassin rules
feature-selection
spamassassin
statistics
rule-dev
rule-qa
april 2007 by jmason