feature-extraction 12
Machine Learning :: Text Feature Extraction (tf-idf) - Part II | Web Builder Zone
february 2012 by leecarrot
"Machine Learning :: Text Feature Extraction (tf-idf) - Part II" #MachineLearning #Python #TextFeatureExtraction #tfidf
tf-idf
feature-extraction
textmining
tfidf
Python
TextFeatureExtraction
MachineLearning
[1112.6209] Building high-level features using large scale unsupervised learning
january 2012 by Vaguery
We consider the problem of building detectors for high-level concepts using only unsupervised feature learning. For example, we would like to understand if it is possible to learn a face detector using only unlabeled images downloaded from the internet. To answer this question, we trained a simple feature learning algorithm on a large dataset of images (10 million images, each image is 200x200). The simulation is performed on a cluster of 1000 machines with fast network hardware for one week. Extensive experimental results reveal surprising evidence that such high-level concepts can indeed be learned using only unlabeled data and a simple learning algorithm.
image-analysis
image-segmentation
unsupervised-learning
learning-by-doing
feature-extraction
nudge-targets
[1105.1033] Adaptively Learning the Crowd Kernel
october 2011 by Vaguery
"We introduce an algorithm that, given n objects, learns a similarity matrix over all n^2 pairs, from crowdsourced data alone. The algorithm samples responses to adaptively chosen triplet-based relative-similarity queries. Each query has the form "is object 'a' more similar to 'b' or to 'c'?" and is chosen to be maximally informative given the preceding responses. The output is an embedding of the objects into Euclidean space (like MDS); we refer to this as the "crowd kernel." SVMs reveal that the crowd kernel captures prominent and subtle features across a number of domains, such as "is striped" among neckties and "vowel vs. consonant" among letters."
classification
ontology-discovery
crowdsourcing
feature-extraction
algorithms
nudge-targets
performance-space-analysis
[1101.4744] Clustering functional data using wavelets
october 2011 by Vaguery
"We present two methods for detecting patterns and clusters in high dimensional time-dependent functional data. Our methods are based on wavelet-based similarity measures, since wavelets are well suited for identifying highly discriminant local time and scale features. The multiresolution aspect of the wavelet transform provides a time-scale decomposition of the signals, making it possible to visualize the functional data and to cluster it into homogeneous groups. For each input function, the first method uses the distribution of energy across scales, obtained through its empirical orthogonal wavelet transform, to generate a small number of features that can still be sufficient to make the signals well distinguishable. Our new similarity measure, combined with an efficient feature selection technique in the wavelet domain, is then used within more or less classical clustering algorithms to effectively differentiate among high dimensional populations. The second method uses dissimilarity measures between the whole time-scale representations and is based on wavelet-coherence tools. The clustering is then performed using a k-centroid algorithm starting from these dissimilarities. The practical performance of these methods, which jointly design both the feature selection in the wavelet domain and the classification distance, is demonstrated through simulations as well as on daily profiles of the French electricity power demand."
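The "distribution of energy across scales" idea in this abstract can be illustrated with a small sketch. It is only an assumption-laden illustration, not the paper's method: it uses the Haar wavelet (the abstract does not name one), requires a power-of-two signal length, and the function name `haar_energy_by_scale` is hypothetical.

```python
import math


def haar_energy_by_scale(signal):
    """Return the share of detail energy at each Haar scale, finest first.

    Assumption: an orthonormal Haar transform stands in for the
    "empirical orthogonal wavelet transform" mentioned in the abstract.
    """
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")

    approx = [float(x) for x in signal]
    energies = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients capture the variation at the current scale.
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        # Approximation coefficients are passed on to the next, coarser scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        energies.append(sum(d * d for d in detail))

    total = sum(energies)
    # A constant signal has no detail energy; return the zeros unchanged.
    return [e / total for e in energies] if total > 0 else energies
```

A rapidly alternating signal puts all of its detail energy at the finest scale, while a single slow step puts it at the coarsest one, so even this short vector separates the two shapes.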
classification
time-series
feature-extraction
machine-learning
multiobjective-optimization
ontology-discovery
wavelets
nudge-targets
[1108.0986] A proximal point algorithm for sequential feature extraction applications
october 2011 by Vaguery
"We propose a proximal point algorithm to solve the LAROS problem, that is, the problem of finding a "large approximately rank-one submatrix". The LAROS problem is used to sequentially extract features from data. We also develop a new stopping criterion for the proximal point algorithm, which is based on the duality conditions of eps-optimal solutions of the LAROS problem and comes with a theoretical guarantee. We test our algorithm on two image databases and show that the LAROS problem can be used to extract appropriate common features from these images."
algorithms
image-segmentation
feature-extraction
nudge-targets
Feature extraction - Wikipedia, the free encyclopedia
january 2010 by ianlewis
In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction.
When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (much data, but not much information), the input data is transformed into a reduced representation: a set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.
feature-extraction
image
image-processing
extremely fast text feature extraction for classification and indexing
august 2008 by chl
"[...] describes a fast method for text feature extraction that folds together unicode conversion, forced lowercasing, word boundary detection, and string hash computation. we show empirically that our integer hash features result in classifiers with equivalent statistical performance to those built using string word features, but require far less computation and less memory."
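A rough sketch of the idea described in this abstract, under stated assumptions: the linked work folds lowercasing, word boundary detection, and hashing into one pass, whereas this version performs them as separate steps for readability. The function name `hashed_features`, the `n_buckets` parameter, and the choice of CRC32 as the hash are all illustrative and not taken from the source.

```python
import re
import zlib

# Assumption: a "word" is any run of Unicode word characters.
_WORD = re.compile(r"\w+", re.UNICODE)


def hashed_features(text, n_buckets=2**20):
    """Map text to a sparse bag of words keyed by integer hash bucket."""
    counts = {}
    for match in _WORD.finditer(text.lower()):
        # The integer bucket replaces the word string as the feature key.
        bucket = zlib.crc32(match.group().encode("utf-8")) % n_buckets
        counts[bucket] = counts.get(bucket, 0) + 1
    return counts
```

Because the keys are plain integers, no vocabulary dictionary has to be built or stored. The trade-off is that distinct words can occasionally collide in the same bucket.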
hashing
text-mining
feature-extraction