feature-extraction   12

Machine Learning :: Text Feature Extraction (tf-idf) - Part II | Web Builder Zone
"Machine Learning :: Text Feature Extraction (tf-idf) - Part II" #MachineLearning #Python #TextFeatureExtraction #tfidf
tf-idf  feature-extraction  textmining  tfidf  Python  TextFeatureExtraction  MachineLearning 
february 2012 by leecarrot
[1112.6209] Building high-level features using large scale unsupervised learning
We consider the problem of building detectors for high-level concepts using only unsupervised feature learning. For example, we would like to understand if it is possible to learn a face detector using only unlabeled images downloaded from the internet. To answer this question, we trained a simple feature learning algorithm on a large dataset of images (10 million images, each image is 200x200). The simulation is performed on a cluster of 1000 machines with fast network hardware for one week. Extensive experimental results reveal surprising evidence that such high-level concepts can indeed be learned using only unlabeled data and a simple learning algorithm.
image-analysis  image-segmentation  unsupervised-learning  learning-by-doing  feature-extraction  nudge-targets 
january 2012 by Vaguery
[1105.1033] Adaptively Learning the Crowd Kernel
"We introduce an algorithm that, given n objects, learns a similarity matrix over all n^2 pairs, from crowdsourced data alone. The algorithm samples responses to adaptively chosen triplet-based relative-similarity queries. Each query has the form "is object 'a' more similar to 'b' or to 'c'?" and is chosen to be maximally informative given the preceding responses. The output is an embedding of the objects into Euclidean space (like MDS); we refer to this as the "crowd kernel." SVMs reveal that the crowd kernel captures prominent and subtle features across a number of domains, such as "is striped" among neckties and "vowel vs. consonant" among letters."
classification  ontology-discovery  crowdsourcing  feature-extraction  algorithms  nudge-targets  performance-space-analysis 
october 2011 by Vaguery
[1101.4744] Clustering functional data using wavelets
"We present two methods for detecting patterns and clusters in high dimensional time-dependent functional data. Our methods are based on wavelet-based similarity measures, since wavelets are well suited for identifying highly discriminant local time and scale features. The multiresolution aspect of the wavelet transform provides a time-scale decomposition of the signals allowing to visualize and to cluster the functional data into homogeneous groups. For each input function, through its empirical orthogonal wavelet transform the first method uses the distribution of energy across scales to generate a handy number of features that can be sufficient to still make the signals well distinguishable. Our new similarity measure combined with an efficient feature selection technique in the wavelet domain is then used within more or less classical clustering algorithms to effectively differentiate among high dimensional populations. The second method uses dissimilarity measures between the whole time-scale representations that are based on wavelet-coherence tools. The clustering is then performed using a k-centroid algorithm starting from these dissimilarities. Practical performance of these methods that jointly design both the feature selection in the wavelet domain and the classification distance is demonstrated through simulations as well as daily profiles of the French electricity power demand."
classification  time-series  feature-extraction  machine-learning  multiobjective-optimization  ontology-discovery  wavelets  nudge-targets 
october 2011 by Vaguery
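The "distribution of energy across scales" idea in the abstract above can be illustrated with a rough sketch. This is not the paper's implementation: the Haar wavelet, the function name haar_energy_by_scale, and the normalisation to relative energies are all assumptions made here for illustration, and the input length is assumed to be a power of two.

```python
import math


def haar_energy_by_scale(signal):
    """Relative energy of Haar detail coefficients per scale, finest first.

    Assumption: an orthonormal Haar transform stands in for the
    "empirical orthogonal wavelet transform" named in the abstract.
    """
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two >= 2")
    approx = [float(x) for x in signal]
    energies = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail coefficients capture variation at the current scale.
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        # Approximation coefficients feed the next, coarser scale.
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        energies.append(sum(d * d for d in detail))
    total = sum(energies)
    # A constant signal has no detail energy; return zeros in that case.
    return [e / total if total else 0.0 for e in energies]


fast = haar_energy_by_scale([1, -1, 1, -1, 1, -1, 1, -1])
slow = haar_energy_by_scale([1, 1, 1, 1, -1, -1, -1, -1])
```

A signal of length 2^J yields J features, one per scale. In this example the rapidly alternating signal puts its energy in the finest scale and the slowly varying one in the coarsest, so even this short feature vector separates the two.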
[1108.0986] A proximal point algorithm for sequential feature extraction applications
"We propose a proximal point algorithm to solve the LAROS problem, that is, the problem of finding a "large approximately rank-one submatrix". This LAROS problem is used to sequentially extract features in data. We also develop a new stopping criterion for the proximal point algorithm, which is based on the duality conditions of eps-optimal solutions of the LAROS problem, with a theoretical guarantee. We test our algorithm with two image databases and show that we can use the LAROS problem to extract appropriate common features from these images."
algorithms  image-segmentation  feature-extraction  nudge-targets 
october 2011 by Vaguery
Feature extraction - Wikipedia, the free encyclopedia
In pattern recognition and in image processing, feature extraction is a special form of dimensionality reduction.
When the input data to an algorithm is too large to be processed and is suspected to be highly redundant (much data, but not much information), the input data is transformed into a reduced representation set of features (also called a feature vector). Transforming the input data into the set of features is called feature extraction. If the extracted features are carefully chosen, the feature set is expected to capture the relevant information from the input data, so that the desired task can be performed using this reduced representation instead of the full-size input.
feature-extraction  image  image-processing 
january 2010 by ianlewis
extremely fast text feature extraction for classification and indexing
"[...] describes a fast method for text feature extraction that folds together unicode conversion, forced lowercasing, word boundary detection, and string hash computation. we show empirically that our integer hash features result in classifiers with equivalent statistical performance to those built using string word features, but require far less computation and less memory."
hashing  text-mining  feature-extraction 
august 2008 by chl
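As a rough illustration of the integer hash features described in the entry above, here is a minimal sketch. It is not the paper's method: the paper folds the steps into a single pass, whereas this version runs them one after another, and the function name hashed_features, the regex used for word boundaries, the CRC-32 hash, and the bucket count are all assumptions chosen for illustration.

```python
import re
import zlib
from collections import Counter

# Assumption: a run of word characters approximates "word boundary detection".
_WORD = re.compile(r"\w+")


def hashed_features(text, n_buckets=2 ** 18):
    """Map text to a sparse {bucket_index: count} dictionary.

    Each lowercased word is encoded as UTF-8 and hashed to an integer
    bucket, so no string vocabulary has to be kept in memory.
    """
    counts = Counter()
    for match in _WORD.finditer(text.lower()):
        token = match.group().encode("utf-8")
        counts[zlib.crc32(token) % n_buckets] += 1
    return dict(counts)


features = hashed_features("Spam spam SPAM")
```

The three spellings of "spam" collapse into one bucket with a count of 3. Distinct words can collide in the same bucket; a larger n_buckets makes that less likely at the cost of a wider feature space.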
