wjy + datamining   16

Introduction to Data Mining
Introduction to Data Mining (Second Edition)
datascience  datamining  books  book  bigdata 
august 2018 by wjy
QMiner is a data analytics platform for processing large-scale real-time streams containing structured and unstructured data.
bigdata  datamining  service  api  analytics 
january 2016 by wjy
Weka 3 - Data Mining with Open Source Machine Learning Software in Java
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
datamining  machinelearning  opensource  java  weka 
november 2014 by wjy
Mining of Massive Datasets
The book is based on Stanford Computer Science course CS246: Mining Massive Datasets (and CS345A: Data Mining).
datamining  bigdata  ebook  datasets  books  machinelearning  mooc  datascience  stanford 
november 2014 by wjy
Orange – Data Mining Fruitful & Fun
Open source data visualization and analysis for novices and experts. Data mining through visual programming or Python scripting. Components for machine learning. Add-ons for bioinformatics and text mining. Packed with features for data analytics.
visualization  datamining  python  machinelearning  ml  software 
september 2014 by wjy
Apache Mahout: Scalable machine learning and data mining
The Apache Mahout™ project's goal is to build a scalable machine learning library.
machinelearning  datamining  apache  mahout  hadoop  ml  library  opensource 
april 2014 by wjy
Jaccard index - Wikipedia, the free encyclopedia
a statistic used for comparing the similarity and diversity of sample sets.
similarity  statistics  algorithm  wikipedia  clustering  datamining  jaccard  math  sets  programming 
march 2014 by wjy
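As a rough illustration of the Jaccard index described above: for two sets A and B it is the size of their intersection divided by the size of their union. The sketch below is not taken from the linked article; the function name `jaccard_index` and the choice to return 1.0 for two empty sets are assumptions made for this example. The sample inputs are the tag sets of the Weka and Apache Mahout bookmarks on this page.

```python
def jaccard_index(a, b):
    """Return |A & B| / |A | B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Tag sets copied from the Weka and Apache Mahout entries on this page.
weka_tags = {"datamining", "machinelearning", "opensource", "java", "weka"}
mahout_tags = {"machinelearning", "datamining", "apache", "mahout",
               "hadoop", "ml", "library", "opensource"}

# 3 shared tags out of 10 distinct tags overall.
print(jaccard_index(weka_tags, mahout_tags))  # 0.3
```

A value of 0 means the sets share nothing and 1 means they are identical, so the two bookmarks above overlap only modestly by tag.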
Web mining module for Python

Pattern is a web mining module for the Python programming language. It bundles tools for data mining (Google + Twitter + Wikipedia API, web spider, HTML DOM parser), natural language processing (tagger/chunker, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, k-means clustering, Naive Bayes + k-NN + SVM classifiers) and network analysis (graph centrality and visualization). It is well documented and bundled with 30+ examples and 350+ unit tests. The source code is licensed under BSD and available from http://www.clips.ua.ac.be/pages/pattern.
datamining  text-mining  tools  library  module  mining  scraping  webmining  nlp  python 
october 2012 by wjy
Python Data Analysis Library — pandas: Python Data Analysis Library
pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.
library  dataanalysis  data-analysis  pandas  opensource  datamining  analysis  data  statistics  python 
june 2012 by wjy
A collection of command-line tools for researchers in machine learning, data mining, and related fields. All of the functionality is also provided in a clean C++ class library. Demo apps are included to show how to use the class library.
research  opensource  library  c++  programming  datamining  machinelearning 
december 2011 by wjy
